xuzifu666 opened a new pull request, #7919:
URL: https://github.com/apache/paimon/pull/7919

   ### Purpose
   
   Paimon does not currently support R-tree indexes. The implementation follows 
the paper at https://postgis.net/docs/support/rtree.pdf.
   Benchmark results are listed below.
   
   ### Hardware Configuration
   
   - **CPU**: Apple M-series processor (MacBook Pro)
   - **Memory**: 16GB LPDDR5
   - **Operating System**: macOS 14.x
   
   ### Software Configuration
   
   - **Java Version**: OpenJDK 11+
   - **Build**: Maven 3.8.x
   
   ### Test Parameters
   
   - **Warmup Iterations**: 3
   - **Benchmark Iterations**: 10
   - **Query Count**: 1000-10000 queries
   - **Random Seed**: 42 (for reproducibility)
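
The parameters above can be read as a simple measurement loop: run a few untimed warmup iterations, then average over the timed ones, with a fixed seed so the data and queries repeat. The following is a minimal sketch of that loop for the linear-scan baseline only. The class name, the 10,000-unit coordinate extent, and the 500-unit query side are illustrative assumptions and are not taken from the benchmark classes in this PR.

```java
import java.util.Random;

/** Hypothetical timing harness; not the benchmark class shipped in this PR. */
public class LinearScanTimingSketch {
    static final int WARMUP_ITERATIONS = 3;
    static final int BENCHMARK_ITERATIONS = 10;
    static final long SEED = 42L;

    /** Generates n points uniformly in [0, extent) x [0, extent). */
    static double[][] randomPoints(int n, double extent, long seed) {
        Random rnd = new Random(seed);
        double[][] points = new double[n][2];
        for (double[] p : points) {
            p[0] = rnd.nextDouble() * extent;
            p[1] = rnd.nextDouble() * extent;
        }
        return points;
    }

    /** Counts the points inside the closed rectangle [minX, maxX] x [minY, maxY]. */
    static int linearScan(double[][] points, double minX, double minY,
                          double maxX, double maxY) {
        int hits = 0;
        for (double[] p : points) {
            if (p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY) {
                hits++;
            }
        }
        return hits;
    }

    /** Average microseconds per query over the measured (non-warmup) iterations. */
    static double averageMicrosPerQuery(double[][] points, double[][] queryCorners,
                                        double side) {
        long measuredNanos = 0;
        long sink = 0; // consumed below so the JIT cannot discard the scans
        for (int iter = 0; iter < WARMUP_ITERATIONS + BENCHMARK_ITERATIONS; iter++) {
            long start = System.nanoTime();
            for (double[] q : queryCorners) {
                sink += linearScan(points, q[0], q[1], q[0] + side, q[1] + side);
            }
            long elapsed = System.nanoTime() - start;
            if (iter >= WARMUP_ITERATIONS) {
                measuredNanos += elapsed; // warmup iterations are not counted
            }
        }
        if (sink < 0) {
            throw new IllegalStateException("unreachable: hit counts are non-negative");
        }
        return measuredNanos / 1_000.0
                / ((double) BENCHMARK_ITERATIONS * queryCorners.length);
    }

    public static void main(String[] args) {
        double extent = 10_000.0; // assumed coordinate range
        double side = 500.0;      // assumed query rectangle side
        double[][] points = randomPoints(100_000, extent, SEED);
        // Lower-left corners of the query rectangles, kept inside the extent.
        double[][] queries = randomPoints(1_000, extent - side, SEED + 1);
        System.out.printf("Linear scan: %.2f us per query%n",
                averageMicrosPerQuery(points, queries, side));
    }
}
```

Absolute numbers from a sketch like this will differ from the results below, since they depend on the JVM, the hardware, and the data distribution.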
   
   **Query Performance (10,000 queries)**
   ```
   R-Tree:       0.47 µs per query
   Linear Scan:  464.41 µs per query
   Speedup:      985.58×
   Average results per query: 20 records
   ```
   
   **Analysis by Dataset Size**
   
   Dataset Size | R-Tree (µs) | Linear Scan (µs) | Speedup | Query Selectivity
   -- | -- | -- | -- | --
   1K | 0.20 | 14.90 | 75× | 2%
   10K | 0.12 | 50.24 | 403× | 2%
   100K | 0.35 | 492.44 | 1407× | 2%
   1M | 0.39 | 495.25 | 1279× | 2%
   
   Query Type | Area Size | R-Tree (µs) | Linear Scan (µs) | Speedup | Selectivity
   -- | -- | -- | -- | -- | --
   Small | 500×500 | 0.22 | 366.27 | 1684× | 0.02%
   Medium | 1500×1500 | 0.21 | 400.52 | 1899× | 0.02%
   Large | 5000×5000 | 0.28 | 556.48 | 1997× | 0.03%
   
   **Point Query vs Range Query**
   Search Performance on 100K Dataset:
   ```
   Point queries (1000):      303.76 µs/query (with warmup optimization)
   Range queries (100):       357.04 µs/query
   Linear scan (100 scans):   65170.62 µs/scan
   ```
   
   Improvement vs Linear Scan:
   - Point query: 214× speedup
   - Range query: 182× speedup
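
As a quick check of the arithmetic, each speedup is the linear-scan time divided by the indexed query time. The sketch below recomputes both ratios from the timings listed above; the class and method names are hypothetical. The ratios come out at roughly 214.5 and 182.5, which the figures above appear to truncate to 214× and 182×.

```java
/** Hypothetical helper for recomputing the reported speedups. */
public class SpeedupSketch {
    /** Ratio of baseline time to indexed time; both must use the same unit. */
    static double speedup(double baselineMicros, double indexedMicros) {
        if (indexedMicros <= 0.0) {
            throw new IllegalArgumentException("indexed time must be positive");
        }
        return baselineMicros / indexedMicros;
    }

    public static void main(String[] args) {
        double linearScanMicros = 65170.62; // per scan, from the results above
        System.out.printf("Point query speedup: %.1fx%n",
                speedup(linearScanMicros, 303.76));
        System.out.printf("Range query speedup: %.1fx%n",
                speedup(linearScanMicros, 357.04));
    }
}
```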
   
   **Sequential Data Access Pattern**
   
   ```
   1M grid data (1000×1000 points)
   
   Average query time: 1.54 µs
   Results returned: 30 records
   
   Performance Characteristics:
   - First query: 8.38 µs (cache warmup)
   - Subsequent queries: 0.67-0.88 µs (steady state)
   ```
   
   ### Tests
   # Run comparison benchmark
   java -cp paimon-common/target/test-classes:paimon-common/target/classes \
     org.apache.paimon.fileindex.rtree.RTreeVsLinearScanBenchmark
   
   # Run detailed benchmark
   java -cp paimon-common/target/test-classes:paimon-common/target/classes \
     org.apache.paimon.fileindex.rtree.RTreeBenchmark
   

