[I] Benchmark Results: Rust SQL Parser Comparison [datafusion-sqlparser-rs]

via GitHub Wed, 11 Feb 2026 07:37:55 -0800


LucaCappelletti94 opened a new issue, #2215:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2215


   Hi! I've created an open-source benchmark comparing Rust SQL parsers for 
PostgreSQL workloads and wanted to share the results with you.
   
   ## Benchmark Results
   
   ![Benchmark 
Results](https://raw.githubusercontent.com/LucaCappelletti94/sql_ast_benchmark/main/benchmark_results.svg)
   
   ## Methodology
   
   - **Framework**: Criterion.rs v0.8 with flat sampling mode, 50 samples, 
3-second measurement time
   - **Workload**: Parsing batches of 1-1000 SQL statements concatenated into a 
single string
   - **Datasets**:
     - SELECT: 4,505 queries from Spider (Yale) + Gretel AI
     - INSERT: 992 queries from Gretel AI
     - UPDATE: 983 queries from Gretel AI
     - DELETE: 933 queries from Gretel AI
   - **Environment**: AMD Ryzen Threadripper PRO 5975WX, Ubuntu 24.04, Rust 
2021 edition
   - **Dialect**: All parsers configured for PostgreSQL
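
The workload described above could be sketched roughly as follows. This is a simplified, std-only illustration and not the code from the benchmark repository: `build_batch`, `count_statements`, and `mean_time` are hypothetical names, the `;\n` separator is an assumption, `count_statements` is a placeholder rather than a real parser, and a plain timing loop stands in for Criterion.rs.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Joins the first `n` statements into a single string, separated by ";\n".
/// (Hypothetical helper; the separator is an assumption.)
pub fn build_batch(statements: &[&str], n: usize) -> String {
    statements
        .iter()
        .take(n)
        .copied()
        .collect::<Vec<&str>>()
        .join(";\n")
}

/// Placeholder for a real parser: counts non-empty `;`-separated chunks.
/// It is naive on purpose and would miscount semicolons inside string literals.
pub fn count_statements(batch: &str) -> usize {
    batch.split(';').filter(|s| !s.trim().is_empty()).count()
}

/// Runs `parse` on `input` `samples` times and returns the mean wall-clock time.
/// A real harness such as Criterion.rs also handles warm-up and outliers.
pub fn mean_time<F: FnMut(&str)>(input: &str, samples: u32, mut parse: F) -> Duration {
    assert!(samples > 0, "at least one sample is required");
    let start = Instant::now();
    for _ in 0..samples {
        // black_box keeps the compiler from optimizing the input away.
        parse(black_box(input));
    }
    start.elapsed() / samples
}
```

With a real parser, the closure passed to `mean_time` would call that parser on the batch instead of `count_statements`, once for each batch size under test.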
   
   ## Results for sqlparser-rs
   
   **sqlparser-rs performs excellently in this benchmark:**
   
   - **1.5-2x faster** than FFI-based parsers (pg_query.rs, pg_parse)
   - **100% compatibility** with all test queries in our corpus
   - Best balance of speed, correctness, and multi-dialect support
   
   | Statement Type | Parse time (500 statements) |
   |----------------|-----------------------------|
   | SELECT | 5.68 ms |
   | INSERT | 4.90 ms |
   | UPDATE | 3.20 ms |
   | DELETE | 2.93 ms |
   
   ### Observations
   
   1. Pure Rust implementation avoids FFI overhead, showing consistent 
performance advantage
   2. The recursive descent parser handles complex queries (CTEs, window 
functions, nested subqueries) efficiently
   3. Fuzz testing gives a level of confidence in robustness that the other parsers lack
   4. Performance could improve by making most string fields generic over a type 
parameter `S` that accepts both `String` and `&str`, which would reduce the 
cloning that currently happens when creating both the tokens and the statements
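
To make the last observation more concrete, here is a minimal sketch of a token type that is generic over its string storage. It is an illustration only: `Token<S>`, `tokenize`, `classify`, and `to_owned_token` are hypothetical and are not the sqlparser-rs types or API, and the tokenizer only splits on whitespace and commas.

```rust
/// Hypothetical token type, generic over its string storage `S`.
/// `Token<&str>` borrows from the input; `Token<String>` owns its text.
#[derive(Debug, Clone, PartialEq)]
pub enum Token<S> {
    Word(S),
    Number(S),
    Comma,
}

/// Classifies a non-empty chunk of text without copying it.
fn classify(text: &str) -> Token<&str> {
    if text.chars().all(|c| c.is_ascii_digit()) {
        Token::Number(text)
    } else {
        Token::Word(text)
    }
}

/// Borrowing tokenizer: every `Word` and `Number` is a slice of `input`,
/// so no string data is cloned while tokenizing.
pub fn tokenize(input: &str) -> Vec<Token<&str>> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None; // byte offset where the current chunk began

    for (i, c) in input.char_indices() {
        if c.is_whitespace() || c == ',' {
            // A separator ends the current chunk, if there is one.
            if let Some(s) = start.take() {
                tokens.push(classify(&input[s..i]));
            }
            if c == ',' {
                tokens.push(Token::Comma);
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    // Flush the final chunk when the input does not end with a separator.
    if let Some(s) = start {
        tokens.push(classify(&input[s..]));
    }
    tokens
}

impl<S: AsRef<str>> Token<S> {
    /// Converts to an owned token; this is the only place that allocates.
    pub fn to_owned_token(&self) -> Token<String> {
        match self {
            Token::Word(s) => Token::Word(s.as_ref().to_string()),
            Token::Number(s) => Token::Number(s.as_ref().to_string()),
            Token::Comma => Token::Comma,
        }
    }
}
```

In this sketch, callers that only inspect the tokens can stay on `Token<&str>`, and the allocation cost is paid only by callers that need the tokens to outlive the input string.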
   
   ## Full Benchmark Repository
   
   <https://github.com/LucaCappelletti94/sql_ast_benchmark>
   
   The repository includes:
   
   - Complete benchmark code
   - All SQL test datasets
   - Reproducible methodology
   - Detailed README with analysis
   
   ## Feedback Request
   
   I'd appreciate any feedback on the benchmark methodology and any suggestions 
for improving it:
   
   1. Are there any parser configuration options that could improve performance?
   2. Are there specific query patterns I should include in the test corpus?
   3. Is there anything about the benchmark setup that might not represent 
real-world usage?
   
   Thank you for maintaining such an excellent library!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

