[I] [EPIC] Improve sqlparser performance [datafusion-sqlparser-rs]

via GitHub Tue, 26 Nov 2024 06:07:23 -0800


alamb opened a new issue, #1557:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1557


   Normally, in a SQL processing system, parsing SQL is not a major bottleneck 
compared to actually processing data. Even so, given how many SQL strings this 
crate parses, I think there is significant benefit to improving the 
performance of its SQL parser.
   
   At the same time, I think it is important to minimize the impact on 
downstream crates as much as possible. 
   
   Recently, we started [introducing locations into the 
parser](https://github.com/apache/datafusion-sqlparser-rs/pull/1435) (thanks 
again @Nyrox!), which we found slows things down a bit (see 
https://github.com/apache/datafusion-sqlparser-rs/pull/1435#issuecomment-2500664144).
   
   Thankfully, I think there is significant room for improvement. As part of 
adding location information, I spent some time profiling, and I think there 
are some obvious ways to improve the performance without impacting downstream 
crates.
   
   Here is the flamegraph for anyone who is interested (you can download it 
locally to get zoom / etc):
   
   
[sqlparser-bench-flamegraph](https://github.com/user-attachments/assets/20ad1ac5-6bc2-4674-80df-9f563f09a536)
   
   
   
   
   ## Ideas to improve performance:
   - The most obvious one is for `next_token` / `peek` to stop cloning each 
`Token` (which involves copying strings). 
   - https://github.com/apache/datafusion-sqlparser-rs/issues/1381
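
   To illustrate the first idea, here is a minimal sketch of the difference 
between a cloning peek and a borrowing one. The `Token` and `Parser` types and 
the method names `peek_token_cloned`, `peek_token_ref`, and `next_token_ref` 
are simplified, hypothetical stand-ins and not this crate's actual API; the 
point is only that returning `&Token` avoids copying the `String` inside a 
token on every call.

```rust
// Hypothetical, simplified types for illustration only.
// These are NOT the real sqlparser-rs definitions.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Comma,
    EOF,
}

// Shared sentinel handed out once the token buffer is exhausted,
// so the borrowing methods never need to allocate.
static EOF_TOKEN: Token = Token::EOF;

struct Parser {
    tokens: Vec<Token>,
    index: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, index: 0 }
    }

    /// Cloning variant: each call copies the token, including any
    /// heap-allocated `String` it contains.
    fn peek_token_cloned(&self) -> Token {
        self.tokens.get(self.index).cloned().unwrap_or(Token::EOF)
    }

    /// Borrowing variant: returns a reference into the token buffer,
    /// so no string data is copied.
    fn peek_token_ref(&self) -> &Token {
        self.tokens.get(self.index).unwrap_or(&EOF_TOKEN)
    }

    /// Consumes one token and returns a reference to it.
    fn next_token_ref(&mut self) -> &Token {
        let i = self.index;
        if i < self.tokens.len() {
            self.index += 1;
        }
        self.tokens.get(i).unwrap_or(&EOF_TOKEN)
    }
}

/// Example consumer that walks the remaining tokens without cloning any.
fn count_words(parser: &mut Parser) -> usize {
    let mut n = 0;
    loop {
        match parser.next_token_ref() {
            Token::EOF => break,
            Token::Word(_) => n += 1,
            _ => {}
        }
    }
    n
}
```

   Callers that only need to inspect a token (for example, to match on its 
variant) can use the borrowing form; callers that need ownership can still 
call `.clone()` explicitly, which keeps the cost visible at the call site.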
   

