davisp commented on PR #1561: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1561#issuecomment-2533229913
https://github.com/apache/datafusion-sqlparser-rs/compare/main...davisp:datafusion-sqlparser-rs:pd/experiment/less-token-cloning tl;dr - Roughly 28% speedup on the microbenchmarks compared to main. 18% of that by tweaking a few of the main token methods in the last commit. I ended up poking at this today and have managed to get about a 28% improvement on the benchmarks over `main`. Roughly 10% of that is checking for hot paths in the flame graphs and slowly converting those functions over to avoid cloning tokens. On a whim, I also poked at optimizing the Token::make_word method that does a binary search over all keywords. That appears to have saved about 400ms (of 1.2s total originally attributed to it). I skimmed the docs for [phf](https://crates.io/crates/phf) but I'm not certain what this project's appetite is for new dependencies. It currently seems quite lean so I didn't pursue that just yet. Lastly, the biggest single commit was the latest that updates a few of the token handling methods themselves to avoid clones. This was the biggest jump at 18% improvement for a total just around 28% (based on my very non-scientific measurements). Also, one thing I did different than you @alamb, I noted CI flagged your EOF_TOKEN, so I just made mine a static since it doesn't have any initialization. All in all, this approach certainly seems feasible as well as actually productive. It turns out the hardest part is accounting for error reporting with the `expected("message", token)` API. I ended up adding an `expected_current` which doesn't take a token parameter and uses the current index for reporting. That works for now, but going forward I think we're gonna want to figure out a much different error reporting API as its quite fragile (you have to pay attention to which token is being reported and whether or not the token index has been modified). There are also a number of places where the token being reported in errors is incorrect. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
