guan404ming opened a new pull request, #61111: URL: https://github.com/apache/airflow/pull/61111

## Why

- `node-sql-parser` contributes a large bundle footprint (~88 MB package) for a single use case: detecting whether a string is valid SQL.
- The Airflow UI only needs SQL validation, not full AST parsing or query manipulation, so `node-sql-parser` is overkill.
- `sqlparser-ts` (backed by Rust's `sqlparser-rs` from Apache DataFusion, compiled to WASM) is 13.9x smaller in package size, 2x faster at parsing, and supports 14 SQL dialects.
- Switching reduces the overall frontend bundle size, improving load times for Airflow UI users.

## How

- Replaced `node-sql-parser` with `sqlparser-ts`, a library I author and maintain, which compiles Rust's `sqlparser-rs` to WebAssembly for high-performance SQL parsing in JS/TS environments, with zero JS/TS dependencies required.
  - npm: https://www.npmjs.com/package/@guanmingchiu/sqlparser-ts
  - Source code: https://github.com/guan404ming/sqlparser-ts
- Refactored `detectLanguage.ts` to use `sqlparser-ts`'s `validate()` function instead of `node-sql-parser`'s `parse()` for SQL detection.
- Added comprehensive unit tests for `detectLanguage` covering JSON, SQL, Bash, YAML, and plain text detection.

## Benchmark

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Package size | 87 MB | 6.4 MB | **-93%** |
| JS bundle | 7,846 KB | 5,248 KB | **-33%** |
| Parse speed | 7.06 μs/call | 4.3 μs/call | **2.06x faster** |
| Build time | ~44s | ~27s | **-39%** |
| Compatibility | Fails on complex SQL in SQLite | Fully supported and kept up to date | - |

Detailed benchmark between `sqlparser-ts` and `node-sql-parser`: https://github.com/guan404ming/sqlparser-ts/tree/main/benchmark

**before**

<img width="2944" height="1544" alt="image" src="https://github.com/user-attachments/assets/93888ee3-d940-4b68-992a-018d7324b849" />

**after**

<img width="2944" height="1544" alt="image" src="https://github.com/user-attachments/assets/ae3ddaf3-e29c-417a-9e76-3d044dcbd6cf" />

---

##### Was generative AI tooling used to co-author this PR?

- [ ] Yes (please specify the tool below)

---

* Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)** for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
* For fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
* When adding a dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
* For significant user-facing changes, create a newsfragment: `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [airflow-core/newsfragments](https://github.com/apache/airflow/tree/main/airflow-core/newsfragments).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
