RafaelHerrero commented on PR #21578: URL: https://github.com/apache/datafusion/pull/21578#issuecomment-4266399159
Thanks for the feedback @alamb @Omega359! I diffed the fork (73c47cf7) against upstream v0.29.1 to understand the full delta. Here's what I found.

Parsing is identical. Both versions parse onlyif/skipif conditions with the same two-token pattern (["onlyif", label]) and tokenize lines the same way (split_whitespace). A line such as "onlyif mysql # use DIV operator for integer division" produces 9 tokens, so it does not match the two-token pattern and fails with InvalidLine in both versions. The 428 parse errors therefore come from the script's sd regex cleanup not removing every onlyif mysql block: its fixed-line-count patterns (3 to 12 lines plus a trailing blank line) miss blocks of other sizes and some other edge cases. This is a script bug, not a parser compatibility issue.

The fork is actually behind upstream. It lacks Record::Let, ExpectedError::SqlState, the slt:ignore marker, and Partitioner, all of which were added upstream after the fork was created.

The fork's real addition is in runner.rs: a dual-runner update_test_file that accepts an optional second runner (Postgres). During completion it runs each query on both DataFusion and Postgres, auto-updates the expected results when the two agree, and adds diagnostic comments for mismatches (type differences, result differences, errors). It also writes .bak backup files. Upstream's update_test_file supports only a single runner.

I plan to fix the sd cleanup by replacing the fragile regex patterns with a proper block-removal approach. Before removing the fork entirely, though: @Omega359, is the dual-runner completion (DataFusion + Postgres comparison) essential to the regeneration workflow, or would single-runner Postgres completion produce equivalent results? I want to make sure we're not losing something important.
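To make the tokenization point concrete, here is a minimal Rust sketch of the two-token condition match. It is a simplified, hypothetical stand-in (parse_condition, Condition and ParseError are illustrative names, not the actual sqllogictest-rs API). It assumes only the behavior described above: split_whitespace tokenization followed by an exact two-token slice pattern.

```rust
/// Hypothetical, simplified condition record (not the sqllogictest-rs type).
#[derive(Debug, PartialEq)]
enum Condition {
    OnlyIf(String),
    SkipIf(String),
}

/// Hypothetical parse error mirroring the InvalidLine failure.
#[derive(Debug, PartialEq)]
enum ParseError {
    InvalidLine(String),
}

/// Parse an onlyif/skipif line using whitespace tokenization and an exact
/// two-token match, so any extra token (such as a trailing comment) is rejected.
fn parse_condition(line: &str) -> Result<Condition, ParseError> {
    let tokens: Vec<&str> = line.split_whitespace().collect();
    match tokens.as_slice() {
        ["onlyif", label] => Ok(Condition::OnlyIf(label.to_string())),
        ["skipif", label] => Ok(Condition::SkipIf(label.to_string())),
        // "onlyif mysql # use DIV ..." has 9 tokens and lands here.
        _ => Err(ParseError::InvalidLine(line.to_string())),
    }
}
```

With this shape, parse_condition("onlyif mysql") succeeds, while the commented line falls through to InvalidLine. That is the same outcome the fork and upstream both produce, which is why leftover onlyif mysql lines surface as parse errors.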
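For the block-removal fix, one possible approach is sketched here in Rust, although the current cleanup uses sd. It treats each blank-line-separated record as a unit and drops any record containing an onlyif line for the given label, regardless of how many lines the record spans. remove_onlyif_records and is_onlyif are hypothetical helpers. The sketch assumes records are separated by blank lines, as in sqllogictest files, and that dropping comment lines attached to a removed record is acceptable.

```rust
/// True if `line` is an `onlyif <label>` condition, even when a trailing
/// `# comment` follows the label (only the first two tokens are checked).
fn is_onlyif(line: &str, label: &str) -> bool {
    let mut tokens = line.split_whitespace();
    tokens.next() == Some("onlyif") && tokens.next() == Some(label)
}

/// Hypothetical helper: drop every blank-line-separated record that contains
/// an `onlyif <label>` line, whatever its length. Kept records are re-emitted
/// with a single blank line after each one.
fn remove_onlyif_records(input: &str, label: &str) -> String {
    let mut out = String::new();
    let mut record: Vec<&str> = Vec::new();

    // Append a synthetic blank line so the final record is always flushed.
    for line in input.lines().chain(std::iter::once("")) {
        if !line.trim().is_empty() {
            record.push(line);
            continue;
        }
        // Blank line: the current record (if any) is complete.
        if !record.is_empty() {
            if !record.iter().any(|l| is_onlyif(l, label)) {
                for l in &record {
                    out.push_str(l);
                    out.push('\n');
                }
                out.push('\n');
            }
            record.clear();
        }
    }
    out
}
```

Because the decision is made per record rather than per fixed line count, blocks of any size are handled. As a side effect, runs of blank lines are collapsed into one, which may or may not be acceptable for the regenerated files.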
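To help frame the dual-runner question, here is a rough Rust sketch of the per-query decision such a completion step makes. QueryOutcome, Completion and complete_record are hypothetical names. The order of checks (rows first, then column types) and the comment wording are assumptions, not the fork's actual policy in runner.rs.

```rust
/// Hypothetical outcome of running one query on one engine.
#[derive(Debug, Clone, PartialEq)]
enum QueryOutcome {
    /// Column type string (e.g. "IT") plus the rendered result rows.
    Rows { types: String, rows: Vec<String> },
    /// The engine rejected or failed the query.
    Failed(String),
}

/// What the completion step decides to write for one record.
#[derive(Debug, PartialEq)]
enum Completion {
    /// Engines agree: record these rows as the expected output.
    Update(Vec<String>),
    /// Engines disagree: emit a diagnostic comment for manual review.
    Diagnostic(String),
}

/// Compare DataFusion's outcome against Postgres' for the same query.
fn complete_record(datafusion: &QueryOutcome, postgres: &QueryOutcome) -> Completion {
    match (datafusion, postgres) {
        (
            QueryOutcome::Rows { types: df_types, rows: df_rows },
            QueryOutcome::Rows { types: pg_types, rows: pg_rows },
        ) => {
            if df_rows != pg_rows {
                Completion::Diagnostic(format!(
                    "# result mismatch: DataFusion {df_rows:?} vs Postgres {pg_rows:?}"
                ))
            } else if df_types != pg_types {
                Completion::Diagnostic(format!(
                    "# type mismatch: DataFusion {df_types} vs Postgres {pg_types}"
                ))
            } else {
                Completion::Update(df_rows.clone())
            }
        }
        (QueryOutcome::Failed(e), QueryOutcome::Rows { .. }) => {
            Completion::Diagnostic(format!("# DataFusion error: {e}"))
        }
        (QueryOutcome::Rows { .. }, QueryOutcome::Failed(e)) => {
            Completion::Diagnostic(format!("# Postgres error: {e}"))
        }
        (QueryOutcome::Failed(df_err), QueryOutcome::Failed(pg_err)) => Completion::Diagnostic(
            format!("# both failed: DataFusion: {df_err}; Postgres: {pg_err}"),
        ),
    }
}
```

In this sketch, a single-runner Postgres completion would always write Postgres output without any comparison, so every Diagnostic branch (the mismatch and error comments) is what would be lost by dropping the fork.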
