RafaelHerrero commented on PR #21578:
URL: https://github.com/apache/datafusion/pull/21578#issuecomment-4266399159

   Thanks for the feedback @alamb @Omega359!
   
   I diffed the fork (73c47cf7) against upstream v0.29.1 to understand the full 
delta. Here's what I found:
   
   Parsing is identical. The onlyif/skipif condition parsing uses the same 
2-token pattern (["onlyif", label]) in both versions, and lines are tokenized 
the same way (split_whitespace). A line like "onlyif mysql # use DIV operator 
for integer division" produces 9 tokens, doesn't match the 2-token pattern, and 
hits InvalidLine in both versions. The 428 parse errors come from the sd regex 
cleanup in the script, which does not catch all onlyif mysql blocks: the 
fixed-line-count patterns (3-12 lines + blank line) miss blocks with different 
sizes or edge cases. This is a script bug, not a parser compatibility issue.
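
   To illustrate the tokenization point, here is a minimal sketch of the 
matching behaviour described above. It is a simplified stand-in, not the 
parser's actual source: the Parsed enum and parse_condition function are 
hypothetical names used only for this example.

```rust
// Simplified stand-in for the onlyif/skipif condition parsing described above.
// `Parsed` and `parse_condition` are hypothetical names, not the real parser API.
#[derive(Debug, PartialEq)]
enum Parsed {
    OnlyIf(String),
    SkipIf(String),
    InvalidLine,
}

fn parse_condition(line: &str) -> Parsed {
    // Tokenize on whitespace, as both the fork and upstream are described to do.
    let tokens: Vec<&str> = line.split_whitespace().collect();
    match tokens.as_slice() {
        // Only an exact 2-token line matches; a trailing comment adds tokens.
        ["onlyif", label] => Parsed::OnlyIf((*label).to_string()),
        ["skipif", label] => Parsed::SkipIf((*label).to_string()),
        _ => Parsed::InvalidLine,
    }
}
```

   Under this sketch, "onlyif mysql" matches, while the commented variant 
falls through to InvalidLine because the trailing comment contributes seven 
extra tokens.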
   
   The fork is actually behind upstream: it's missing Record::Let, 
ExpectedError::SqlState, the slt:ignore marker, and Partitioner, all of which 
were added to upstream after the fork was created.
   
   The fork's real addition is in runner.rs: a dual-runner update_test_file 
that takes an optional second runner (Postgres). During completion, it runs 
each query on both DataFusion and Postgres, auto-updates expected results when 
they match, and adds diagnostic comments about mismatches (type differences, 
result differences, errors). It also creates .bak backup files. The upstream 
update_test_file only supports a single runner.
   
   I plan to fix the sd cleanup by replacing the fragile regex patterns with a 
proper block-removal approach. But before removing the fork entirely:
   
   @Omega359 — is the dual-runner completion (DataFusion + Postgres comparison) 
essential to the regeneration workflow, or would single-runner Postgres 
completion produce equivalent results? I want to make sure we're not losing 
something important.
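
   For reference, the block-removal idea mentioned above could look roughly 
like the sketch below. It is not the script's actual code: the function name 
remove_onlyif_blocks is hypothetical, and it assumes that a guarded record 
runs from the onlyif line up to and including the next blank line.

```rust
/// Drop every record guarded by `onlyif <engine>`, whatever its length.
/// Assumption: a record ends at the next blank line (or at end of input).
/// `remove_onlyif_blocks` is a hypothetical helper, not part of the script.
fn remove_onlyif_blocks(input: &str, engine: &str) -> String {
    let mut out = String::new();
    let mut skipping = false;
    for line in input.lines() {
        if skipping {
            // The blank line terminates the guarded record; drop it as well.
            if line.trim().is_empty() {
                skipping = false;
            }
            continue;
        }
        // Compare only the first two tokens, so a trailing comment on the
        // onlyif line does not prevent the block from being recognized.
        let mut tokens = line.split_whitespace();
        if tokens.next() == Some("onlyif") && tokens.next() == Some(engine) {
            skipping = true;
            continue;
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}
```

   Because the scan is driven by the blank-line terminator rather than a 
fixed line count, block size no longer matters. One known gap in this sketch: 
a skipif line stacked directly above the onlyif line would be left behind and 
would need separate handling.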


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

