RafaelHerrero commented on PR #21260:
URL: https://github.com/apache/datafusion/pull/21260#issuecomment-4163846798
Hi @alamb, I dug into the sqllogictest version mismatch and found a way to
update the extended tests without needing the Omega359 fork.
The problem with the fork approach:
The regenerate_sqlite_files.sh script swaps in a forked sqllogictest at
v0.27.2, but main now uses v0.29.1. The APIs changed, so the fork doesn't
compile.
What I found:
The standard --complete mode (v0.29.1) works and generates correct results,
but it has two issues with the SQLite test files:
1. It ignores control resultmode valuewise and writes space-separated
rows instead of one value per line
2. Hash values get computed with the wrong sort order for newly-generated
blocks
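To make the two issues concrete, here is a minimal sketch of how the value layout and sort order feed into the hash line. It assumes the SQLite-style scheme (MD5 over each value followed by a newline, reported as "N values hashing to ..."); the function names valuewise and result_hash are hypothetical and are not taken from sqllogictest or from the post-processing script.

```python
import hashlib


def valuewise(rows, sort_mode="nosort"):
    """Flatten rows into one value per line, applying the given sort mode.

    rows is a list of rows, each row a list of already-formatted strings.
    Sorting is plain string comparison, which is an assumption here.
    """
    if sort_mode == "rowsort":
        rows = sorted(rows)  # order whole rows, then flatten
    values = [value for row in rows for value in row]
    if sort_mode == "valuesort":
        values = sorted(values)  # order individual values after flattening
    return values


def result_hash(values):
    """Build a hash line, assuming MD5 over each value plus a newline."""
    md5 = hashlib.md5()
    for value in values:
        md5.update(value.encode("utf-8") + b"\n")
    return f"{len(values)} values hashing to {md5.hexdigest()}"


rows = [["2", "a"], ["1", "b"]]
by_row = valuewise(rows, "rowsort")      # values stay grouped by row
by_value = valuewise(rows, "valuesort")  # values sorted individually

# Same values, different order, so the two hash lines differ.
print(result_hash(by_row))
print(result_hash(by_value))
```

Because the hash covers the values in output order, a block hashed under one sort mode will not match the same block checked under another, which would explain why the hashes have to be recomputed once the sort mode is right.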
The fix (2-pass approach):
1. Run --complete on all SQLite files to generate results
2. Post-process with a script that only touches blocks that were previously
query error ... Projections require unique expression names: it converts
their results to valuewise format with valuesort and leaves all other
existing blocks untouched
3. Run --complete again so hash values are recomputed with the correct
sort mode
4. Post-process again to re-apply valuewise format
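The post-processing in steps 2 and 4 could look roughly like the sketch below. This is not the actual script: it assumes blocks are separated by blank lines, that the old and regenerated files contain the same number of blocks in the same order, that result values contain no spaces, and that valuesort is a plain string sort. All function names are hypothetical.

```python
ERROR_MARKER = "Projections require unique expression names"


def header_index(lines):
    """Index of the first 'query ...' line (skipif/onlyif lines may precede it)."""
    for i, line in enumerate(lines):
        if line.startswith("query "):
            return i
    return None


def was_target_error(old_block):
    """True if the old block expected the specific error we are fixing."""
    lines = old_block.splitlines()
    i = header_index(lines)
    return (i is not None
            and lines[i].startswith("query error")
            and ERROR_MARKER in lines[i])


def to_valuewise_block(new_block):
    """Rewrite a regenerated query block as valuesort, one value per line."""
    lines = new_block.splitlines()
    i = header_index(lines)
    if i is None or "----" not in lines:
        return new_block
    header = lines[i].split()
    if len(header) < 2 or header[1] == "error":
        return new_block  # still an error block; nothing to convert
    sep = lines.index("----")
    results = lines[sep + 1:]
    if len(results) == 1 and " values hashing to " in results[0]:
        values = results  # hash line: left for the next --complete run
    else:
        values = sorted(v for line in results for v in line.split())
    # Note: this drops any label that followed the sort mode.
    lines[i] = " ".join(["query", header[1], "valuesort"])
    return "\n".join(lines[:sep + 1] + values)


def post_process(old_text, new_text):
    """Convert only blocks that used to be the target query error."""
    old_blocks = old_text.strip("\n").split("\n\n")
    new_blocks = new_text.strip("\n").split("\n\n")
    if len(old_blocks) != len(new_blocks):
        raise ValueError("block counts differ; cannot align old and new files")
    converted = [
        to_valuewise_block(new) if was_target_error(old) else new
        for old, new in zip(old_blocks, new_blocks)
    ]
    return "\n\n".join(converted) + "\n"
```

Keying the decision on the old file's header is what keeps every pre-existing block byte-for-byte unchanged. Only blocks that used to expect this specific error are rewritten.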
Results:
- All 595 SQLite test files pass locally
- 279 files in datafusion-testing need updating (~39,706 query error
blocks that now succeed or have a different error)
- No changes needed to datafusion beyond the code fix (revert of revert)
Full disclosure: I used Claude Code to help debug this and write the
post-processing script. It took quite a bit of iteration to figure out the
valuewise format and hash issues.
Proposed next steps:
1. Open a PR to apache/datafusion-testing with the 279 updated .slt files
2. Once merged, update the submodule reference in this PR
Let me know if this approach looks good and I'll open the datafusion-testing
PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]