Blizzara commented on PR #13888:
URL: https://github.com/apache/datafusion/pull/13888#issuecomment-2560214683
I like the idea, more testing the better! We already have some Substrait TCP
testing, but I think that's from "known Substrait" -> DF, so it only tests the
consumer, while this would test the producer as well.
I think roughly there should be three levels of "equality":
1. a plan goes through roundtrip without crashing, but doesn't match
equality (this doesn't necessarily mean the result is correct, but it shows
that there aren't any totally unsupported LogicalPlan nodes)
2. "optimized" plan equality - calling optimize on both plans before
comparing, like you do.
3. "unoptimized" plan equality - comparing the plans directly.
I feel like (2) should be "easier" check than (3), though the tests you have
here show that some plans would pass unoptimized but not optimized, which
confuses me.
Overall, I think (3) is nice to have, but not something I'd spent too much
effort for - it doesn't really bring that much benefits to users compared to
(2). (2) is nice compared to (1) in that it guarantees a bit more correctness,
but in many cases it's unfortunately tough to support even (2) (even in theory,
roundtrip isn't lossless, since multiple DF plans may result in the same
Substrait plan, and the other way around too).
So I'd suggest splitting the TPCH tests into 4 categories:
- those that pass (3)
- those that pass (2)
- those that pass (1)
- maybe eyeballed into a) those that look correct after roundtrip even if
the plan doesn't fully match, and b) those that look incorrect, if any
- those that fail to convert for the roundtrip
That allows merging the tests, preventing regressions, and tracking needed
improvements :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]