jvanstraten commented on PR #13401:
URL: https://github.com/apache/arrow/pull/13401#issuecomment-1160287804

   > Do you think it’s wise to add a CI to test Substrait related queries using 
this tool?
   
   IMO every roundtripped plan in every Substrait consumer and/or producer 
should also be passed through the validator. Otherwise, how would you know for 
sure that the Substrait plan you've successfully roundtripped through is 
actually sensible in any way? It does always require a complete plan, though, 
so you'd need some or other function for each type of thing (expression, 
relation, etc) that surrounds the thing with a dummy plan. Arrow could hook 
into it via the C interface (it's not a very pleasant interface because it's 
intended to be compatible with any language that can call into C, so you might 
want to wrap it with some C++ stuff; also it will need a Rust compiler to 
build) or it could just execute the CLI on a generated file (more clunky, but 
that can just be pulled from PyPI in binary form, so it's probably a bit easier 
on CI).
   
   I'm sure I'm biased though, since I'm the one who made the validator. It's 
also starting to considerably lag behind Substrait; it doesn't seem like anyone 
is sufficiently interested to review/collaborate, so I can't get any PRs 
through.
   
   Link, just in case: https://github.com/substrait-io/substrait-validator
   
   > One doubtful thing is to check in serialization is whether a projection or 
filter expression is added or not/ differentiation from default values. For 
instance filter expression defaults to a boolean literal of value true.
   
   Assuming you mean that in Acero the filter expression is mandatory and is 
just set to literal true if there is none, IMO you could just do the same thing 
on the Substrait side, at least for now. Likewise for the projection. Or you 
could just leave it for a later PR and error out when presented with nontrivial 
values. I don't know how hard any of these things are; I've never done anything 
with the Acero representation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to