vibhatha commented on PR #13401:
URL: https://github.com/apache/arrow/pull/13401#issuecomment-1160488517

   > > Do you think it’s wise to add a CI to test Substrait related queries 
using this tool?
   > 
   > IMO every roundtripped plan in every Substrait consumer and/or producer 
should also be passed through the validator. Otherwise, how would you know for 
sure that the Substrait plan you've successfully roundtripped through is 
actually sensible in any way? It does always require a complete plan, though, 
so you'd need some or other function for each type of thing (expression, 
relation, etc) that surrounds the thing with a dummy plan. Arrow could hook 
into it via the C interface (it's not a very pleasant interface because it's 
intended to be compatible with any language that can call into C, so you might 
want to wrap it with some C++ stuff; also it will need a Rust compiler to 
build) or it could just execute the CLI on a generated file (more clunky, but 
that can just be pulled from PyPI in binary form, so it's probably a bit easier 
on CI).
   > 
   > I'm sure I'm biased though, since I'm the one who made the validator. It's 
also starting to considerably lag behind Substrait; it doesn't seem like anyone 
is sufficiently interested to review/collaborate, so I can't get any PRs 
through.
   > 
   > Link, just in case: https://github.com/substrait-io/substrait-validator
   
   Intersting thoughts. I will take a look at the tool. It would be better if 
we can use it to validate things. But I am not sure if it needs to be inside 
the Arrow source or should it be a plugin for Apache Arrow. cc @westonpace 
   
   > > One doubtful thing is to check in serialization is whether a projection 
or filter expression is added or not/ differentiation from default values. For 
instance filter expression defaults to a boolean literal of value true.
   > 
   > Assuming you mean that in Acero the filter expression is mandatory and is 
just set to literal true if there is none, IMO you could just do the same thing 
on the Substrait side, at least for now. Likewise for the projection. Or you 
could just leave it for a later PR and error out when presented with nontrivial 
values. I don't know how hard any of these things are; I've never done anything 
with the Acero representation.
   
   Here it is rather, the differentiation between a user passed value vs the 
default. We could assume the default and do the comparison to see if an 
explicit value is passed. There is no API calls in Expression to check if it 
`has_filter` or `has_projection`. May be that kind of a function could be 
useful. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to