vibhatha commented on PR #13401: URL: https://github.com/apache/arrow/pull/13401#issuecomment-1160488517
> > Do you think it’s wise to add a CI to test Substrait related queries using this tool? > > IMO every roundtripped plan in every Substrait consumer and/or producer should also be passed through the validator. Otherwise, how would you know for sure that the Substrait plan you've successfully roundtripped through is actually sensible in any way? It does always require a complete plan, though, so you'd need some or other function for each type of thing (expression, relation, etc) that surrounds the thing with a dummy plan. Arrow could hook into it via the C interface (it's not a very pleasant interface because it's intended to be compatible with any language that can call into C, so you might want to wrap it with some C++ stuff; also it will need a Rust compiler to build) or it could just execute the CLI on a generated file (more clunky, but that can just be pulled from PyPI in binary form, so it's probably a bit easier on CI). > > I'm sure I'm biased though, since I'm the one who made the validator. It's also starting to considerably lag behind Substrait; it doesn't seem like anyone is sufficiently interested to review/collaborate, so I can't get any PRs through. > > Link, just in case: https://github.com/substrait-io/substrait-validator Intersting thoughts. I will take a look at the tool. It would be better if we can use it to validate things. But I am not sure if it needs to be inside the Arrow source or should it be a plugin for Apache Arrow. cc @westonpace > > One doubtful thing is to check in serialization is whether a projection or filter expression is added or not/ differentiation from default values. For instance filter expression defaults to a boolean literal of value true. > > Assuming you mean that in Acero the filter expression is mandatory and is just set to literal true if there is none, IMO you could just do the same thing on the Substrait side, at least for now. Likewise for the projection. Or you could just leave it for a later PR and error out when presented with nontrivial values. I don't know how hard any of these things are; I've never done anything with the Acero representation. Here it is rather, the differentiation between a user passed value vs the default. We could assume the default and do the comparison to see if an explicit value is passed. There is no API calls in Expression to check if it `has_filter` or `has_projection`. May be that kind of a function could be useful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
