westonpace commented on PR #89:
URL: https://github.com/apache/arrow-testing/pull/89#issuecomment-1481178615
> @westonpace would it be useful to have more Acero/Substrait testing data like this in apache/arrow?

We have a lot of hard-coded JSON, but it's embedded in the test files themselves (e.g. serde_test.cc or test_substrait.py) rather than in standalone files.

The original concern around hard-coded JSON was that Substrait may evolve quickly and those JSON files would be difficult to maintain. For example, the JSON files in this PR are missing the version field (Isthmus does not yet populate this) and they don't have URIs for the extension functions (almost no one generates these yet). So they may need to change at some point. As a result, I have been waiting for the text format to be ready before attempting to curate a large set of test queries (but that is still a few months off at least).

I think SQL is probably a pretty good solution if you have a good SQL->Substrait library (that may be an advantage for Java). In that case I would suggest storing only the SQL and generating the Substrait on the fly.

I don't actually know what the legal ramifications are for TPC-H, but it is a good question.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
