I've adapted Calcite's EnumerableCalc code generation to generate the BeamCalc DoFn. The primary purpose of this change is to take advantage of Calcite's extensive implementation of SQL operators. It deletes ~11000 lines of code from Beam (with ~350 added), significantly increases the set of supported SQL operators, and improves the performance and correctness of currently supported operators. Here is my work in progress: https://github.com/apache/beam/pull/6417
This change has exposed a few bugs in Calcite:

Fixed in Calcite master:
- CALCITE-2321 <https://issues.apache.org/jira/browse/CALCITE-2321> - The type of a union of CHAR columns of different lengths should be VARCHAR
- CALCITE-2447 <https://issues.apache.org/jira/browse/CALCITE-2447> - Some POWER, ATAN2 functions fail with NoSuchMethodException

Pending PRs:
- CALCITE-2529 <https://issues.apache.org/jira/browse/CALCITE-2529> - linq4j should promote integer to floating point when generating function calls
- CALCITE-2530 <https://issues.apache.org/jira/browse/CALCITE-2530> - TRIM function does not throw exception when the length of trim character is not 1(one)

More work:
- CALCITE-2404 <https://issues.apache.org/jira/browse/CALCITE-2404> - Accessing structured-types is not implemented by the runtime
- (none yet) - Support multi character TRIM extension in Calcite

I would like to merge these changes with these minor regressions outstanding. Do any of these Calcite bugs block adding this functionality to Beam?

Andrew