icexelloss commented on code in PR #14385:
URL: https://github.com/apache/arrow/pull/14385#discussion_r994952758
##########
cpp/src/arrow/engine/substrait/serde_test.cc:
##########
@@ -3187,5 +3198,164 @@ TEST(Substrait, IsthmusPlan) {
*compute::default_exec_context(), buf, {},
conversion_options);
}
+TEST(Substrait, PlanWithExtension) {
+ // This demos an extension relation
+ std::string substrait_json = R"({
+ "extensionUris": [],
+ "extensions": [],
+ "relations": [{
+ "root": {
+ "input": {
+ "extension_multi": {
+ "common": {
+ "emit": {
+ "outputMapping": [0, 1, 2, 3]
+ }
+ },
+ "inputs": [
+ {
+ "read": {
+ "common": {
+ "direct": {
+ }
+ },
+ "baseSchema": {
+ "names": ["time", "key", "value1"],
+ "struct": {
+ "types": [
+ {
+ "i32": {
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_NULLABLE"
+ }
+ },
+ {
+ "i32": {
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_NULLABLE"
+ }
+ },
+ {
+ "fp64": {
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_NULLABLE"
+ }
+ }
+ ],
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_REQUIRED"
+ }
+ },
+ "namedTable": {
+ "names": ["T1"]
+ }
+ }
+ },
+ {
+ "read": {
+ "common": {
+ "direct": {
+ }
+ },
+ "baseSchema": {
+ "names": ["time", "key", "value2"],
+ "struct": {
+ "types": [
+ {
+ "i32": {
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_NULLABLE"
+ }
+ },
+ {
+ "i32": {
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_NULLABLE"
+ }
+ },
+ {
+ "fp64": {
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_NULLABLE"
+ }
+ }
+ ],
+ "typeVariationReference": 0,
+ "nullability": "NULLABILITY_REQUIRED"
+ }
+ },
+ "namedTable": {
+ "names": ["T2"]
+ }
+ }
+ }
+ ],
+ "detail": {
+ "@type": "/arrow.substrait.AsOfJoinRel",
+ "on": {
+ "selection": {
+ "directReference": {
+ "structField": {
+ "field": 0,
+ }
+ },
+ "rootReference": {}
+ }
+ },
+ "by": [
Review Comment:
I see - this makes sense now.
Unfortunately the limitation is too strict I think (requiring all input
tables to have the same column index for the by key)
The restriction for the on-key is OK - we have restrictions for "time" to be
the first column. However, we don't have restriction on the "by-key" column -
it can have different column index for different input tables.
I think we need to fix this problem in order to Asof Join to be useful -
Since we are deciding the shape of the substrait message now - I think it makes
sense to fix both message format and asof join options format in this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]