I may be wrong, but I think that Calcite's Enumerable convention requires that SQL STRUCT and ARRAY types are mapped to a Java List. The item function expects that, and the adapter should comply. If the adapter is providing an array, that is wrong.
On Sun, Sep 27, 2020 at 12:23 PM Alessandro Solimando <[email protected]> wrote: > > Hello all, > at the time I wrote the unit tests for the extended support for the missing > Cassandra data types, I have disabled the one at > CassandraAdapterDataTypesTest.java:192 > <https://github.com/apache/calcite/blob/master/cassandra/src/test/java/org/apache/calcite/test/CassandraAdapterDataTypesTest.java#L192 > > since > accessing a tuple element was returning a null value in place of the actual > non-null element. > > I recently had a chance to dig a bit to see why that's happening, and I > found what I consider the immediate cause for that (from > SqlFunctions.java:2511 > <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L2511> > ): > > /** Implements the {@code [ ... ]} operator on an object whose type is not > > * known until runtime. > > */ > > public static Object item(Object object, Object index) { > > if (object instanceof Map) { > > return mapItem((Map) object, index); > > } > > if (object instanceof List && index instanceof Number) { > > return arrayItem((List) object, ((Number) index).intValue()); > > } > > return null; > > } > > > This method ends up being called, and since the backing structure for > struct is an array (see CassandraEnumerator.java:114 > <https://github.com/apache/calcite/blob/master/cassandra/src/main/java/org/apache/calcite/adapter/cassandra/CassandraEnumerator.java#L114>), > none of the two if conditions match, and null is returned. > > This of course can be easily fixed, in what follows an example: > > public static Object item(Object object, Object index) { > > if (object instanceof List && index instanceof Number) { > > return arrayItem((List) object, ((Number) index).intValue()); > > } > > if (object instanceof Object[]) { // it guarantees also that object != > > null > > if (index instanceof Number) { > > return arrayItem(Arrays.asList(object), ((Number) > > index).intValue()); > > } else if (index instanceof String) { > > Object[] array = (Object[]) object; > > return mapItem(IntStream.range(0, array.length).boxed() > > .collect(Collectors.toMap(i -> Integer.toString(i + 1), i -> > > array[i])), index); > > } > > } > > return null; > > } > > > Question is, is there anything which should be done differently on the > adapter end to prevent this from "item" to be called in the first place? Or > this is a legitimate situation and the "fix" is actually covering an > unhandled legal case? > > I have tried to look up in the codebase for something similar, but without > much luck, and I'd appreciate some guidance here. > > In what follows my analysis and a bit of context behind the status quo for > tuple/struct handling in Cassandra adapter: > > 1) I am accessing the struct elements via the "item" operator as this seems > the right way to do so according to the codebase examples I have seen > (JdbcTest for instance). Given that the "SqlTypeName" of the column of > "f_field" is "STRUCTURED", it ends up treated like a "MAP", rather than an > "ARRAY". > > "MAP" requires a "string" identifier and does not accept an integer to be > used with the "item" operator ("f_tuple"['1'] for instance). The identifier > it's the 1-based index of the element inside the structure (as dictated by > CassandraSchema.java:225 > <https://github.com/apache/calcite/blob/master/cassandra/src/main/java/org/apache/calcite/adapter/cassandra/CassandraSchema.java#L225>) > since, unlike other structs, there is no field name here coming from the > datastax driver, hence we are left with the index inside the structure, > which seems reasonable. > > At first I tried to use an integer index, that's the error returned: > > java.sql.SQLException: Error while executing SQL "select x[1], x['2'], > > x['3'] from (select "f_tuple" from "test_collections") as T(x)": From line > > 1, column 8 to line 1, column 11: Cannot apply 'ITEM' to arguments of type > > 'ITEM(<RECORDTYPE(BIGINT 1, VARBINARY 2, TIMESTAMP(0) 3)>, <INTEGER>)'. > > Supported form(s): <ARRAY>[<INTEGER>] > > <MAP>[<VALUE>] > > (omitting the full stack trace as I don't think it's adding much value). > > 2) I also had second thoughts about having mapped Cassandra tuples to > struct, but there seems to be no alternatives allowing for an indexed > collection with heterogeneous types. What's your opinion, is it correct or > there is another way I could take? > > Finally, I'd either open a JIRA ticket for adding the extra behavior on > "item", or one for fixing the Cassandra adapter for tuples if in the end > that's the root cause of the issue. > > <https://github.com/asolimando/calcite/actions/runs/274415659> > In the first case I would already have a branch > <https://github.com/asolimando/calcite/tree/struct-values-index-access> which > might be ready to become a PR, for which tests are passing > <https://github.com/asolimando/calcite/actions/runs/274415659> already (CI > on forks are really great, thanks for introducing that!). > > Looking forward to hearing your thoughts. > > Best regards, > Alessandro
