[CassandraAdapter] selecting tuple elements always returns "null"

Alessandro Solimando Sun, 27 Sep 2020 12:23:57 -0700

Hello all,
at the time I wrote the unit tests for the extended support for the missing
Cassandra data types, I have disabled the one at
CassandraAdapterDataTypesTest.java:192
<https://github.com/apache/calcite/blob/master/cassandra/src/test/java/org/apache/calcite/test/CassandraAdapterDataTypesTest.java#L192
> since
accessing a tuple element was returning a null value in place of the actual
non-null element.


I recently had a chance to dig a bit to see why that's happening, and I
found what I consider the immediate cause for that (from
SqlFunctions.java:2511
<https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L2511>
):

  /** Implements the {@code [ ... ]} operator on an object whose type is not
>    * known until runtime.
>    */

public static Object item(Object object, Object index) {
>     if (object instanceof Map) {
>       return mapItem((Map) object, index);
>     }
>     if (object instanceof List && index instanceof Number) {
>       return arrayItem((List) object, ((Number) index).intValue());
>     }
>     return null;
>   }


This method ends up being called, and since the backing structure for
struct is an array (see CassandraEnumerator.java:114
<https://github.com/apache/calcite/blob/master/cassandra/src/main/java/org/apache/calcite/adapter/cassandra/CassandraEnumerator.java#L114>),
none of the two if conditions match, and null is returned.

This of course can be easily fixed, in what follows an example:

public static Object item(Object object, Object index) {
>   if (object instanceof List && index instanceof Number) {
>      return arrayItem((List) object, ((Number) index).intValue());
>    }
>    if (object instanceof Object[]) { // it guarantees also that object !=
> null
>      if (index instanceof Number) {
>        return arrayItem(Arrays.asList(object), ((Number)
> index).intValue());
>      } else if (index instanceof String) {
>        Object[] array = (Object[]) object;
>        return mapItem(IntStream.range(0, array.length).boxed()
>           .collect(Collectors.toMap(i -> Integer.toString(i + 1), i ->
> array[i])), index);
>      }
>    }
>    return null;
>   }


Question is, is there anything which should be done differently on the
adapter end to prevent this from "item" to be called in the first place? Or
this is a legitimate situation and the "fix" is actually covering an
unhandled legal case?

I have tried to look up in the codebase for something similar, but without
much luck, and I'd appreciate some guidance here.

In what follows my analysis and a bit of context behind the status quo for
tuple/struct handling in Cassandra adapter:

1) I am accessing the struct elements via the "item" operator as this seems
the right way to do so according to the codebase examples I have seen
(JdbcTest for instance). Given that the "SqlTypeName" of the column of
"f_field" is "STRUCTURED", it ends up treated like a "MAP", rather than an
"ARRAY".

"MAP" requires a "string" identifier and does not accept an integer to be
used with the "item" operator ("f_tuple"['1'] for instance). The identifier
it's the 1-based index of the element inside the structure (as dictated by
CassandraSchema.java:225
<https://github.com/apache/calcite/blob/master/cassandra/src/main/java/org/apache/calcite/adapter/cassandra/CassandraSchema.java#L225>)
since, unlike other structs, there is no field name here coming from the
datastax driver, hence we are left with the index inside the structure,
which seems reasonable.

At first I tried to use an integer index, that's the error returned:

java.sql.SQLException: Error while executing SQL "select x[1], x['2'],
> x['3'] from (select "f_tuple" from "test_collections") as T(x)": From line
> 1, column 8 to line 1, column 11: Cannot apply 'ITEM' to arguments of type
> 'ITEM(<RECORDTYPE(BIGINT 1, VARBINARY 2, TIMESTAMP(0) 3)>, <INTEGER>)'.
> Supported form(s): <ARRAY>[<INTEGER>]
> <MAP>[<VALUE>]

(omitting the full stack trace as I don't think it's adding much value).

2) I also had second thoughts about having mapped Cassandra tuples to
struct, but there seems to be no alternatives allowing for an indexed
collection with heterogeneous types. What's your opinion, is it correct or
there is another way I could take?

Finally, I'd either open a JIRA ticket for adding the extra behavior on
"item", or one for fixing the Cassandra adapter for tuples if in the end
that's the root cause of the issue.

<https://github.com/asolimando/calcite/actions/runs/274415659>
In the first case I would already have a branch
<https://github.com/asolimando/calcite/tree/struct-values-index-access> which
might be ready to become a PR, for which tests are passing
<https://github.com/asolimando/calcite/actions/runs/274415659> already (CI
on forks are really great, thanks for introducing that!).

Looking forward to hearing your thoughts.

Best regards,
Alessandro

[CassandraAdapter] selecting tuple elements always returns "null"

Reply via email to