Paul Rogers created DRILL-6048:
----------------------------------
Summary: ListVector is incomplete and broken, RepeatedListVector works
Key: DRILL-6048
URL: https://issues.apache.org/jira/browse/DRILL-6048
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Drill provides two kinds of "list vectors": {{ListVector}} and
{{RepeatedListVector}}. I attempted to use the {{ListVector}} to implement
lists in JSON. While some parts work, others are broken, and JIRA tickets have
been filed for them.
Once things worked well enough to run a query, the Project operator failed.
Digging into the cause showed that {{ListVector}} appears to be incomplete and
unused. Its implementation of {{makeTransferPair()}} was clearly never tested:
a list has contents, but when this method sets up the target vector, it fails
to create the target's list contents.
Elsewhere, we found that although the constructor correctly creates the vector,
{{promoteToUnion()}} has holes. The sheer number of bugs leads to the
conclusion that this class is not, in fact, used or usable.
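To make the {{makeTransferPair()}} problem concrete, here is a minimal sketch using hypothetical, simplified stand-in classes ({{SimpleListVector}}, {{IntDataVector}}); this is *not* Drill's actual ValueVector or TransferPair API. It shows what a list transfer has to do: move the offsets *and* create the target's contents (data) vector before transferring into it. Skipping that second step leaves the target with offsets but no contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a list's contents vector (not Drill's API).
class IntDataVector {
    final List<Integer> values = new ArrayList<>();

    // Move all values into the target, leaving this vector empty.
    void transferTo(IntDataVector target) {
        target.values.clear();
        target.values.addAll(values);
        values.clear();
    }
}

// Hypothetical list vector: offsets plus a contents ("data") vector.
class SimpleListVector {
    // List i spans data positions offsets.get(i) .. offsets.get(i + 1).
    final List<Integer> offsets = new ArrayList<>(List.of(0));
    IntDataVector data; // the list contents; null until first use

    void addList(int... elements) {
        if (data == null) {
            data = new IntDataVector();
        }
        for (int e : elements) {
            data.values.add(e);
        }
        offsets.add(data.values.size());
    }

    // Transfer both the offsets and the contents into the target.
    void transferTo(SimpleListVector target) {
        target.offsets.clear();
        target.offsets.addAll(offsets);
        offsets.clear();
        offsets.add(0);
        if (data != null) {
            if (target.data == null) {
                // The step a broken transfer skips: create the target's contents.
                target.data = new IntDataVector();
            }
            data.transferTo(target.data);
        }
    }

    int listCount() {
        return offsets.size() - 1;
    }

    List<Integer> get(int index) {
        return data.values.subList(offsets.get(index), offsets.get(index + 1));
    }
}
```

After {{src.transferTo(dst)}}, {{dst}} holds both the offsets and the values, and {{src}} is empty.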
Looking more carefully at the JSON and older writer code, it appears that the
ListVector was *not* used for JSON, and that JSON has the limitations of a
repeated vector (it cannot support lists with null elements).
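The null-element limitation can be illustrated with two hypothetical models ({{RepeatedIntModel}}, {{NullableElementListModel}}); neither is Drill's actual vector layout. A repeated vector is essentially offsets plus non-nullable values, so an element of a JSON array such as [1, null, 3] has nowhere to record "null". A list with nullable elements needs some per-element null marker, sketched here as a validity flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a repeated INT vector: offsets plus non-nullable values.
// There is no per-element null flag, so a null element cannot be stored.
class RepeatedIntModel {
    final List<Integer> offsets = new ArrayList<>(List.of(0));
    final List<Integer> values = new ArrayList<>();

    void addArray(List<Integer> array) {
        // Validate first so a rejected array leaves the vector unchanged.
        for (Integer v : array) {
            if (v == null) {
                throw new IllegalArgumentException("repeated vector cannot store a null element");
            }
        }
        values.addAll(array);
        offsets.add(values.size());
    }
}

// Hypothetical model of a list with nullable elements: the same offsets,
// plus a validity flag per element.
class NullableElementListModel {
    final List<Integer> offsets = new ArrayList<>(List.of(0));
    final List<Integer> values = new ArrayList<>();
    final List<Boolean> isSet = new ArrayList<>();

    void addArray(List<Integer> array) {
        for (Integer v : array) {
            isSet.add(v != null);
            values.add(v == null ? 0 : v); // placeholder value for a null element
        }
        offsets.add(values.size());
    }

    // Element i of list `list`, or null if that element was null.
    Integer get(int list, int i) {
        int pos = offsets.get(list) + i;
        return isSet.get(pos) ? values.get(pos) : null;
    }
}
```

For example, {{new RepeatedIntModel().addArray(Arrays.asList(1, null, 3))}} throws, while the nullable model stores the array and returns null for element 1.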
This implies that the JSON reader itself is broken: it does not fully support
JSON semantics because it does not use the {{ListVector}} that was intended for
this purpose.
So, the conclusion is that JSON uses:
* Repeated vectors for single-dimensional arrays (without null support)
* {{RepeatedListVector}} for two-dimensional arrays
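For readers unfamiliar with the two-dimensional case, the following sketch uses a hypothetical {{TwoDimArrayModel}} class (not Drill's API) to show the general idea of two offset levels: outer offsets index into inner arrays, and inner offsets index into a flat values array. It is meant only to mirror, in spirit, a {{RepeatedListVector}} wrapping a repeated vector; like a repeated vector, it has no way to mark null elements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-level layout (not Drill's actual RepeatedListVector).
class TwoDimArrayModel {
    final List<Integer> outerOffsets = new ArrayList<>(List.of(0)); // one entry per row
    final List<Integer> innerOffsets = new ArrayList<>(List.of(0)); // one entry per inner array
    final List<Integer> values = new ArrayList<>();                 // flat element storage

    // Append one row, e.g. [[1, 2], [3]].
    void addRow(List<List<Integer>> row) {
        for (List<Integer> inner : row) {
            values.addAll(inner);
            innerOffsets.add(values.size());
        }
        // The row ends after the last inner array appended so far.
        outerOffsets.add(innerOffsets.size() - 1);
    }

    // Read element [row][i][j].
    int get(int row, int i, int j) {
        int innerIndex = outerOffsets.get(row) + i;
        return values.get(innerOffsets.get(innerIndex) + j);
    }
}
```

Appending the rows [[1, 2], [3]] and [[4]] yields outer offsets [0, 2, 3], inner offsets [0, 2, 3, 4], and values [1, 2, 3, 4].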
This raises the question: what do we do for three-dimensional arrays?