Daniel Barclay (Drill) created DRILL-3955:
---------------------------------------------
Summary: Possible bug in creation of Drill columns for HBase column families
Key: DRILL-3955
URL: https://issues.apache.org/jira/browse/DRILL-3955
Project: Apache Drill
Issue Type: Bug
Reporter: Daniel Barclay (Drill)
If all of the rows read by a given {{HBaseRecordReader}} have no HBase columns
in a given HBase column family, {{HBaseRecordReader}} doesn't create a Drill
column for that HBase column family.
Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill
column exists for that HBase column family, {{setupNewSchema}} creates a
dummy Drill column using the usual {{NullableIntVector}} type. In particular,
the dummy column is not a map vector, which is what {{HBaseRecordReader}}
creates when it sees an HBase column family.
Should {{HBaseRecordReader}} and/or the code that sets up HBase reads
(including the setup of that {{ProjectRecordBatch}}) make sure that all
HBase column families are represented with map vectors, so that
{{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?
The problem is that, currently, when an HBase table is read in two separate
fragments, one fragment (seeing rows with columns in the column family) can get
a map vector for the column family while the other (seeing only rows with no
columns in the column family) can get the {{NullableIntVector}}. Downstream
code that receives the two batches ends up with an unresolved conflict,
yielding IndexOutOfBoundsExceptions as in DRILL-3954.
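The mismatch described above can be illustrated with a small toy model. This is only a sketch: {{FragmentSchemaSketch}}, {{ColumnType}}, {{readerSchema}} and {{project}} are hypothetical names invented for illustration and are not Drill classes or methods. The model assumes that the reader adds a map column only for column families it actually sees populated, and that projection fills any missing column with a nullable-int dummy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are Drill or HBase classes. */
final class FragmentSchemaSketch {

  /** Stand-ins for the two vector types named in the report. */
  enum ColumnType { MAP, NULLABLE_INT }

  /**
   * Mimics the reader. Each element of familiesPerRow is the set of column
   * families that have at least one column in that row. A family becomes a
   * MAP column only if some row in this fragment contains it.
   */
  static Map<String, ColumnType> readerSchema(List<Set<String>> familiesPerRow) {
    Map<String, ColumnType> schema = new LinkedHashMap<>();
    for (Set<String> row : familiesPerRow) {
      for (String family : row) {
        schema.put(family, ColumnType.MAP);
      }
    }
    return schema;
  }

  /** Mimics projection: a projected column the reader did not create gets a dummy type. */
  static Map<String, ColumnType> project(Map<String, ColumnType> schema,
                                         List<String> projected) {
    Map<String, ColumnType> out = new LinkedHashMap<>();
    for (String name : projected) {
      out.put(name, schema.getOrDefault(name, ColumnType.NULLABLE_INT));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> projected = List.of("cf1", "cf2");
    // Fragment A sees a row with columns in cf2; fragment B sees none.
    Map<String, ColumnType> a =
        project(readerSchema(List.of(Set.of("cf1", "cf2"))), projected);
    Map<String, ColumnType> b =
        project(readerSchema(List.of(Set.of("cf1"))), projected);
    System.out.println("fragment A cf2: " + a.get("cf2"));
    System.out.println("fragment B cf2: " + b.get("cf2"));
    System.out.println("schemas agree: " + a.equals(b));
  }
}
```

In this model the two fragments report different types for the same column family, which is the kind of conflict that downstream code would then have to resolve.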
It's not clear whether there is one bug or two. If one, it is that downstream
code doesn't resolve {{NullableIntVector}} dummy fields correctly (DRILL-TBD).
If two, they are that the HBase reading code should set up a Drill column for
every HBase column family (regardless of whether it has any columns in the
rows that were read), and that downstream code doesn't resolve
{{NullableIntVector}} dummy fields (that resolution applies to sources other
than just HBase).
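The second possibility (having the HBase reading code set up a column for every column family up front) could look roughly like the toy model below. This is a sketch under assumptions, not a proposed patch: {{EagerFamilySchemaSketch}}, {{ColumnType}} and {{readerSchema}} are hypothetical names, and where the list of declared column families would come from (for example the table's metadata or the query's projection) is an assumption not settled by this report.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are Drill or HBase classes. */
final class EagerFamilySchemaSketch {

  /** Stand-ins for the two vector types named in the report. */
  enum ColumnType { MAP, NULLABLE_INT }

  /**
   * Registers a MAP column for every declared column family before looking
   * at any row, so the resulting schema does not depend on which rows this
   * fragment happens to read.
   */
  static Map<String, ColumnType> readerSchema(List<String> declaredFamilies,
                                              List<Set<String>> familiesPerRow) {
    Map<String, ColumnType> schema = new LinkedHashMap<>();
    for (String family : declaredFamilies) {
      schema.put(family, ColumnType.MAP);
    }
    // Families seen in rows are added too, in case the declared list is incomplete.
    for (Set<String> row : familiesPerRow) {
      for (String family : row) {
        schema.putIfAbsent(family, ColumnType.MAP);
      }
    }
    return schema;
  }

  public static void main(String[] args) {
    List<String> declared = List.of("cf1", "cf2");
    // Fragment A sees columns in cf2; fragment B sees none.
    Map<String, ColumnType> a = readerSchema(declared, List.of(Set.of("cf1", "cf2")));
    Map<String, ColumnType> b = readerSchema(declared, List.of(Set.of("cf1")));
    System.out.println("schemas agree: " + a.equals(b));
  }
}
```

With every declared family registered as a map column, both fragments produce the same schema in this model, so no dummy column would be needed for a missing family. Whether downstream code should also resolve dummy fields remains a separate question.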
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)