[
https://issues.apache.org/jira/browse/DRILL-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jinfeng Ni resolved DRILL-5464.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.12.0
Fixed as part of the patch for DRILL-5546, commit id:
fde0a1df1734e0742b49aabdd28b02202ee2b044
> Fix JSON reader when it deals with empty file
> ---------------------------------------------
>
> Key: DRILL-5464
> URL: https://issues.apache.org/jira/browse/DRILL-5464
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Fix For: 1.12.0
>
>
> An empty JSON file is one that contains no JSON objects. If we query an
> empty JSON file and ask for column 'A', Drill's JSON record reader returns
> a batch with 0 rows and types column 'A' as a nullable int column. A
> better name for such a column might be a phantom column, since the record
> reader has no knowledge of the column's schema and the nullable int type
> is just a guess.
> However, this behavior can cause many issues. Suppose we have a
> directory of multiple JSON files, at least one of which is
> empty. If the reader returns column 'A' as a nullable int column for
> the empty file while the other JSON files contain a real, typed column 'A',
> the query can hit several issues: 1) a SchemaChangeException,
> 2) a failure in an operator that does not detect the schema change, or
> 3) an incorrect query result, since the run-time code is generated for the
> phantom column type, not the real type.
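For a directory like the one described above, it can help to find the empty files before querying. The following is a minimal sketch, not part of Drill: the helper name `find_empty_json_files` is hypothetical, and it assumes that "empty" means zero bytes or whitespace only and that data files end in `.json`.

```python
import os


def find_empty_json_files(directory):
    """Return sorted paths of .json files that contain no JSON object.

    Assumption: "empty" means zero bytes or whitespace only.
    """
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (name.endswith(".json") and os.path.isfile(path)):
            continue
        is_empty = True
        with open(path, "r", encoding="utf-8") as f:
            # Read in chunks so large data files are not loaded whole;
            # stop at the first chunk holding a non-whitespace character.
            while True:
                chunk = f.read(8192)
                if not chunk:
                    break
                if chunk.strip():
                    is_empty = False
                    break
        if is_empty:
            empty.append(path)
    return empty
```

Calling `find_empty_json_files("/tmp/yelp")` would list the files that could produce a phantom column, so they can be moved aside on versions without the fix.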
> For instance, the following query against a Yelp JSON file runs successfully.
> {code}
> select count(*), stars from
> dfs.`/tmp/yelp/yelp_academic_dataset_review.json` group by stars;
> {code}
> If an empty JSON file is added to the directory, the query fails with
> the following error (which falls into the 2nd category: PartitionSender does
> not detect the schema change properly).
> {code}
> select count(*), stars from dfs.`/tmp/yelp` group by stars;
> Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.
> Expected vector class of org.apache.drill.exec.vector.NullableIntVector but
> was holding vector class org.apache.drill.exec.vector.NullableBigIntVector,
> field= stars(BIGINT:OPTIONAL)[$bits$(UINT1:REQUIRED), stars(BIGINT:OPTIONAL)]
> {code}
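The type mismatch in the error above can be modeled outside Drill. This is a toy sketch under stated assumptions, not Drill's reader: `reader_schema` and `detect_schema_conflicts` are hypothetical names, the type strings are simplified labels borrowed from the error message, and only integer and string-like values are modeled.

```python
PHANTOM_TYPE = "INT:OPTIONAL"  # guessed type for a column the reader never saw


def reader_schema(records, requested_columns):
    """Toy reader: infer one type per requested column from the records.

    Columns absent from every record get the phantom nullable-int type.
    Integer values are labeled BIGINT:OPTIONAL; anything else VARCHAR:OPTIONAL.
    """
    schema = {}
    for col in requested_columns:
        values = [r[col] for r in records if col in r]
        if not values:
            schema[col] = PHANTOM_TYPE
        elif all(isinstance(v, int) for v in values):
            schema[col] = "BIGINT:OPTIONAL"
        else:
            schema[col] = "VARCHAR:OPTIONAL"
    return schema


def detect_schema_conflicts(schemas):
    """Return {column: sorted list of types} for columns seen with >1 type."""
    seen = {}
    for schema in schemas:
        for col, typ in schema.items():
            seen.setdefault(col, set()).add(typ)
    return {c: sorted(t) for c, t in seen.items() if len(t) > 1}


review_file = [{"stars": 5}, {"stars": 3}]  # stands in for a real data file
empty_file = []                             # stands in for the empty file
schemas = [reader_schema(f, ["stars"]) for f in (review_file, empty_file)]
print(detect_schema_conflicts(schemas))
```

The empty input reports `stars` as the guessed nullable int while the real data reports it as a nullable bigint, which is the same pair of types named in the IllegalStateException.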
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)