GitHub user parthchandra opened a pull request:
https://github.com/apache/drill/pull/219
DRILL-3871: Off by one error while reading binary fields with one terminal
null in parquet.
Changes -
1) Rewrote the NullableColumnReader.processPages function to process runs
of null values and non-null values without needing to keep track of whether
the previous iteration of the while loop had encountered a null. A pair
of loops now iterates over a run of nulls and then a run of non-null values.
2) Removed some redundant code.
3) Renamed some variables. The indexInOutputVector is now replaced by two
local variables, readCount and writeCount, purely for clarity.
4) Added tracing.
5) Added unit tests for edge cases of nulls occurring on page boundaries.
For all the unit tests and the TPC-H and TPC-DS test data sets, the state of
the NullableColumnReader at the end of each iteration of processPages is
identical to that of the old code. In addition, the boundary conditions are
now handled.
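
The run-based approach in change 1 can be sketched roughly as follows. This is
a minimal illustration, not the code in the patch: the class NullRunSketch, the
method readRuns, and the use of an int array of definition levels (0 = null,
non-zero = value present) are assumptions made for the example, and readCount
and writeCount only loosely mirror the local variables named in change 3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Drill code base.
final class NullRunSketch {

    // definitionLevels[i] == 0 means slot i is null; otherwise the slot takes
    // the next value from pageValues, which stores only the non-null values.
    static List<String> readRuns(int[] definitionLevels, String[] pageValues) {
        List<String> output = new ArrayList<>();
        int readCount = 0;   // non-null values consumed from pageValues
        int writeCount = 0;  // slots written to the output so far

        while (writeCount < definitionLevels.length) {
            // First inner loop: consume a (possibly empty) run of nulls.
            while (writeCount < definitionLevels.length
                    && definitionLevels[writeCount] == 0) {
                output.add(null);
                writeCount++;
            }
            // Second inner loop: consume a (possibly empty) run of non-nulls.
            while (writeCount < definitionLevels.length
                    && definitionLevels[writeCount] != 0) {
                output.add(pageValues[readCount]);
                readCount++;
                writeCount++;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Two values followed by a single terminal null.
        System.out.println(readRuns(new int[] {1, 1, 0}, new String[] {"a", "b"}));
    }
}
```

Because each inner loop checks the bound itself, no flag recording whether the
previous slot was null is needed, and a single trailing null is handled by the
same path as any other run.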
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/parthchandra/incubator-drill DRILL-3871
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/219.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #219
----
commit d23ceb2a4c32da9535f1e482c4c70fcc31b8b2b8
Author: Parth Chandra <[email protected]>
Date: 2015-10-05T17:25:56Z
DRILL-3871: Off by one error while reading binary fields with one terminal
null in parquet.
----