-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20600/
-----------------------------------------------------------

Review request for drill and Jacques Nadeau.


Repository: drill-git


Description
-------

DRILL-400: change the Parquet reader to place varbinary fields into VarCharVectors, allowing them to be returned by default as UTF-8 strings.

Note that this is done for all varbinary columns, even though Parquet has optional metadata that can explicitly record that a column's data should be interpreted as UTF-8. For now, the assumption is that all such columns will be interpreted as UTF-8 strings, because this is a common use case and no other distinctions exist at this time. The only alternative is the absence of that metadata, which some users might take to mean "treat as raw binary data"; that behavior can still be obtained with a cast in Drill.


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordReader.java 6e17fba
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/VarLenBinaryReader.java 09d19a8
  exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetResultListener.java 73af98c

Diff: https://reviews.apache.org/r/20600/diff/


Testing
-------

Amended the Parquet tests so they pass with the new return type.

A change to the value vectors now enforces a maximum record count per vector. This exposed a bug in the reader that allowed more than 65k records to be inserted into a single vector; that bug is fixed in this change.


Thanks,

Jason Altekruse
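
For illustration only, the two reader behaviors described above (decoding varbinary values as UTF-8 text, and never letting one vector exceed its record cap) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code in ParquetRecordReader or VarLenBinaryReader: the class VarBinaryBatchSketch, the method readBatches, and the constant MAX_RECORDS_PER_BATCH are hypothetical, Java lists stand in for Drill's value vectors, and the cap value of 65,536 is an assumed stand-in for the "65k" limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, NOT Drill's actual reader code: plain lists stand in
 * for VarCharVectors. Each raw BINARY value is decoded as UTF-8, and no batch
 * ever holds more than a fixed number of records.
 */
public class VarBinaryBatchSketch {

    // Assumed cap standing in for the "65k" per-vector record limit.
    public static final int MAX_RECORDS_PER_BATCH = 65_536;

    /** Splits values into batches of at most maxRecords, decoding each as UTF-8. */
    public static List<List<String>> readBatches(List<byte[]> values, int maxRecords) {
        if (maxRecords <= 0) {
            throw new IllegalArgumentException("maxRecords must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (byte[] raw : values) {
            // Check the cap BEFORE adding, so a batch can never overflow.
            if (current.size() == maxRecords) {
                batches.add(current);
                current = new ArrayList<>();
            }
            // Interpret the raw binary value as a UTF-8 string.
            current.add(new String(raw, StandardCharsets.UTF_8));
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            values.add(("row" + i).getBytes(StandardCharsets.UTF_8));
        }
        // With a cap of 2, five records are split into batches of sizes 2, 2, 1.
        System.out.println(readBatches(values, 2)); // prints [[row0, row1], [row2, row3], [row4]]
    }
}
```

The check happens before each insertion rather than after, which is the property the bug fix is about: a reader that only checks after writing can push one record past the limit.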
