-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20600/
-----------------------------------------------------------

Review request for drill and Jacques Nadeau.


Repository: drill-git


Description
-------

DRILL-400: Change the Parquet reader to place varbinary fields into 
VarCharVectors, allowing them to be returned by default as UTF-8 strings. Note 
that this is done for all varbinary columns, even though Parquet has an 
optional metadata field that explicitly records how the data should be 
interpreted. The assumption for now is that all such columns will be 
interpreted as UTF-8 strings, as this will be a common use case and no other 
interpretations exist at this time. The one exception is a column with no 
annotation at all, which some users could take to mean raw binary data, but 
that can be accomplished using a cast in Drill.
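
As a rough illustration of what treating every varbinary value as UTF-8 
implies, the sketch below decodes raw bytes into a string using only the JDK. 
It is not taken from the patch: the class and method names are hypothetical, 
and no Drill or Parquet APIs are involved. It also shows why truly binary data 
still needs the raw form, since bytes that are not valid UTF-8 do not survive 
the conversion.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of the Drill code base.
class VarBinaryInterpretation {

    // Interpret the raw bytes of a varbinary value as a UTF-8 string.
    // Malformed byte sequences are replaced with U+FFFD by the JDK decoder.
    static String asUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // True when decoding as UTF-8 and re-encoding gives back the same bytes,
    // i.e. when the string view loses no information.
    static boolean roundTrips(byte[] raw) {
        byte[] reencoded = asUtf8(raw).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(raw, reencoded);
    }

    public static void main(String[] args) {
        byte[] text = {0x64, 0x72, 0x69, 0x6c, 0x6c};   // the bytes of "drill"
        byte[] binary = {(byte) 0xff, (byte) 0xfe};     // not valid UTF-8

        System.out.println(asUtf8(text));        // prints "drill"
        System.out.println(roundTrips(text));    // prints "true"
        System.out.println(roundTrips(binary));  // prints "false"
    }
}
```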


Diffs
-----

  
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordReader.java 6e17fba 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/VarLenBinaryReader.java 09d19a8 
  exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetResultListener.java 73af98c 

Diff: https://reviews.apache.org/r/20600/diff/


Testing
-------

Amended the Parquet tests so that they pass with the new return type. A change 
in value vectors now enforces a maximum record count per vector, so a bug in 
the reader that allowed more than 65k records to be inserted into a single 
vector was also fixed.
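
The per-vector limit can be pictured with the minimal sketch below, which 
splits a total record count into batches that never exceed a cap. This is not 
the reader's actual code: the class, the method names, and the exact cap of 
65536 are assumptions made for illustration (the description above only says 
"65k").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Drill code base.
class BatchSizer {

    // Assumed cap on records per vector; the review only says "65k".
    static final int MAX_RECORDS_PER_VECTOR = 65536;

    // Number of records to place in the next vector, never above the cap.
    static int recordsForNextBatch(long remaining) {
        return (int) Math.min(remaining, MAX_RECORDS_PER_VECTOR);
    }

    // Split a total record count into per-vector batch sizes.
    static List<Integer> planBatches(long totalRecords) {
        List<Integer> sizes = new ArrayList<>();
        long remaining = totalRecords;
        while (remaining > 0) {
            int n = recordsForNextBatch(remaining);
            sizes.add(n);
            remaining -= n;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 150000 records need three vectors under the assumed cap.
        System.out.println(planBatches(150000L)); // prints [65536, 65536, 18928]
    }
}
```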


Thanks,

Jason Altekruse
