[GitHub] [incubator-iceberg] anjalinorwood opened a new pull request #435: Support for all primitive data types: required and optional

GitBox Fri, 30 Aug 2019 16:25:11 -0700

anjalinorwood opened a new pull request #435: Support for all primitive data types: required and optional
URL: https://github.com/apache/incubator-iceberg/pull/435
 
 
   Support for all primitive data types: required and optional
       
       This commit provides optimal/near-optimal implementations for all data types as follows:
       + INT32, INT64, Float, Double, Date, Timestamp (non-decimal numeric data types):
         The implementation reads bytes from Parquet for batches of contiguous
         values, as indicated by the definition levels, and writes them into the
         underlying data buffer of the ArrowVector.
         It sets the validity buffer to handle optional data types.
       + Decimal data type (backed by INT32, INT64, and fixed-length byte array):
         Arrow stores all decimals as 16 bytes and assumes that they are stored
         in little-endian order. The vectorized decimal read implementation pads
         the bytes read from Parquet as necessary and stores the decimals in the
         expected little-endian format.
       + Fixed-width binary (e.g. BYTE[7]):
         Spark does not support a fixed-width binary data type. The data is read
         as a fixed number of bytes from Parquet, stored as VarBinary in Arrow,
         and exposed to Spark as such.
       + String data type (ENUM, JSON, UTF8, BSON) and Boolean data type:
         Value reader implementations are used to read the string and Boolean
         data types.
       + UUID data type is not supported.
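
The batch read described in the first bullet can be sketched roughly as follows. This is not the code from the pull request: the class `DefinitionLevelBatchSketch`, the holder `IntColumn`, and the method `read` are hypothetical names chosen for illustration, and plain byte arrays stand in for Arrow buffers. The sketch assumes a flat optional INT32 column with PLAIN encoding, where Parquet stores only the non-null values as 4-byte little-endian integers, so a run of defined values can be copied into the data buffer without per-value decoding.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; the names here are hypothetical, not Iceberg or Arrow APIs.
final class DefinitionLevelBatchSketch {

    /** Fixed-width INT32 column: 4 bytes per slot plus one validity bit per slot. */
    static final class IntColumn {
        final byte[] data;      // slot i occupies bytes [4*i, 4*i + 4), null slots stay zeroed
        final byte[] validity;  // bit i (least-significant bit first) is 1 when slot i is non-null

        IntColumn(byte[] data, byte[] validity) {
            this.data = data;
            this.validity = validity;
        }
    }

    /**
     * Copies runs of contiguous non-null values in bulk.
     *
     * @param defLevels     one definition level per slot
     * @param maxDefLevel   the level that marks a value as present
     * @param parquetValues packed non-null values, 4 bytes each (assumed PLAIN encoding)
     */
    static IntColumn read(int[] defLevels, int maxDefLevel, ByteBuffer parquetValues) {
        int n = defLevels.length;
        byte[] data = new byte[n * 4];
        byte[] validity = new byte[(n + 7) / 8];

        int i = 0;
        while (i < n) {
            if (defLevels[i] != maxDefLevel) {
                i++;  // null slot: validity bit stays 0 and no source bytes are consumed
                continue;
            }
            int start = i;
            while (i < n && defLevels[i] == maxDefLevel) {
                validity[i >> 3] |= (byte) (1 << (i & 7));  // mark slot i as valid
                i++;
            }
            // One bulk copy for the whole run of defined values.
            parquetValues.get(data, start * 4, (i - start) * 4);
        }
        return new IntColumn(data, validity);
    }
}
```

For definition levels `{1, 1, 0, 1}` with a maximum level of 1, the sketch performs two copies (slots 0 to 1, then slot 3) and leaves slot 2 marked as null.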
       
       Co-authored-by: Samarth Jain <[email protected]>
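
The decimal padding described in the commit message above can be illustrated with a small sketch. It is an assumption-laden illustration rather than the pull request's implementation: `DecimalPaddingSketch`, `fromBigEndianBytes`, and `fromLong` are hypothetical names. It relies on two format facts: Parquet stores a byte-array-backed decimal as a big-endian two's-complement unscaled value, and Arrow's 128-bit decimal layout is 16 little-endian bytes. Converting therefore means reversing the byte order and sign-extending up to 16 bytes.

```java
// Illustrative sketch only; the names here are hypothetical, not Iceberg or Arrow APIs.
final class DecimalPaddingSketch {

    static final int ARROW_DECIMAL_WIDTH = 16;

    /**
     * Converts a big-endian two's-complement unscaled value (the layout Parquet uses
     * for byte-array-backed decimals) into 16 little-endian bytes.
     */
    static byte[] fromBigEndianBytes(byte[] bigEndian) {
        if (bigEndian.length == 0 || bigEndian.length > ARROW_DECIMAL_WIDTH) {
            throw new IllegalArgumentException("expected 1 to 16 bytes, got " + bigEndian.length);
        }
        byte[] out = new byte[ARROW_DECIMAL_WIDTH];
        // Reverse the byte order: the least significant byte goes first.
        for (int i = 0; i < bigEndian.length; i++) {
            out[i] = bigEndian[bigEndian.length - 1 - i];
        }
        // Sign-extend: negative values are padded with 0xFF, non-negative ones with 0x00.
        byte pad = (byte) (bigEndian[0] < 0 ? 0xFF : 0x00);
        for (int i = bigEndian.length; i < ARROW_DECIMAL_WIDTH; i++) {
            out[i] = pad;
        }
        return out;
    }

    /** Converts an INT32- or INT64-backed unscaled value (widened to long) the same way. */
    static byte[] fromLong(long unscaled) {
        byte[] out = new byte[ARROW_DECIMAL_WIDTH];
        for (int i = 0; i < ARROW_DECIMAL_WIDTH; i++) {
            // The low 8 bytes come from the value itself; the high 8 bytes repeat the sign.
            out[i] = (byte) (i < 8 ? (unscaled >> (8 * i)) : (unscaled >> 63));
        }
        return out;
    }
}
```

An INT32-backed decimal can be passed to `fromLong` directly, since Java widens `int` to `long` with sign extension.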


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

