anjalinorwood opened a new pull request #435: Support for all primitive data types: required and optional
URL: https://github.com/apache/incubator-iceberg/pull/435

Support for all primitive data types: required and optional

This commit provides optimal or near-optimal implementations for all data types, as follows:

+ INT32, INT64, Float, Double, Date, Timestamp (non-decimal numeric data types): The implementation reads bytes from Parquet for batches of contiguous values, as indicated by the definition level, and writes them into the underlying data buffer of the ArrowVector. It sets validity buffers to handle optional data types.
+ Decimal data type (backed by INT32, INT64 and fixed length byte array): Arrow stores all decimals as 16 bytes and assumes that the decimals are stored in little-endian order. The vectorized decimal read implementation pads the bytes read from Parquet as necessary and stores the decimals in the expected little-endian format.
+ Fixed width binary (e.g. BYTE[7]): Spark does not support the fixed width binary data type. The data is read as a fixed number of bytes from Parquet, stored as VarBinary in Arrow, and exposed to Spark as such.
+ String data type (ENUM, JSON, UTF8, BSON) and Boolean data type: Value reader implementations are used to read the string and boolean data types.
+ UUID data type is not supported.

Co-authored-by: Samarth Jain <[email protected]>
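
The decimal padding described in the list above can be illustrated with a small sketch. This is not the code from the pull request (which is Java); it is a Python illustration of the byte layout only, under the assumption that a decimal backed by a fixed length byte array arrives as a big-endian two's-complement unscaled value, as the Parquet format specifies. The function name `pad_decimal_to_arrow` and the `width` parameter are hypothetical names chosen for this example.

```python
def pad_decimal_to_arrow(unscaled_be: bytes, width: int = 16) -> bytes:
    """Sign-extend a big-endian two's-complement value to `width` bytes
    and return it in little-endian byte order.

    Hypothetical helper for illustration; not part of Iceberg or Arrow.
    """
    if len(unscaled_be) > width:
        raise ValueError("value does not fit in the target width")

    # A set high bit in the first byte means the value is negative,
    # so the padding bytes must be 0xFF to preserve the sign.
    negative = len(unscaled_be) > 0 and (unscaled_be[0] & 0x80) != 0
    pad_byte = b"\xff" if negative else b"\x00"

    # Pad on the most-significant side while still big-endian ...
    padded_be = pad_byte * (width - len(unscaled_be)) + unscaled_be

    # ... then reverse to get the little-endian layout.
    return padded_be[::-1]


# Unscaled value 1234 stored in a 2-byte fixed length byte array.
positive = pad_decimal_to_arrow(b"\x04\xd2")
# Unscaled value -1234 stored in a 2-byte fixed length byte array.
negative = pad_decimal_to_arrow(b"\xfb\x2e")

print(int.from_bytes(positive, "little", signed=True))
print(int.from_bytes(negative, "little", signed=True))
```

Padding with 0xFF for negative values is the step that is easy to miss: zero-padding a negative two's-complement number would turn it into a large positive one.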
