ggangadharan commented on PR #5590: URL: https://github.com/apache/hive/pull/5590#issuecomment-2564150395
Hi @okumin,

Thank you for taking the time to review the pull request.

In the Iceberg Parquet table, the timestamp column is read as **LOCALDATETIME**. I've attached a screenshot for reference.

<img width="1649" alt="Screenshot 2024-12-28 at 8 18 14 AM" src="https://github.com/user-attachments/assets/270e0ac3-3f39-43af-8f86-33e40d3ae2b3" />

There is a notable difference in how the timestamp column is stored at the Parquet file format level. Specifically:

- In Iceberg Parquet tables, the timestamp column is stored as **INT64 L:TIMESTAMP(MICROS,false)**.
- In standard Parquet tables, the timestamp column is stored as **INT96**.

For clarity, I've also included the metadata from Parquet-tools for reference.

**As Iceberg Parquet table**

```
file schema: table
------------------------------------------------------------------------------------------------------------------------------------------------------------
id:          OPTIONAL INT32 R:0 D:1
name:        OPTIONAL BINARY L:STRING R:0 D:1
dt:          OPTIONAL INT64 L:TIMESTAMP(MICROS,false) R:0 D:1

row group 1: RC:1 TS:112 OFFSET:4
------------------------------------------------------------------------------------------------------------------------------------------------------------
id:           INT32 SNAPPY DO:0 FPO:4 SZ:35/33/0.94 VC:1 ENC:BIT_PACKED,RLE,PLAIN ST:[min: 1, max: 1, num_nulls: 0]
name:         BINARY SNAPPY DO:0 FPO:39 SZ:44/42/0.95 VC:1 ENC:BIT_PACKED,RLE,PLAIN ST:[min: test name, max: test name, num_nulls: 0]
dt:           INT64 SNAPPY DO:0 FPO:83 SZ:39/37/0.95 VC:1 ENC:BIT_PACKED,RLE,PLAIN ST:[min: 2024-08-09T14:08:26.326107, max: 2024-08-09T14:08:26.326107, num_nulls: 0]
```

**As standard parquet table**

```
file schema: hive_schema
------------------------------------------------------------------------------------------------------------------------------------------------------------
id:          OPTIONAL INT32 R:0 D:1
name:        OPTIONAL BINARY L:STRING R:0 D:1
dt:          OPTIONAL INT96 R:0 D:1

row group 1: RC:1 TS:137 OFFSET:4
------------------------------------------------------------------------------------------------------------------------------------------------------------
id:           INT32 UNCOMPRESSED DO:0 FPO:4 SZ:33/33/1.00 VC:1 ENC:BIT_PACKED,RLE,PLAIN ST:[min: 1, max: 1, num_nulls: 0]
name:         BINARY UNCOMPRESSED DO:0 FPO:37 SZ:42/42/1.00 VC:1 ENC:BIT_PACKED,RLE,PLAIN ST:[min: test name, max: test name, num_nulls: 0]
dt:           INT96 UNCOMPRESSED DO:79 FPO:110 SZ:62/62/1.00 VC:1 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY ST:[min: 0x78037C8D4C2E0000748B2500, max: 0x78037C8D4C2E0000748B2500, num_nulls: 0]
```
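
To make the two encodings easier to compare, here is a minimal Python sketch that decodes the INT96 statistic shown in the standard Parquet output and re-expresses it as INT64 microseconds. It is not Hive or Iceberg code. It assumes the conventional INT96 timestamp layout (8 bytes of nanoseconds-of-day followed by 4 bytes of Julian day, both little-endian) and that Parquet-tools prints the 12 bytes in stored order. The function names are illustrative only, and no time zone adjustment is applied.

```python
from datetime import datetime, timedelta

# Julian day number of 1970-01-01, used to convert Julian days to Unix-epoch days.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
UNIX_EPOCH = datetime(1970, 1, 1)


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (assumed layout: nanos-of-day, then Julian day)."""
    if len(raw) != 12:
        raise ValueError(f"INT96 values are 12 bytes, got {len(raw)}")
    nanos_of_day = int.from_bytes(raw[:8], "little")
    julian_day = int.from_bytes(raw[8:], "little")
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime only has microsecond precision, so sub-microsecond nanos are truncated.
    return UNIX_EPOCH + timedelta(days=days_since_epoch, microseconds=nanos_of_day // 1000)


def decode_int64_micros(micros: int) -> datetime:
    """Decode an INT64 TIMESTAMP(MICROS) value as microseconds since the Unix epoch."""
    return UNIX_EPOCH + timedelta(microseconds=micros)


def to_int64_micros(value: datetime) -> int:
    """Express a naive datetime as microseconds since the Unix epoch."""
    return (value - UNIX_EPOCH) // timedelta(microseconds=1)


# The min/max statistic of the INT96 `dt` column from the standard Parquet file.
int96_stat = bytes.fromhex("78037C8D4C2E0000748B2500")
decoded = decode_int96_timestamp(int96_stat)

print(decoded.isoformat())       # 2024-08-09T14:08:26.326107
print(to_int64_micros(decoded))  # 1723212506326107
```

Under those assumptions, the INT96 value decodes to the same wall-clock value that Parquet-tools reports for the Iceberg INT64 column (`2024-08-09T14:08:26.326107`), so in this sample the two files differ in physical encoding rather than in the stored instant.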
