alamb commented on PR #7687: URL: https://github.com/apache/arrow-rs/pull/7687#issuecomment-2987549771
Thank you @rahulketch

Here is a related PR in Spark to stop writing INT96 timestamps: https://github.com/apache/spark/pull/50215

I am kind of confused about the current status of INT96: the Parquet spec says it is deprecated, but Spark keeps writing it, and this PR (and others) seem to imply that Spark / Databricks plan to keep writing INT96 timestamps indefinitely.

Here is a related mailing list discussion on this topic: https://lists.apache.org/thread/6fm50b3pmh6mz659jb5wx5vzmvwccz1n

As @emkornfield pointed out in that discussion, the spec explicitly says the sort order for INT96 types is undefined: https://github.com/apache/parquet-format/blob/87f2c8bf77eefb4c43d0ebaeea1778bd28ac3609/src/main/thrift/parquet.thrift#L1079

Perhaps we should also update the spec to reflect whatever is desired as part of changing the Parquet writers?
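For context, here is a minimal Python sketch of why an undefined sort order matters for anything that compares INT96 values. It assumes the layout conventionally used for INT96 timestamps (8 bytes of nanoseconds within the day followed by 4 bytes of Julian day, both little-endian). The helper names `encode_int96` and `decode_int96` are hypothetical, and this is not the code from this PR.

```python
import struct

# Assumption: conventional INT96 timestamp layout, i.e. 8 bytes of
# nanoseconds-of-day followed by 4 bytes of Julian day, little-endian.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000


def encode_int96(nanos_of_day: int, julian_day: int) -> bytes:
    """Pack a timestamp into 12 raw bytes (hypothetical helper)."""
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> int:
    """Return nanoseconds since the Unix epoch (hypothetical helper)."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


# Two timestamps on the same day, 1 ns and 256 ns after midnight.
earlier = encode_int96(1, JULIAN_DAY_OF_UNIX_EPOCH)
later = encode_int96(256, JULIAN_DAY_OF_UNIX_EPOCH)

# Ordering by decoded value is chronological.
chronological = sorted([later, earlier], key=decode_int96)

# Ordering the raw bytes lexicographically is not: the low-order byte of the
# nanosecond field comes first, so 256 ns (00 01 ...) sorts before 1 ns (01 00 ...).
bytewise = sorted([later, earlier])

print(chronological == [earlier, later])
print(bytewise == [later, earlier])
```

Under these assumptions, a reader and a writer that pick different comparison rules would disagree about which value is the minimum and which is the maximum, which is why it may help for the spec to state the intended order.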
