alamb commented on PR #7687:
URL: https://github.com/apache/arrow-rs/pull/7687#issuecomment-2987549771

   Thank you @rahulketch 
   
   Here is a related PR in Spark to stop writing INT96 timestamps
   - https://github.com/apache/spark/pull/50215
   
   I am kind of confused about the current status of INT96: the Parquet spec 
says they are deprecated, but Spark keeps writing them, and this PR (and others) 
seem to imply Spark / Databricks plans to keep writing INT96 timestamps 
indefinitely. 
   
   Here is a related mailing list discussion on this topic: 
https://lists.apache.org/thread/6fm50b3pmh6mz659jb5wx5vzmvwccz1n
   
   As @emkornfield pointed out in that discussion, the spec explicitly says the 
sort order for INT96 types is undefined:
   
   
https://github.com/apache/parquet-format/blob/87f2c8bf77eefb4c43d0ebaeea1778bd28ac3609/src/main/thrift/parquet.thrift#L1079
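
To illustrate why the ordering matters, here is a minimal sketch, assuming the commonly used INT96 timestamp layout (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes of Julian day). The helper names `encode_int96` and `decode_int96` are hypothetical and not part of arrow-rs or any Parquet library. It shows that comparing the raw 12 bytes lexicographically can disagree with chronological order, so min/max statistics are only meaningful if writers and readers agree on one comparison:

```python
import struct

# Assumed layout: nanoseconds-of-day (int64, little-endian),
# then Julian day number (int32, little-endian) = 12 bytes total.
def encode_int96(julian_day, nanos_of_day):
    return struct.pack("<qi", nanos_of_day, julian_day)

def decode_int96(raw):
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    # Returning (day, nanos) makes tuple comparison chronological.
    return julian_day, nanos_of_day

# Julian day 2440588 corresponds to 1970-01-01.
early = encode_int96(2_440_588, 1_000)  # 1970-01-01, 1000 ns after midnight
late = encode_int96(2_440_589, 5)       # 1970-01-02, 5 ns after midnight

# Lexicographic byte order looks at the low nanosecond byte first,
# so the later timestamp sorts before the earlier one.
bytewise = sorted([early, late])
chronological = sorted([early, late], key=decode_int96)

print(bytewise == chronological)
```

Under these assumptions the two orderings differ, which is one reason a reader cannot trust INT96 statistics unless the comparison used by the writer is pinned down.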
   
   Perhaps we should also update the spec to reflect whatever is desired as 
part of changing the Parquet writers?

