jorgecarleitao commented on issue #982: URL: https://github.com/apache/arrow-rs/issues/982#issuecomment-980079021
I was trying to communicate that what Spark does does not solve the problem in general, only for the particular date 9999; it essentially shifts the problem by a factor of 1000x. I would argue that 200 years for the world to migrate away from int96 should be enough, but someone probably said the same about the imperial system 200 years ago, and here we are. ^_^

IMO panicking is not a valid behavior because it is susceptible to DoS (e.g. an application accepting parquet files from the internet will now panic and unwind on every request).

I think that there are 3 options within the current arrow specification:

1. do what Spark does and truncate nanoseconds
2. do what pyarrow does (wrapped value)
3. offer the [saturated](https://doc.rust-lang.org/std/primitive.u32.html#method.saturating_mul) value (i.e. the maximum an i64 handles).

## 1.

* we lose the nano precision
* order is preserved
* may result in loss of data integrity
* different from the C++ implementation
* equal to the Spark (and likely other) implementations
* backward incompatible

## 2.

* we keep the nano precision
* order is not preserved
* may result in loss of data integrity
* equal to the C++ implementation
* different from the Spark (and likely other) implementations
* backward compatible

## 3.

* we keep the nano precision
* order is preserved
* may result in loss of data integrity
* precision-equal but behavior-different to the C++ implementation
* different from the Spark (and likely other) implementations
* backward compatible

I am tempted to argue for 3 because it preserves the two important semantic properties (order and nano precision), but I would be fine with any of them, tbh.
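To make the trade-offs concrete, here is a minimal sketch of the three options in plain Rust. It assumes an int96 timestamp has already been split into a Julian day number and the nanoseconds elapsed within that day. The function names and constants are hypothetical and chosen for illustration; this is not the arrow-rs, pyarrow, or Spark code.

```rust
// Hypothetical helpers for illustration only; not the arrow-rs API.

/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const SECONDS_PER_DAY: i64 = 86_400;
const MICROS_PER_SECOND: i64 = 1_000_000;
const NANOS_PER_SECOND: i64 = 1_000_000_000;

/// Option 1: drop sub-microsecond precision and return microseconds.
/// Returns `None` if even the microsecond value does not fit in an i64,
/// which is the "problem shifted by 1000x" case.
fn int96_to_micros_truncating(julian_day: i64, nanos_of_day: i64) -> Option<i64> {
    let seconds = julian_day
        .checked_sub(JULIAN_DAY_OF_EPOCH)?
        .checked_mul(SECONDS_PER_DAY)?;
    seconds
        .checked_mul(MICROS_PER_SECOND)?
        .checked_add(nanos_of_day / 1_000)
}

/// Option 2: keep nanoseconds and let the arithmetic wrap around on overflow.
fn int96_to_nanos_wrapping(julian_day: i64, nanos_of_day: i64) -> i64 {
    julian_day
        .wrapping_sub(JULIAN_DAY_OF_EPOCH)
        .wrapping_mul(SECONDS_PER_DAY)
        .wrapping_mul(NANOS_PER_SECOND)
        .wrapping_add(nanos_of_day)
}

/// Option 3: keep nanoseconds and clamp to i64::MIN / i64::MAX on overflow.
fn int96_to_nanos_saturating(julian_day: i64, nanos_of_day: i64) -> i64 {
    julian_day
        .saturating_sub(JULIAN_DAY_OF_EPOCH)
        .saturating_mul(SECONDS_PER_DAY)
        .saturating_mul(NANOS_PER_SECOND)
        .saturating_add(nanos_of_day)
}
```

For timestamps that fit in an i64 of nanoseconds, options 2 and 3 return the same value. They differ only once the value overflows: the saturating version pins every out-of-range timestamp to the same bound, so ordering is preserved but distinct inputs collapse, while the wrapping version can place a far-future timestamp before an earlier one.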
