jorgecarleitao commented on issue #1666: URL: https://github.com/apache/arrow-rs/issues/1666#issuecomment-1120159324
Great write-up, @tustvold! I agree with your assessment of least surprise on option 3.

The way I think about options 1 and 3 is: option 1 is potentially lossy on data (some bytes may be lost) and lossless on metadata (metadata is preserved); option 3 is lossless on data and potentially lossy on metadata. In my experience, metadata is easier to recover or preserve through other means (e.g. a catalog, Parquet's custom metadata, even column naming conventions) than data, which is usually unrecoverable once lost. From this perspective, option 1 has a real impact on data integrity.

FWIW, we use option 3 in arrow2. E.g. `Date64` is encoded as Parquet's `Int64` with no converted or logical type annotation (and the Arrow schema stored in the metadata for Arrow-aware readers).
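To make the data-versus-metadata trade-off concrete, here is a minimal sketch in plain Rust. It uses no arrow-rs or arrow2 APIs, and all function names are hypothetical. It assumes, for illustration only, that the lossy-on-data path stores whole days since the epoch in 32 bits, while the lossless path stores the raw 64-bit millisecond value with no annotation.

```rust
/// Milliseconds in one day.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical "lossy on data" encoding (an assumption for illustration):
/// keep only whole days since the epoch, dropping the time-of-day part.
/// The cast would wrap for values outside the i32 day range.
fn encode_as_days(date64_millis: i64) -> i32 {
    date64_millis.div_euclid(MILLIS_PER_DAY) as i32
}

/// Inverse of `encode_as_days`; the sub-day milliseconds cannot be recovered.
fn decode_from_days(days: i32) -> i64 {
    i64::from(days) * MILLIS_PER_DAY
}

/// Hypothetical "lossless on data" encoding: the raw i64 is stored unchanged.
/// A reader that cannot see the Arrow schema only knows it is a plain integer,
/// so the "this is a date" meaning is the part that may be lost.
fn encode_as_plain_int64(date64_millis: i64) -> i64 {
    date64_millis
}
```

Under these assumptions, a value such as `1_651_968_000_123` round-trips through the day-based path as `1_651_968_000_000` (the trailing 123 ms are gone), while the plain `Int64` path returns it unchanged and leaves the date semantics to be restored from metadata.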
