parthchandra commented on issue #7040: URL: https://github.com/apache/arrow-rs/issues/7040#issuecomment-2643692870
@tustvold You're right that this is not a bug in Spark, but Spark's users evidently found it useful to read Parquet files containing unsigned int8 and unsigned int16 values, which is why Spark added support for them: https://github.com/apache/spark/pull/31921

The actual data was written by parquet-java, which was the canonical implementation followed by pretty much every Java-based engine I am aware of, so I suspect there are many files out there containing such values. I would suggest that having the flexibility to read them would only help adoption.

> Another potential alternative might be to provide some sort of pluggable behavior (like allow overriding the default conversion logic via template or something) -- that way we downstream users who needed different edge case behavior could implement whatever they needed without having to add such logic back up here

@alamb, this might be a better way than the option I suggested.
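To make the "pluggable behavior" idea concrete, here is a minimal sketch of what a user-overridable type-conversion hook could look like. All names here (`TypeConverter`, `DefaultConverter`, `SparkCompatConverter`, and the simplified enums) are hypothetical illustrations, not actual arrow-rs API; the exact mapping Spark applies is described in the linked PR, and the widened signed types below are chosen only so every unsigned value fits without overflow.

```rust
/// Simplified stand-in for a Parquet logical type annotation
/// (hypothetical; not the real arrow-rs/parquet types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType {
    UInt8,
    UInt16,
}

/// Simplified stand-in for an Arrow data type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType {
    Int16,
    Int32,
    UInt8,
    UInt16,
}

/// The pluggable hook: a downstream user implements this trait to
/// override how unsigned Parquet types are mapped to Arrow types.
trait TypeConverter {
    fn convert(&self, t: LogicalType) -> ArrowType;
}

/// Default behavior: preserve the unsigned type as-is.
struct DefaultConverter;
impl TypeConverter for DefaultConverter {
    fn convert(&self, t: LogicalType) -> ArrowType {
        match t {
            LogicalType::UInt8 => ArrowType::UInt8,
            LogicalType::UInt16 => ArrowType::UInt16,
        }
    }
}

/// A Spark-style override: widen each unsigned type into a signed
/// type large enough to hold all of its values.
struct SparkCompatConverter;
impl TypeConverter for SparkCompatConverter {
    fn convert(&self, t: LogicalType) -> ArrowType {
        match t {
            LogicalType::UInt8 => ArrowType::Int16,  // 0..=255 fits in i16
            LogicalType::UInt16 => ArrowType::Int32, // 0..=65535 fits in i32
        }
    }
}

fn main() {
    // A reader could accept any `&dyn TypeConverter`, so downstream
    // engines plug in their own edge-case behavior.
    let converters: [&dyn TypeConverter; 2] = [&DefaultConverter, &SparkCompatConverter];
    for c in converters {
        println!("{:?} {:?}", c.convert(LogicalType::UInt8), c.convert(LogicalType::UInt16));
    }
}
```

The advantage of this shape is that the default stays strict while engines needing Spark-compatible reads supply their own implementation, without that logic living in arrow-rs itself.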
