parthchandra commented on issue #7040: URL: https://github.com/apache/arrow-rs/issues/7040#issuecomment-2643692870
@tustvold You're right that this is not a bug in Spark, but Spark's users evidently found it useful to read Parquet files containing unsigned int8 and unsigned int16 values, which is why Spark added support for them: https://github.com/apache/spark/pull/31921

The actual data was written by parquet-java, which was the canonical implementation followed by pretty much every Java-based engine I am aware of, so I suspect there are many files out there containing such values. I would suggest that having the flexibility to read them would only help adoption.

> Another potential alternative might be to provide some sort of pluggable behavior (like allow overriding the default conversion logic via template or something) -- that way we downstream users who needed different edge case behavior could implement whatever they needed without having to add such logic back up here

@alamb, this might be a better way than the option I suggested.
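To make the "pluggable behavior" idea concrete, here is a minimal sketch of what a user-overridable type-conversion hook could look like. All names here (`TypeConverter`, `DefaultConverter`, `SparkCompatConverter`, and the simplified enums) are hypothetical illustrations, not actual arrow-rs API; the exact mapping Spark applies is described in the linked PR, and the widened signed types below are chosen only so every unsigned value fits without overflow.

```rust
/// Simplified stand-in for a Parquet logical type annotation
/// (hypothetical; not the real arrow-rs/parquet types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType {
    UInt8,
    UInt16,
}

/// Simplified stand-in for an Arrow data type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType {
    Int16,
    Int32,
    UInt8,
    UInt16,
}

/// The pluggable hook: a downstream user implements this trait to
/// override how unsigned Parquet types are mapped to Arrow types.
trait TypeConverter {
    fn convert(&self, t: LogicalType) -> ArrowType;
}

/// Default behavior: preserve the unsigned type as-is.
struct DefaultConverter;
impl TypeConverter for DefaultConverter {
    fn convert(&self, t: LogicalType) -> ArrowType {
        match t {
            LogicalType::UInt8 => ArrowType::UInt8,
            LogicalType::UInt16 => ArrowType::UInt16,
        }
    }
}

/// A Spark-style override: widen each unsigned type into a signed
/// type large enough to hold all of its values.
struct SparkCompatConverter;
impl TypeConverter for SparkCompatConverter {
    fn convert(&self, t: LogicalType) -> ArrowType {
        match t {
            LogicalType::UInt8 => ArrowType::Int16,  // 0..=255 fits in i16
            LogicalType::UInt16 => ArrowType::Int32, // 0..=65535 fits in i32
        }
    }
}

fn main() {
    // A reader could accept any `&dyn TypeConverter`, so downstream
    // engines plug in their own edge-case behavior.
    let converters: [&dyn TypeConverter; 2] = [&DefaultConverter, &SparkCompatConverter];
    for c in converters {
        println!("{:?} {:?}", c.convert(LogicalType::UInt8), c.convert(LogicalType::UInt16));
    }
}
```

The advantage of this shape is that the default stays strict while engines needing Spark-compatible reads supply their own implementation, without that logic living in arrow-rs itself.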
