curioustien commented on PR #45351: URL: https://github.com/apache/arrow/pull/45351#issuecomment-2638723567
@wgtmac @mapleFU I'm facing an implementation blocker on keeping the same writing/reading behaviors for decimal128/256 between Arrow and Parquet because I can't find a clean way to do it. As I mentioned in this previous comment https://github.com/apache/arrow/pull/45351#discussion_r1930922813 on my original implementation, the [current Parquet reading logic](https://github.com/apache/arrow/blob/release-19.0.1-rc0/cpp/src/parquet/arrow/schema_internal.cc#L37-L40) looks at the decimal precision to determine how to convert Parquet decimal logical type to Arrow decimal type. Since we introduced decimal32/64 in arrow, I had to [change this logic to include these types based on the precision](https://github.com/curioustien/arrow/blob/parquet-decimal-test/cpp/src/parquet/arrow/schema_internal.cc#L37-L44). Therefore, whenever we want to cast a decimal32 to decimal128, we need to force the schema to convert to a bigger decimal. I found this [arrow Field.MergeWith](https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.h#L479-L491) method that could do the job (which I used it in [one of the schema tests here](https://github.com/curioustien/arrow/blob/parquet-decimal-test-feedback/cpp/src/parquet/arrow/arrow_schema_test.cc#L86-L88)). However, when I moved on to the reader/writer tests, I found that these schema fields can only be accessed from [manifest within FileReader](https://github.com/apache/arrow/blob/main/cpp/src/parquet/arrow/reader.h#L308). Though, it's read only. Therefore, if I want to force schema conversion, I'll have to either: 1. Change this `manifest()` method so that we can manipulate the schema fields 2. Add a new property in [ArrowReaderProperties](https://github.com/apache/arrow/blob/main/cpp/src/parquet/properties.h#L907) so that we can propagate this schema conversion logic I can't really come up with other options here. Both of these options require changes in some important classes, so I want to get some alignments before I proceed. Option 2 probably makes the most sense here. I also need to propagate this new property to pyarrow as well if we want to have this same behavior in python. What are your thoughts here on this problem? Any other alternatives that I should try? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
