curioustien commented on PR #45351:
URL: https://github.com/apache/arrow/pull/45351#issuecomment-2638723567

   @wgtmac @mapleFU I'm facing an implementation blocker on keeping the same 
writing/reading behaviors for decimal128/256 between Arrow and Parquet because 
I can't find a clean way to do it. As I mentioned in this previous comment 
https://github.com/apache/arrow/pull/45351#discussion_r1930922813 on my 
original implementation, the [current Parquet reading 
logic](https://github.com/apache/arrow/blob/release-19.0.1-rc0/cpp/src/parquet/arrow/schema_internal.cc#L37-L40)
 looks at the decimal precision to determine how to convert Parquet decimal 
logical type to Arrow decimal type. Since we introduced decimal32/64 in arrow, 
I had to [change this logic to include these types based on the 
precision](https://github.com/curioustien/arrow/blob/parquet-decimal-test/cpp/src/parquet/arrow/schema_internal.cc#L37-L44).
   
   Therefore, whenever we want to cast a decimal32 to decimal128, we need to 
force the schema to convert to a bigger decimal. I found this [arrow 
Field.MergeWith](https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.h#L479-L491)
 method that could do the job (which I used it in [one of the schema tests 
here](https://github.com/curioustien/arrow/blob/parquet-decimal-test-feedback/cpp/src/parquet/arrow/arrow_schema_test.cc#L86-L88)).
 However, when I moved on to the reader/writer tests, I found that these schema 
fields can only be accessed from [manifest within 
FileReader](https://github.com/apache/arrow/blob/main/cpp/src/parquet/arrow/reader.h#L308).
 Though, it's read only. Therefore, if I want to force schema conversion, I'll 
have to either:
   1. Change this `manifest()` method so that we can manipulate the schema 
fields
   2. Add a new property in 
[ArrowReaderProperties](https://github.com/apache/arrow/blob/main/cpp/src/parquet/properties.h#L907)
 so that we can propagate this schema conversion logic
   
   I can't really come up with other options here. Both of these options 
require changes in some important classes, so I want to get some alignments 
before I proceed. Option 2 probably makes the most sense here. I also need to 
propagate this new property to pyarrow as well if we want to have this same 
behavior in python.
   
   What are your thoughts here on this problem? Any other alternatives that I 
should try?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to