martin-traverse commented on PR #779: URL: https://github.com/apache/arrow-java/pull/779#issuecomment-2953028800
Hi @lidavidm - I have added the dictionary decoding producer, which turned out to be very simple. Any dictionary-encoded field that is not a valid Avro enum is now automatically decoded and output as its concrete type. This does require running a regex over the dictionary entries, but that only has to happen once, when the producers are set up.

I do think we will need to change the enum read so that dictionaries are populated during the schema phase rather than the data phase, in order to read whole files with multiple blocks. I'd like to hold that change back and do it as part of the next PR, which will be read / write for whole files.

Assuming you are happy with both these points, I think this PR is ready for review :-)
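For illustration only, the one-time regex check over dictionary entries could look roughly like the sketch below. It is not the code from this PR: the class and method names (`EnumSymbolCheck`, `allValidEnumSymbols`) are hypothetical, and the dictionary is assumed to have already been read into a plain list of strings. The pattern follows the Avro naming rule that a name starts with a letter or underscore and continues with letters, digits or underscores.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the PR or of the Arrow API. */
final class EnumSymbolCheck {

    // Avro names: first character [A-Za-z_], remaining characters [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private EnumSymbolCheck() {}

    /**
     * Returns true only if every dictionary entry is usable as an Avro enum symbol.
     * Intended to be called once per dictionary, at producer setup time.
     */
    static boolean allValidEnumSymbols(List<String> entries) {
        for (String entry : entries) {
            // A null entry or a non-matching string means the field cannot be an enum.
            if (entry == null || !AVRO_NAME.matcher(entry).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, a dictionary such as `["RED", "GREEN"]` passes the check and could be written as an enum, while one containing `"red-ish"` fails and the field would be decoded to its concrete type instead.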
