amousavigourabi commented on PR #1141: URL: https://github.com/apache/parquet-mr/pull/1141#issuecomment-1732618636
@wgtmac thanks a lot for the review! It's quite a big one, so I really appreciate you taking the time for it. To address your concerns:

> * It seems that we still have some compatibility issues. Can you confirm? If yes, could you please write them out explicitly?

Correct, japicmp reports two incompatibilities, in the classes `CodecFactory` and `ParquetReader`. Both flag private and protected fields whose type changed from `Configuration` to `ParquetConfiguration`. These changes are strictly necessary for the effort to unhadoop the read/write API.

> * Is there any follow-up work item to do? Would be good if we can know the whole picture in advance.

After this, the following steps will need to be taken:

1. Create a simple unhadooped implementation of the `ParquetConfiguration` interface.
2. Add a simple way for users to avoid the Hadoop codecs, as the out-of-the-box implementations of everything still rely very heavily on Hadoop classes.

These changes should allow users to drop the Hadoop runtime dependency. The Hadoop client API dependency will still be necessary.

> * Is it possible to add a simple test case to prove that a simple writer and reader roundtrip can work successfully without Hadoop dependency?

Not yet: we do not currently offer users any real way to use the API without Hadoop dependencies. However, the parameters added to the `TestReadWrite` fixture verify that the read/write API still functions when using the `ParquetConfiguration` interface.
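To make follow-up step 1 more concrete, a Hadoop-free configuration could be as simple as a map-backed class. The sketch below is illustrative only. `SimpleParquetConfiguration` is a hypothetical, trimmed-down stand-in for the `ParquetConfiguration` interface (the real interface declares more methods and may use different signatures), and `MapParquetConfiguration` is a hypothetical class name, not something added in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// HYPOTHETICAL: a trimmed-down stand-in for ParquetConfiguration.
// The real interface may declare a different set of methods.
interface SimpleParquetConfiguration {
  void set(String name, String value);
  String get(String name);
  String get(String name, String defaultValue);
  boolean getBoolean(String name, boolean defaultValue);
  int getInt(String name, int defaultValue);
}

// HYPOTHETICAL: a Hadoop-free implementation backed by a plain HashMap,
// so no org.apache.hadoop classes are needed at runtime.
class MapParquetConfiguration implements SimpleParquetConfiguration {
  private final Map<String, String> values = new HashMap<>();

  @Override
  public void set(String name, String value) {
    values.put(name, value);
  }

  @Override
  public String get(String name) {
    // Returns null when the key has never been set.
    return values.get(name);
  }

  @Override
  public String get(String name, String defaultValue) {
    return values.getOrDefault(name, defaultValue);
  }

  @Override
  public boolean getBoolean(String name, boolean defaultValue) {
    String value = values.get(name);
    // Absent key -> default; otherwise "true" (any case) -> true, anything else -> false.
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }

  @Override
  public int getInt(String name, int defaultValue) {
    String value = values.get(name);
    // Absent key -> default; a malformed number throws NumberFormatException.
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }
}

public class Main {
  public static void main(String[] args) {
    SimpleParquetConfiguration conf = new MapParquetConfiguration();
    conf.set("parquet.page.size", "1048576");
    conf.set("parquet.enable.dictionary", "false");
    System.out.println(conf.getInt("parquet.page.size", 0));
    System.out.println(conf.getBoolean("parquet.enable.dictionary", true));
    System.out.println(conf.get("missing.key", "fallback"));
  }
}
```

Running `Main` prints `1048576`, `false`, and `fallback`. A plain `HashMap` keeps the sketch free of any `org.apache.hadoop` imports, which is the point of step 1; a real implementation would also have to cover the rest of the interface.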