amousavigourabi commented on PR #1141:
URL: https://github.com/apache/parquet-mr/pull/1141#issuecomment-1732618636

   @wgtmac thanks a lot for the review! It's quite a big one, so I really 
appreciate you taking the time for it.
   
   To address your concerns:
   
   > * It seems that we still have some compatibility issues. Can you confirm? 
If yes, could you please write them out explicitly?
   
   Correct, japicmp reports two incompatibilities, in the classes CodecFactory 
and ParquetReader. Both are changes of private and protected field types from 
Configuration to ParquetConfiguration. These changes are strictly necessary for 
the effort to unhadoop the read/write API.
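   
   To illustrate the kind of change japicmp flags, here is a minimal sketch. 
All class names in it are hypothetical stand-ins, not the actual parquet-mr or 
Hadoop classes; it only shows a protected field keeping its name while its 
declared type changes from a concrete class to an interface.

```java
import java.lang.reflect.Field;

// Hypothetical stand-ins, not the actual parquet-mr or Hadoop classes.
class HadoopStyleConf {}            // plays the role of Hadoop's Configuration
interface ConfInterface {}          // plays the role of ParquetConfiguration

// Before the change: the protected field has the concrete Hadoop-style type.
class ReaderBefore {
  protected HadoopStyleConf conf = new HadoopStyleConf();
}

// After the change: same field name, but typed as the interface.
class ReaderAfter {
  protected ConfInterface conf = new ConfInterface() {};
}

class FieldTypeCheck {
  // Returns the declared type of the field named "conf" in the given class.
  static Class<?> confFieldType(Class<?> owner) {
    try {
      Field field = owner.getDeclaredField("conf");
      return field.getType();
    } catch (NoSuchFieldException e) {
      throw new IllegalStateException("no field named conf in " + owner, e);
    }
  }

  public static void main(String[] args) {
    // Print the declared field type before and after the change.
    System.out.println(confFieldType(ReaderBefore.class).getSimpleName());
    System.out.println(confFieldType(ReaderAfter.class).getSimpleName());
  }
}
```

   Compiled code refers to a field by both its name and its type, so a 
subclass built against the old field type would need to be recompiled after a 
change like this one.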
   
   > * Is there any follow-up work item to do? Would be good if we can know the 
whole picture in advance.
   
   After this, two follow-up steps will be needed: 1. creating a simple 
unhadooped implementation of the ParquetConfiguration interface, and 2. adding 
a simple way for users to avoid the Hadoop codecs, as the OOTB implementations 
of everything still rely very heavily on Hadoop classes. These changes should 
allow users to drop the Hadoop runtime dependency. The Hadoop client API 
dependency will still be necessary.
   
   > * Is it possible to add a simple test case to prove that a simple writer 
and reader roundtrip can work successfully without Hadoop dependency?
   
   Not yet: we do not have a practical way for users to use the API without 
Hadoop dependencies. The parameters added to the TestReadWrite fixture check 
that the read/write API still functions when using the ParquetConfiguration 
interface.

