amousavigourabi commented on issue #2938: URL: https://github.com/apache/parquet-java/issues/2938#issuecomment-2212508590
Hi, please be advised that we do have some ParquetWriter implementations for use OOTB, such as AvroParquetWriter. AFAIK there is no implementation that is fully decoupled from systems such as Avro. If you wish to avoid using any of these, you will unfortunately still have to create your own implementation.

Leveraging our LocalOutputFile implementation avoids loading the HDFS Path class, which used to be an issue in some production environments. To be able to fully drop the Hadoop runtime at the time, changes to the way writer and reader utils were configured and to how data is (de)compressed were necessary, as these domains were (and still are) coupled incredibly tightly to Hadoop. We've made a good start on allowing for decoupled configurations through the ParquetConfiguration interface, though I believe the Hadoop Configuration class still needs to be loaded at some point.

After the outstanding problems with the configuration and (de)compressors are resolved, using only the `hadoop-client-api` during build will be possible in the then-supported contexts (i.e., only for (de)compressors with available alternative implementations). Until we've reached that point, we still have a *runtime* dependency on Hadoop in the project. This means you will have to maintain the dependencies on both the Hadoop client API and the Hadoop client runtime.
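
For illustration only, the dependency setup described above might look roughly like the following in a Maven build. This is a sketch under assumptions, not something taken from the comment: it assumes Maven as the build tool, picks `parquet-avro` as the example writer binding, and uses `parquet.version` and `hadoop.version` as placeholder properties you would define yourself.

```xml
<!-- Sketch only: versions are placeholder properties, not recommendations. -->
<dependencies>
  <!-- Example writer binding (AvroParquetWriter); other bindings are possible. -->
  <dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>${parquet.version}</version>
  </dependency>
  <!-- Hadoop client API, needed at build time. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-api</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <!-- Hadoop client runtime, still required when the application runs. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-runtime</artifactId>
    <version>${hadoop.version}</version>
    <scope>runtime</scope>
  </dependency>
</dependencies>
```

Keeping the runtime artifact in `runtime` scope at least keeps it off the compile classpath, which should make it easier to drop once the configuration and (de)compressor coupling is resolved.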
