gowa opened a new pull request, #3121:
URL: https://github.com/apache/parquet-java/pull/3121

   Note: this pull request is not ready to merge. It is a request to check 
whether the concept of using code generation to solve some performance issues 
associated with protobuf reflection when writing or reading parquet files is 
of potential interest to the repository owners. I decided to verify the 
concept at a rather early stage because of the significant effort required 
to implement the change. Should the approach and the new optional dependency on 
ByteBuddy be found satisfactory and potentially acceptable for inclusion 
in parquet-java, I will attempt to properly finish first the 'write' part and 
then the 'read' part (in terms of code quality and tests). Therefore, any 
feedback is appreciated.
   
   ### Rationale for this change
   We read and write a lot of parquet data, defined by protobuf schemas, from 
Java. We see that this can be done faster than what is offered out of the 
box now.
   This change improves proto-to-parquet file writing performance by 
means of code generation (in my synthetic tests by around 50% with SNAPPY 
compression, especially when structures have many primitive-type fields).
   
   ### What changes are included in this PR?
   1. An extension point in MessageWriter that redirects writing to a class 
generated on the fly, which calls the getters of the protobuf-generated classes 
directly rather than going through the Protobuf Java reflection methods.
   2. A separate class that contains all of the code generation logic.
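
To illustrate the idea behind item 1, here is a minimal, self-contained sketch of the difference between generic reflective access and direct getter calls. It is not the code in this PR: `Point` is a hypothetical stand-in for a protobuf-generated class, plain `java.lang.reflect` stands in for the Protobuf reflection API, and `writeDirectly` is written by hand where the PR would generate an equivalent class at runtime with ByteBuddy.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DirectGetterSketch {

    // Hypothetical stand-in for a protobuf-generated message class.
    public static final class Point {
        private final int x;
        private final long y;

        public Point(int x, long y) {
            this.x = x;
            this.y = y;
        }

        public int getX() {
            return x;
        }

        public long getY() {
            return y;
        }
    }

    // Generic path: look up each getter by name and invoke it reflectively.
    // Every value comes back as a boxed Object, and every call goes through
    // Method.invoke instead of a plain method call.
    static List<String> writeReflectively(Object message) {
        List<String> out = new ArrayList<>();
        try {
            for (String field : new String[] {"X", "Y"}) {
                Method getter = message.getClass().getMethod("get" + field);
                Object value = getter.invoke(message);
                out.add(field.toLowerCase(Locale.ROOT) + "=" + value);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    // Specialized path: what a class generated for Point could look like.
    // The getters are called directly, with no lookup and no boxing.
    static List<String> writeDirectly(Point message) {
        List<String> out = new ArrayList<>();
        out.add("x=" + message.getX());
        out.add("y=" + message.getY());
        return out;
    }

    public static void main(String[] args) {
        Point p = new Point(3, 7L);
        System.out.println(writeReflectively(p));
        System.out.println(writeDirectly(p));
    }
}
```

Both methods produce the same fields. The specialized one simply avoids the per-field lookup and boxing, which is presumably where most of the gain comes from for messages with many primitive fields.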
   
   ### Are these changes tested?
   The current unit tests pass.
   
   ### Are there any user-facing changes?
   A configuration option to disable the code generation logic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

