etseidl opened a new pull request, #8190:
URL: https://github.com/apache/arrow-rs/pull/8190

   # Which issue does this PR close?
   **Note: this targets a feature branch, not main**
   
   - Part of #5854.
   
   # Rationale for this change
   
   Add a custom parser for `PageLocation` as the decoding of this struct is one 
of several hot spots.
   
   # What changes are included in this PR?
   
   This adds a faster way of obtaining struct field ids to 
`ThriftCompactInputProtocol`. For a small struct (3 fields) in which every 
field is required, we can save a good bit of time by bypassing 
`ThriftCompactInputProtocol::read_field_begin`, which is very general and can 
handle out-of-order fields, among other things. The new function 
`read_field_header` avoids the costly branching that occurs when calculating 
the new field id (as well as the special handling needed for boolean fields). 
Field validation is then handled on the consuming side while decoding the 
`PageLocation` struct.
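   
   As a rough illustration of the idea (not the code in this PR), the sketch 
below splits each compact-protocol field header byte into a type nibble and a 
delta nibble without branching, and leaves validation to the `PageLocation` 
decoder. The `Reader` type, `expect_field`, `read_page_location`, and the 
`String` error type are hypothetical stand-ins; only the name 
`read_field_header` comes from the description above, and its signature here 
is assumed.

```rust
// Compact-protocol type ids for the two field types PageLocation uses.
const TYPE_I32: u8 = 5;
const TYPE_I64: u8 = 6;

#[derive(Debug, PartialEq)]
struct PageLocation {
    offset: i64,
    compressed_page_size: i32,
    first_row_index: i64,
}

/// Hypothetical minimal reader: a byte slice plus a cursor.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, pos: 0 }
    }

    fn read_byte(&mut self) -> Result<u8, String> {
        let b = *self.buf.get(self.pos).ok_or("unexpected end of input")?;
        self.pos += 1;
        Ok(b)
    }

    /// Fast path (assumed signature): return (type, delta) from one header
    /// byte. No field-id bookkeeping and no boolean special-casing here.
    fn read_field_header(&mut self) -> Result<(u8, u8), String> {
        let b = self.read_byte()?;
        Ok((b & 0x0f, b >> 4))
    }

    /// Unsigned LEB128 varint.
    fn read_varint(&mut self) -> Result<u64, String> {
        let mut value = 0u64;
        let mut shift = 0u32;
        loop {
            let b = self.read_byte()?;
            value |= ((b & 0x7f) as u64) << shift;
            if b & 0x80 == 0 {
                return Ok(value);
            }
            shift += 7;
            if shift > 63 {
                return Err("varint too long".to_string());
            }
        }
    }

    /// Zigzag-decoded signed integer.
    fn read_i64(&mut self) -> Result<i64, String> {
        let n = self.read_varint()?;
        Ok(((n >> 1) as i64) ^ -((n & 1) as i64))
    }

    fn read_i32(&mut self) -> Result<i32, String> {
        i32::try_from(self.read_i64()?).map_err(|e| e.to_string())
    }
}

/// Validation on the consuming side: the next field must have the expected
/// type and a delta of exactly 1 (fields in order, ids encoded as deltas).
fn expect_field(r: &mut Reader, expected_type: u8) -> Result<(), String> {
    let (ty, delta) = r.read_field_header()?;
    if ty != expected_type || delta != 1 {
        return Err(format!("unexpected field header: type {ty}, delta {delta}"));
    }
    Ok(())
}

fn read_page_location(r: &mut Reader) -> Result<PageLocation, String> {
    expect_field(r, TYPE_I64)?;
    let offset = r.read_i64()?;
    expect_field(r, TYPE_I32)?;
    let compressed_page_size = r.read_i32()?;
    expect_field(r, TYPE_I64)?;
    let first_row_index = r.read_i64()?;
    // A struct ends with a zero "stop" byte.
    if r.read_byte()? != 0 {
        return Err("expected stop byte".to_string());
    }
    Ok(PageLocation { offset, compressed_page_size, first_row_index })
}
```

   In this sketch a header with a zero delta nibble (an absolute field id) is 
rejected rather than decoded, which is the trade-off the fast path makes.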
   
   Note that to obtain the observed speedup, we need to assume that the fields 
will always be in order and that all field ids will be encoded as field 
deltas. This is probably a fairly safe assumption, but custom thrift writers 
that use absolute field ids could exist. If we encounter such a writer in the 
wild, this change will need to be reverted.
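   
   To make the assumption concrete, the following sketch shows the two ways 
the Thrift compact protocol can encode the same field header: the short form 
packs a delta of 1 to 15 into the high nibble, while the long form leaves 
that nibble zero and follows with the field id as a zigzag varint. The 
function `encode_field_header` and its `force_absolute` flag are hypothetical 
and exist only for illustration.

```rust
/// Hypothetical encoder for a compact-protocol field header.
/// `ty` is the 4-bit compact type id (6 = i64, 5 = i32).
fn encode_field_header(prev_id: i16, id: i16, ty: u8, force_absolute: bool) -> Vec<u8> {
    let delta = id as i32 - prev_id as i32;
    if !force_absolute && (1..=15).contains(&delta) {
        // Short form: delta in the high nibble, type in the low nibble.
        return vec![((delta as u8) << 4) | ty];
    }
    // Long form: zero delta nibble, then the field id as a zigzag varint.
    let mut out = vec![ty];
    let mut zz = (((id as i32) << 1) ^ ((id as i32) >> 31)) as u32;
    loop {
        let b = (zz & 0x7f) as u8;
        zz >>= 7;
        if zz == 0 {
            out.push(b);
            break;
        }
        out.push(b | 0x80);
    }
    out
}
```

   For field 1 of type i64, the short form is the single byte `0x16`, while 
the absolute form is `0x06 0x02`. A reader that only inspects the delta 
nibble would see a delta of 0 in the second case.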
   
   # Are these changes tested?
   
   These changes should be covered by existing tests.
   
   # Are there any user-facing changes?
   
   None beyond the changes in this branch.
   

