etseidl opened a new pull request, #8190: URL: https://github.com/apache/arrow-rs/pull/8190
# Which issue does this PR close?

**Note: this targets a feature branch, not main**

- Part of #5854.

# Rationale for this change

Add a custom parser for `PageLocation`, as decoding this struct is one of several hot spots.

# What changes are included in this PR?

This adds a faster means of obtaining the struct field ids to `ThriftCompactInputProtocol`. For a small struct (3 fields) in which every field is required, we can save a good deal of time by bypassing `ThriftCompactInputProtocol::read_field_begin`, which is very general and can handle out-of-order fields, among other things. By adding a new function, `read_field_header`, we avoid the costly branching that occurs when calculating the new field id, as well as the special handling needed for boolean fields. Field validation is then handled on the consuming side while decoding the `PageLocation` struct.

Note that to obtain the speedup seen, we need to assume that the fields will always be in order and that the field ids will all be encoded as field deltas. This is probably a fairly safe assumption, but custom thrift writers that use absolute field ids may exist. If we encounter such a writer in the wild, this change will need to be reverted.

# Are these changes tested?

These changes should be covered by existing tests.

# Are there any user-facing changes?

None beyond the changes in this branch.
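
For readers unfamiliar with the compact protocol, the fast path described under "What changes are included in this PR?" can be sketched roughly as follows. This is an illustrative, standalone sketch and not the code in this PR: the `Reader` type, the `expect_field` helper, the error type, and the signature given to `read_field_header` here are all assumptions made for the example. It relies only on the Thrift compact encoding, where a short-form field header byte carries the field id delta in its high nibble and the type code in its low nibble, and on the three required fields of `PageLocation` (`offset`: i64, `compressed_page_size`: i32, `first_row_index`: i64).

```rust
// Illustrative sketch only; names and signatures are hypothetical,
// not the arrow-rs implementation.

#[derive(Debug, PartialEq)]
struct PageLocation {
    offset: i64,
    compressed_page_size: i32,
    first_row_index: i64,
}

// Thrift compact-protocol type codes for the two types used here.
const TYPE_I32: u8 = 5;
const TYPE_I64: u8 = 6;

/// Hypothetical minimal reader over a byte slice.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, pos: 0 }
    }

    fn read_byte(&mut self) -> Result<u8, String> {
        let b = *self.buf.get(self.pos).ok_or("unexpected end of input")?;
        self.pos += 1;
        Ok(b)
    }

    /// Splits one header byte into (type code, field id delta) with no
    /// branching. A delta of 0 means the writer used an absolute field id
    /// (or this is the stop byte); the caller must check for that.
    fn read_field_header(&mut self) -> Result<(u8, u8), String> {
        let b = self.read_byte()?;
        Ok((b & 0x0f, b >> 4))
    }

    /// Reads an unsigned LEB128 varint of at most 10 bytes.
    fn read_varint(&mut self) -> Result<u64, String> {
        let mut result = 0u64;
        let mut shift = 0u32;
        loop {
            let b = self.read_byte()?;
            result |= u64::from(b & 0x7f) << shift;
            if b & 0x80 == 0 {
                return Ok(result);
            }
            shift += 7;
            if shift > 63 {
                return Err("varint too long".to_string());
            }
        }
    }

    /// Reads a zigzag-encoded signed integer.
    fn read_zigzag(&mut self) -> Result<i64, String> {
        let v = self.read_varint()?;
        Ok(((v >> 1) as i64) ^ -((v & 1) as i64))
    }
}

/// Validation happens on the consuming side: each field must have the
/// expected type and follow the previous field with a delta of exactly 1.
fn expect_field(r: &mut Reader<'_>, want_type: u8, name: &str) -> Result<i64, String> {
    let (ty, delta) = r.read_field_header()?;
    if ty != want_type || delta != 1 {
        return Err(format!("unexpected header for {name}: type {ty}, delta {delta}"));
    }
    r.read_zigzag()
}

fn read_page_location(r: &mut Reader<'_>) -> Result<PageLocation, String> {
    let offset = expect_field(r, TYPE_I64, "offset")?;
    let size = expect_field(r, TYPE_I32, "compressed_page_size")?;
    let first_row_index = expect_field(r, TYPE_I64, "first_row_index")?;
    // A struct ends with a single zero "stop" byte.
    if r.read_byte()? != 0 {
        return Err("expected stop byte after PageLocation".to_string());
    }
    Ok(PageLocation {
        offset,
        compressed_page_size: i32::try_from(size).map_err(|e| e.to_string())?,
        first_row_index,
    })
}
```

With this sketch, the bytes `[0x16, 0x08, 0x15, 0xC8, 0x01, 0x16, 0x00, 0x00]` decode to `offset = 4`, `compressed_page_size = 100`, `first_row_index = 0`. Input whose first header byte is `0x06` (delta 0, i.e. an absolute field id follows) is rejected rather than decoded, which is the trade-off the description above calls out.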