adamreeve commented on issue #43057: URL: https://github.com/apache/arrow/issues/43057#issuecomment-2499123593
> It may not be obvious how to use it with other Parquet C++ APIs, but the OffsetIndex conceptually allows direct access to individual pages.

Ah OK interesting, thanks. If we support this in future by updating PageReader then maybe we'd need to document that PageReader isn't thread-safe, and that users need to create separate PageReader instances per thread? Otherwise I guess we could allow users to create per-thread decryptors and pass them to the PageReader methods. Another option would be using locking within the decryptors, and maybe the overhead of that would be OK if the locks would not usually be contended.

> But GetColumnDataDecryptor will happily reuse the footer_data_decryptor_ for all columns that use footer key encryption...

I tested doing a Dataset scan using uniform encryption and this does cause decryptor errors. I thought it's worth creating a separate issue for that as it's not exactly the same problem as this issue, so I've reported this as #44852.

It's probably worth pointing out that specifying the same master key for columns as the footer master key isn't the same as uniform encryption. With uniform encryption the same data encryption key is used, but if you specify the same key name for columns then separate data encryption keys will be generated per column. And the uniform encryption option isn't exposed in PyArrow, so this scenario might not be that common.
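
To make the thread-safety options above a bit more concrete, here is a minimal Python sketch of the two alternatives to separate PageReader instances: locking inside a shared decryptor, and one decryptor per thread. All of the names (`ToyDecryptor`, `LockedDecryptor`, `PerThreadDecryptors`, `decrypt_pages`) are hypothetical and made up for illustration; none of them are Parquet C++ or PyArrow APIs, and the XOR "cipher" only stands in for a decryptor that keeps mutable internal state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyDecryptor:
    """Hypothetical stand-in for a stateful decryptor (not a real Parquet class)."""

    def __init__(self, key: int):
        self._key = key              # single-byte XOR key, 0..255
        self._scratch = bytearray()  # mutable state: unsafe to share without care

    def decrypt(self, data: bytes) -> bytes:
        # The scratch buffer is reused across calls, so two threads calling
        # this on the same instance could overwrite each other's work.
        self._scratch = bytearray(data)
        for i in range(len(self._scratch)):
            self._scratch[i] ^= self._key
        return bytes(self._scratch)


class LockedDecryptor:
    """Option A: share one decryptor and serialize access with a lock."""

    def __init__(self, inner: ToyDecryptor):
        self._inner = inner
        self._lock = threading.Lock()

    def decrypt(self, data: bytes) -> bytes:
        with self._lock:
            return self._inner.decrypt(data)


class PerThreadDecryptors:
    """Option B: lazily create one decryptor per thread, no locking needed."""

    def __init__(self, key: int):
        self._key = key
        self._local = threading.local()

    def get(self) -> ToyDecryptor:
        decryptor = getattr(self._local, "decryptor", None)
        if decryptor is None:
            decryptor = ToyDecryptor(self._key)
            self._local.decryptor = decryptor
        return decryptor


def decrypt_pages(pages, decrypt, workers: int = 4):
    """Decrypt pages concurrently with the given callable, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decrypt, pages))


if __name__ == "__main__":
    pages = [bytes([i] * 8) for i in range(16)]

    shared = LockedDecryptor(ToyDecryptor(0x5A))
    with_lock = decrypt_pages(pages, shared.decrypt)

    per_thread = PerThreadDecryptors(0x5A)
    without_lock = decrypt_pages(pages, lambda p: per_thread.get().decrypt(p))

    print(with_lock == without_lock)
```

The trade-off is the one described above: the locked variant keeps a single decryptor but pays for a lock acquisition on every call (cheap when uncontended), while the per-thread variant avoids locking at the cost of one decryptor per thread.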
