adamreeve commented on issue #43057:
URL: https://github.com/apache/arrow/issues/43057#issuecomment-2499123593

   > It may not be obvious how to use it with other Parquet C++ APIs, but the 
OffsetIndex conceptually allows direct access to individual pages.
   
   Ah OK, interesting, thanks. If we support this in the future by updating 
PageReader, then maybe we'd need to document that PageReader isn't thread-safe 
and that users need to create a separate PageReader instance per thread. 
Otherwise, I guess we could allow users to create per-thread decryptors and pass 
them to the PageReader methods. Another option would be locking within 
the decryptors; the overhead of that might be OK if the locks are not usually 
contended.
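
To make the locking option concrete, here is a minimal sketch. It does not use the real Parquet C++ classes: `ToyDecryptor`, `LockedDecryptor` and `DecryptPagesConcurrently` are hypothetical names, and the XOR "cipher" is only a stand-in for a decryptor that keeps mutable internal state and so can't be shared between threads without synchronization.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a decryptor with mutable internal state.
// It is NOT the Parquet C++ Decryptor; XOR is used only to keep this short.
class ToyDecryptor {
 public:
  explicit ToyDecryptor(uint8_t key) : key_(key) {}

  // Decrypts through a reused scratch buffer, which is the shared state
  // that makes concurrent calls on one instance unsafe.
  std::string Decrypt(const std::string& ciphertext) {
    scratch_.assign(ciphertext.begin(), ciphertext.end());
    for (char& c : scratch_) c = static_cast<char>(c ^ key_);
    return std::string(scratch_.begin(), scratch_.end());
  }

 private:
  uint8_t key_;
  std::vector<char> scratch_;
};

// The "locking within the decryptor" option: every call is serialized
// by a mutex, so one instance can be shared across threads.
class LockedDecryptor {
 public:
  explicit LockedDecryptor(uint8_t key) : inner_(key) {}

  std::string Decrypt(const std::string& ciphertext) {
    std::lock_guard<std::mutex> lock(mutex_);
    return inner_.Decrypt(ciphertext);
  }

 private:
  std::mutex mutex_;
  ToyDecryptor inner_;
};

// Decrypts pages from several threads that share one LockedDecryptor and
// returns how many pages round-tripped correctly.
int DecryptPagesConcurrently(int num_threads, int pages_per_thread) {
  assert(num_threads > 0 && pages_per_thread >= 0);
  const uint8_t key = 0x5A;
  LockedDecryptor shared(key);
  std::vector<int> ok(num_threads, 0);  // one counter per thread, no sharing
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&shared, &ok, t, pages_per_thread, key] {
      for (int p = 0; p < pages_per_thread; ++p) {
        const std::string plain =
            "page-" + std::to_string(t) + "-" + std::to_string(p);
        std::string cipher = plain;
        for (char& c : cipher) c = static_cast<char>(c ^ key);  // "encrypt"
        if (shared.Decrypt(cipher) == plain) ++ok[t];
      }
    });
  }
  for (auto& w : workers) w.join();
  int total = 0;
  for (int n : ok) total += n;
  return total;
}
```

The per-thread alternative would construct a separate `ToyDecryptor` inside each worker instead of sharing `shared`, which avoids the lock entirely at the cost of one decryptor per thread.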
   
   > But GetColumnDataDecryptor will happily reuse the footer_data_decryptor_ 
for all columns that use footer key encryption...
   
   I tested a Dataset scan using uniform encryption, and this does cause 
decryptor errors. I thought it was worth creating a separate issue for that, as 
it's not exactly the same problem as this issue, so I've reported it as 
#44852.
   
   It's probably worth pointing out that specifying the same master key for 
columns as the footer master key isn't the same as uniform encryption. With 
uniform encryption the same data encryption key is used, but if you specify the 
same key name for columns then separate data encryption keys will be generated 
per column. Also, the uniform encryption option isn't exposed in PyArrow, so 
this scenario might not be that common.

