Hor911 commented on code in PR #34461:
URL: https://github.com/apache/arrow/pull/34461#discussion_r1127016579


##########
cpp/src/parquet/arrow/reader.h:
##########
@@ -249,6 +249,13 @@ class PARQUET_EXPORT FileReader {
 
   virtual ::arrow::Status ReadRowGroup(int i, std::shared_ptr<::arrow::Table>* out) = 0;
 
+  virtual ::arrow::Status WillNeedRowGroups(const std::vector<int>& row_groups,
+                                            const std::vector<int>& column_indices) = 0;

Review Comment:
   It can't be expressed in this API. This method translates into a call to 
arrow::io::RandomAccessFile::WillNeed().
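
   As a rough illustration of that translation (a sketch, not the code in this 
PR): assuming the byte extent of every column chunk is known, the requested row 
groups and columns reduce to a list of byte ranges that can be handed to 
WillNeed. `ReadRange` below is a local stand-in for arrow::io::ReadRange; 
`ColumnChunkExtent`, `FileLayout` and `RangesForRowGroups` are hypothetical 
names, not Arrow or Parquet APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-in for arrow::io::ReadRange (offset and length in bytes).
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical, simplified metadata: byte extent of one column chunk.
struct ColumnChunkExtent {
  int64_t offset;
  int64_t length;
};

// layout[row_group][column] is the byte extent of that column chunk.
using FileLayout = std::vector<std::vector<ColumnChunkExtent>>;

// Collect the byte ranges covered by the requested row groups and columns.
// Out-of-range indices are skipped in this sketch; real code would most
// likely report an error instead.
std::vector<ReadRange> RangesForRowGroups(const FileLayout& layout,
                                          const std::vector<int>& row_groups,
                                          const std::vector<int>& column_indices) {
  std::vector<ReadRange> ranges;
  for (int rg : row_groups) {
    if (rg < 0 || static_cast<std::size_t>(rg) >= layout.size()) continue;
    const auto& columns = layout[static_cast<std::size_t>(rg)];
    for (int col : column_indices) {
      if (col < 0 || static_cast<std::size_t>(col) >= columns.size()) continue;
      const auto& chunk = columns[static_cast<std::size_t>(col)];
      ranges.push_back(ReadRange{chunk.offset, chunk.length});
    }
  }
  return ranges;
}
```

   The resulting vector is what would be passed on to the file's WillNeed; 
whether the file acts on it is up to the RandomAccessFile implementation.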
   
   A no-op is the default, and a valid, implementation of WillNeed. It means 
that this RAF implementation provides no preload/prefetch, and all the work is 
done when ReadAt or ReadAsync is called.
   
   The current Arrow API expects tight coupling between FileReader, 
ParquetFileReader and the intermediate cache. True async decoupling is not 
possible without significant API changes (this was discussed somewhere).
   
   For my technique to work, one has to provide a special implementation of 
arrow::io::RandomAccessFile that receives WillNeed, downloads the data and 
signals that it is ready in some "hidden" way. Not perfect, but it gets me what 
I need without API changes or any other side effects.
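
   A minimal sketch of the idea, deliberately written without Arrow headers so 
it stands alone: `PrefetchingFile` is a hypothetical class that would derive 
from arrow::io::RandomAccessFile in real code, `ReadRange` stands in for 
arrow::io::ReadRange, and the `Fetch` callback stands in for the slow backend. 
The download here is synchronous and the "hidden" signal is just the cache 
entry; both are simplifying assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for arrow::io::ReadRange.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical file wrapper. Real code would derive from
// arrow::io::RandomAccessFile and return arrow::Status / arrow::Result.
class PrefetchingFile {
 public:
  // Stand-in for the slow backend call (for example, a remote download).
  using Fetch = std::function<std::string(int64_t offset, int64_t length)>;

  explicit PrefetchingFile(Fetch fetch) : fetch_(std::move(fetch)) {}

  // Receives the hint and downloads every range up front. A real
  // implementation would start these downloads asynchronously.
  void WillNeed(const std::vector<ReadRange>& ranges) {
    for (const auto& r : ranges) {
      cache_[std::make_pair(r.offset, r.length)] = fetch_(r.offset, r.length);
    }
  }

  // Serves exact-match ranges from the cache. Anything else is fetched on
  // demand, which is all that happens when WillNeed is a no-op.
  std::string ReadAt(int64_t offset, int64_t length) {
    auto it = cache_.find(std::make_pair(offset, length));
    if (it != cache_.end()) {
      ++cache_hits_;
      return it->second;
    }
    return fetch_(offset, length);
  }

  int cache_hits() const { return cache_hits_; }

 private:
  Fetch fetch_;
  std::map<std::pair<int64_t, int64_t>, std::string> cache_;
  int cache_hits_ = 0;
};
```

   With this shape, a reader that calls WillNeed before ReadAt hits the cache, 
while a reader that never calls WillNeed still works and simply pays the fetch 
cost at read time.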
   
   I think I'll be able to give you a link to a real use case tomorrow.


