jp0317 commented on code in PR #36192:
URL: https://github.com/apache/arrow/pull/36192#discussion_r1241485010


##########
cpp/src/parquet/file_reader.cc:
##########
@@ -351,8 +364,11 @@ class SerializedFile : public ParquetFileReader::Contents {
     cached_source_ =
         std::make_shared<::arrow::io::internal::ReadRangeCache>(source_, ctx, options);
     std::vector<::arrow::io::ReadRange> ranges;
+    prebuffered_column_chunks_.clear();

Review Comment:
   Thanks for bringing this up! If they can be called concurrently, wouldn't 
the problem already exist before this PR? That is, we would already have a race 
on `cached_source_` (`GetRowGroup` reads it while `PreBuffer` writes it).
   
   IIUC, `PreBuffer` is not meant for concurrent scenarios, based on 
[this code](https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L351)
 and the function 
[docs](https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.h#L166-L173), 
which seem to imply a strict order between calling `PreBuffer` and creating a 
reader / reading data.
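
To make the ordering argument concrete, here is a minimal, self-contained sketch. It is a toy model, not the real `SerializedFile`: `ToyCache`, `ToyReader`, and `Example` are hypothetical names invented for illustration, and only the pattern (one unsynchronized member that `PreBuffer` writes and `GetRowGroup` reads) is taken from the discussion above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for ReadRangeCache: remembers which row groups are cached.
struct ToyCache {
  std::unordered_set<int> row_groups;
};

// Hypothetical toy reader; it only mimics the shape of the code under review.
class ToyReader {
 public:
  // Writes cached_source_ with no locking, like the pattern being discussed.
  void PreBuffer(const std::vector<int>& row_groups) {
    auto cache = std::make_shared<ToyCache>();
    cache->row_groups.insert(row_groups.begin(), row_groups.end());
    cached_source_ = std::move(cache);
  }

  // Reads cached_source_ with no locking and reports where data would come from.
  std::string GetRowGroup(int i) const {
    if (cached_source_ && cached_source_->row_groups.count(i) > 0) {
      return "cache";
    }
    return "source";
  }

 private:
  std::shared_ptr<ToyCache> cached_source_;
};

// Strictly ordered use: PreBuffer finishes before any row group is requested.
// Calling the two member functions from different threads at the same time
// would be a data race on cached_source_ in this toy model.
inline void Example() {
  ToyReader reader;
  assert(reader.GetRowGroup(0) == "source");  // nothing buffered yet
  reader.PreBuffer({0});
  assert(reader.GetRowGroup(0) == "cache");   // served from the cache
}
```

Under this reading, a member that is cleared inside `PreBuffer` would be no less safe than `cached_source_` already is, provided callers keep the same strict order.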
   


