mapleFU commented on code in PR #46463:
URL: https://github.com/apache/arrow/pull/46463#discussion_r2106987170


##########
cpp/src/parquet/metadata.cc:
##########
@@ -334,8 +335,19 @@ class ColumnChunkMetaData::ColumnChunkMetaDataImpl {
     return possible_geo_stats_ != nullptr && possible_geo_stats_->is_valid();
   }
 
+  inline std::shared_ptr<EncodedStatistics> encoded_statistics() const {
+    return is_stats_set() ? possible_encoded_stats_ : nullptr;
+  }
+
   inline std::shared_ptr<Statistics> statistics() const {
-    return is_stats_set() ? possible_stats_ : nullptr;
+    if (is_stats_set()) {
+      // Because we are modifying possible_stats_ in a const method
+      const std::lock_guard<std::mutex> guard(stats_mutex_);
+      if (possible_stats_ == nullptr) {
+        possible_stats_ = MakeColumnStats(*column_metadata_, descr_);
+      }
+    }
+    return possible_stats_;

Review Comment:
   I think this is a bit dangerous (though users might rarely hit it). Assume
   thread1 and thread2 both call `statistics`:
   
   1. thread1 calls `is_stats_set`, which creates `possible_encoded_stats_` and
      returns.
   2. thread2 calls `is_stats_set`, then returns `possible_stats_` (which is
      still empty).
   3. thread1 constructs `possible_stats_`.
   
   It seems steps 2 and 3 race with each other?
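
   One possible way to avoid the race described above is to take the lock
   before the null check and to copy the pointer while still holding it, so
   that no thread reads `possible_stats_` unguarded. The sketch below is only
   an illustration under stated assumptions: `FakeStatistics`,
   `LazyStatsHolder`, `build_count`, and `AllThreadsSeeSameStats` are
   hypothetical stand-ins rather than Arrow APIs, and the "stats are set" check
   is modeled as a plain flag with no side effects, which may not match what
   `is_stats_set` does in the PR.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for parquet::Statistics; not the real Arrow type.
struct FakeStatistics {
  int null_count = 0;
};

// Hypothetical stand-in for ColumnChunkMetaDataImpl, reduced to the lazy
// statistics member discussed in the review.
class LazyStatsHolder {
 public:
  explicit LazyStatsHolder(bool stats_set) : stats_set_(stats_set) {}

  std::shared_ptr<FakeStatistics> statistics() const {
    // Assumption: this check is immutable and side-effect free.
    if (!stats_set_) return nullptr;
    // Both the lazy construction and the read that produces the return
    // value happen while the mutex is held.
    const std::lock_guard<std::mutex> guard(stats_mutex_);
    if (possible_stats_ == nullptr) {
      possible_stats_ = std::make_shared<FakeStatistics>();
      ++build_count_;
    }
    return possible_stats_;  // copied before the guard is released
  }

  // Number of times the statistics object was constructed.
  int build_count() const {
    const std::lock_guard<std::mutex> guard(stats_mutex_);
    return build_count_;
  }

 private:
  const bool stats_set_;
  mutable std::mutex stats_mutex_;
  mutable std::shared_ptr<FakeStatistics> possible_stats_;
  mutable int build_count_ = 0;
};

// Calls statistics() from several threads and reports whether every thread
// observed the same non-null object.
bool AllThreadsSeeSameStats(const LazyStatsHolder& holder, int num_threads) {
  assert(num_threads > 0);
  std::vector<std::shared_ptr<FakeStatistics>> seen(num_threads);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    // Each thread writes only its own slot, so `seen` needs no lock.
    threads.emplace_back([&holder, &seen, i] { seen[i] = holder.statistics(); });
  }
  for (auto& t : threads) t.join();
  for (const auto& s : seen) {
    if (s == nullptr || s != seen[0]) return false;
  }
  return true;
}
```

   The cost is that every call takes the mutex, even after initialization.
   If that matters, `std::call_once` or an atomic load of the pointer could
   be considered instead; which trade-off suits this code path is a judgment
   call for the PR author.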



