alexeykudinkin commented on code in PR #6311:
URL: https://github.com/apache/hudi/pull/6311#discussion_r939458460


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -134,14 +134,14 @@ class ColumnStats {
 
     HashMap<String, ColumnStats> allColumnStats = new HashMap<>();
 
-    // Collect stats for all columns by iterating through records while accounting
-    // corresponding stats
-    records.forEach((record) -> {
-      // For each column (field) we have to index update corresponding column stats
-      // with the values from this record
-      targetFields.forEach(field -> {
-        ColumnStats colStats = allColumnStats.computeIfAbsent(field.name(), (ignored) -> new ColumnStats());
-
+    // For each column (field) we have to index update corresponding column stats
+    // with the values from this record
+    targetFields.forEach(field -> {

Review Comment:
   I took another look, and this change doesn't make sense to me: previously 
we'd iterate over the list of records **once**, handling all of the fields of 
each record in the inner loop. That gives us spatial locality (every record is 
accessed just once, and all of its fields are handled within the inner loop), 
which gives the CPU cache a much better chance to help.
   
   Now, with your change, we make N passes over all of the records (where N is 
the number of columns), which gives up that spatial locality: we first access 
field A of every record, then field B, then field C, etc.
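
To make the comparison concrete, here is a minimal sketch of the two loop orders. It is not the Hudi code: `LoopOrderSketch`, `MinMax`, `recordMajor` and `fieldMajor` are hypothetical names, and records are simplified to `int[]` rows (one slot per field) instead of Avro records. Both methods compute the same per-column stats for a non-empty record list; only the order in which the data is touched differs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LoopOrderSketch {

  // Hypothetical stand-in for per-column stats (not Hudi's ColumnStats).
  static final class MinMax {
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;

    void update(int value) {
      min = Math.min(min, value);
      max = Math.max(max, value);
    }

    @Override
    public String toString() {
      return "[min=" + min + ", max=" + max + "]";
    }
  }

  // Record-major order (the previous code's shape): a single pass over the
  // records, with every field of a record handled in the inner loop.
  static Map<String, MinMax> recordMajor(List<int[]> records, List<String> fields) {
    Map<String, MinMax> stats = new LinkedHashMap<>();
    for (int[] record : records) {
      for (int f = 0; f < fields.size(); f++) {
        stats.computeIfAbsent(fields.get(f), ignored -> new MinMax()).update(record[f]);
      }
    }
    return stats;
  }

  // Field-major order (the changed code's shape): one full pass over the
  // records per field, so each record is revisited fields.size() times.
  static Map<String, MinMax> fieldMajor(List<int[]> records, List<String> fields) {
    Map<String, MinMax> stats = new LinkedHashMap<>();
    for (int f = 0; f < fields.size(); f++) {
      MinMax colStats = stats.computeIfAbsent(fields.get(f), ignored -> new MinMax());
      for (int[] record : records) {
        colStats.update(record[f]);
      }
    }
    return stats;
  }

  public static void main(String[] args) {
    // Illustrative data only: three records with two fields, "a" and "b".
    List<int[]> records = Arrays.asList(new int[] {3, 10}, new int[] {1, 20}, new int[] {2, 5});
    List<String> fields = Arrays.asList("a", "b");

    System.out.println("record-major: " + recordMajor(records, fields));
    System.out.println("field-major:  " + fieldMajor(records, fields));
  }
}
```

The field-major version does save a map lookup per record, but it pays for that with N passes over the record list. Whether that matters in practice depends on the record count and layout, so it would need to be measured rather than assumed.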


