wombatu-kun commented on code in PR #16265:
URL: https://github.com/apache/iceberg/pull/16265#discussion_r3353641181


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -88,6 +88,12 @@ public PartitionData copy() {
   // cached schema
   private transient Schema avroSchema = null;
 
+  // Cached unmodifiable List<Long> view of splitOffsets, allocated lazily on
+  // the first splitOffsets() call and reused thereafter. Invalidated (set to
+  // null) whenever the underlying long[] is reassigned. Marked transient so
+  // it is not part of the wire/serialized state. See #15622.
+  private transient List<Long> splitOffsetsList = null;

Review Comment:
   This cache only saves an allocation when splitOffsets() is called more than 
once on the same BaseFile, but the manifest-encode path calls it once per file 
(V2Metadata.java:459); the only real repeat is ContentFileParser's adjacent 
double call (ContentFileParser.java:113-114). #15622 suggested routing in-core 
callers through the package-private splitOffsetArray() (raw long[]) instead, 
which avoids the allocation entirely with no extra field or invalidation sites. 
Consider deduping the ContentFileParser call, or note why the cached List is 
preferred over the long[] route for the manifest path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to