Re: [PR] [lake/tiering] add total bytes read tracking and metrics for tiering service [fluss]

via GitHub Tue, 31 Mar 2026 18:39:10 -0700


beryllw commented on code in PR #2933:
URL: https://github.com/apache/fluss/pull/2933#discussion_r3019302477



##########
fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/source/reader/BoundedSplitReader.java:
##########
@@ -153,7 +155,17 @@ public boolean hasNext() {
 
         @Override
         public ScanRecord next() {
-            return new ScanRecord(rowIterator.next());
+            InternalRow row = rowIterator.next();
+            int sizeInBytes = -1;
+            if (row instanceof ProjectedRow) {
+                InternalRow underlyingRow = ((ProjectedRow) row).getUnderlyingRow();
+                if (underlyingRow instanceof BinaryRow) {
+                    sizeInBytes = ((BinaryRow) underlyingRow).getSizeInBytes();
+                }
+            } else if (row instanceof BinaryRow) {
+                sizeInBytes = ((BinaryRow) row).getSizeInBytes();
+            }

Review Comment:
   Thanks for the comment.
   Other row types are possible here because BoundedSplitReader is a shared 
path: lake snapshots produce GenericRow and Arrow logs produce ColumnarRow. 
Tiering (the consumer of this size info) only involves KV-snapshot paths, which 
always yield BinaryRow. The fallback to UNKNOWN_SIZE_IN_BYTES is a safeguard at 
the shared layer, and the metrics side simply skips such records with a `> 0` 
check.
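
   The fallback and the `> 0` guard described above can be sketched roughly as 
follows. This is a minimal, self-contained illustration, not the PR's code: the 
nested row types are hypothetical stand-ins for the Fluss classes named in the 
diff, `resolveSizeInBytes` and `totalBytesRead` are made-up helper names, and 
UNKNOWN_SIZE_IN_BYTES is assumed to be -1 because the diff initializes 
`sizeInBytes` to -1.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the Fluss row types named in the diff; the real
// classes have much larger APIs. Only the size-related behavior is modeled.
final class RowSizeSketch {
    // Assumed value, mirroring the "-1" initializer in the diff.
    static final int UNKNOWN_SIZE_IN_BYTES = -1;

    interface InternalRow {}

    static final class BinaryRow implements InternalRow {
        private final int sizeInBytes;
        BinaryRow(int sizeInBytes) { this.sizeInBytes = sizeInBytes; }
        int getSizeInBytes() { return sizeInBytes; }
    }

    static final class ProjectedRow implements InternalRow {
        private final InternalRow underlying;
        ProjectedRow(InternalRow underlying) { this.underlying = underlying; }
        InternalRow getUnderlyingRow() { return underlying; }
    }

    // Stands in for any row type that does not expose a byte size.
    static final class GenericRow implements InternalRow {}

    // Shared-layer logic: unwrap one level of projection, then report the
    // size only when the row is binary; otherwise fall back to "unknown".
    static int resolveSizeInBytes(InternalRow row) {
        InternalRow candidate =
                row instanceof ProjectedRow ? ((ProjectedRow) row).getUnderlyingRow() : row;
        return candidate instanceof BinaryRow
                ? ((BinaryRow) candidate).getSizeInBytes()
                : UNKNOWN_SIZE_IN_BYTES;
    }

    // Metrics-side logic: rows with an unknown size are skipped by the > 0 check.
    static long totalBytesRead(List<InternalRow> rows) {
        long total = 0L;
        for (InternalRow row : rows) {
            int size = resolveSizeInBytes(row);
            if (size > 0) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<InternalRow> rows = Arrays.asList(
                new BinaryRow(24),
                new ProjectedRow(new BinaryRow(40)),
                new GenericRow(),
                new ProjectedRow(new GenericRow()));
        System.out.println(totalBytesRead(rows)); // prints 64
    }
}
```

   With this shape, a reader on the shared path never has to know which row 
types carry a size: anything that is not a BinaryRow (directly or behind one 
projection) contributes nothing to the byte counter.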


