Yicong-Huang commented on code in PR #5149:
URL: https://github.com/apache/texera/pull/5149#discussion_r3292312007


##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/result/iceberg/IcebergDocument.scala:
##########
@@ -125,7 +125,9 @@ private[storage] class IcebergDocument[T >: Null <: AnyRef](
       .getOrElse(
         return 0
       )
-    table.newScan().planFiles().iterator().asScala.map(f => 
f.file().recordCount()).sum
+    Using.resource(table.newScan().planFiles()) { tasks =>

Review Comment:
   sorry to be the blocker here, but materializing an iterator != consuming an 
iterator. 
   
   for instance, `sum(range(1000000))` (consume) is cheap because it only keeps 
one int. but `list(range(1000000))` (materialize) would leave 1000000 ints in 
mem. 
   
   so I still suggest we rethink about materializing the iterator. 



##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/result/iceberg/IcebergDocument.scala:
##########
@@ -125,7 +125,9 @@ private[storage] class IcebergDocument[T >: Null <: AnyRef](
       .getOrElse(
         return 0
       )
-    table.newScan().planFiles().iterator().asScala.map(f => 
f.file().recordCount()).sum
+    Using.resource(table.newScan().planFiles()) { tasks =>

Review Comment:
   sorry to be the blocker here, but materializing an iterator != consuming an 
iterator. 
   
   for instance, `sum(range(1000000))` (consume) is cheap because it only keeps 
one int. but `list(range(1000000))` (materialize) would leave 1000000 ints in 
memory. 
   
   so I still suggest we rethink about materializing the iterator. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to