[GitHub] [iceberg] RussellSpitzer commented on issue #3921: A BaseDataReader error always occurs when traversing a partitioned table

GitBox Wed, 19 Jan 2022 08:00:46 -0800


RussellSpitzer commented on issue #3921:
URL: https://github.com/apache/iceberg/issues/3921#issuecomment-1016613183



   Is the purpose here to accumulate all the records into a linked queue that 
lives on the Executor? I'm a little nervous about the direct manipulation of 
the iterators here, as well as the building of Executor-specific memory 
constructs. 
   
   If I were debugging this, the first thing I would try is a full collect of 
the DataFrame, to make sure the normal read path works fine.
   
   ```scala
   ds.collect // or, if this is too large, ds.take(10)
   ```
   
   After that, I would probably try an implementation that doesn't manually 
touch the iterators and doesn't use a shared memory construct, something like:
   
   ```scala
   ds.foreachPartition{ p: Iterator[java.lang.Long] => 
     p.foreach( i => print(i)) 
   }
   ```
   
   If that works, I would then add the memory construct back in.


