openinx commented on issue #1643:
URL: https://github.com/apache/iceberg/issues/1643#issuecomment-714321255


   In most Flink cases, the `FlinkInputFormat` reads a record and emits it 
to the downstream operator, which means it serializes the `RowData` and 
then sends the bytes to the next operator. So I think it's right to set `reuse=true` 
by default; it saves a lot of object allocation in the JVM.
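   Roughly, the reuse pattern looks like the minimal sketch below. The class and field names are illustrative only (not actual Iceberg classes); the point is that the source keeps refilling one mutable `RowData` instance, which is safe because the record is serialized before the next call mutates it again.
   
   ```java
   import org.apache.flink.table.data.GenericRowData;
   import org.apache.flink.table.data.RowData;
   
   class ReusingReaderSketch {
     // A single reused instance, allocated once for the whole read.
     private final GenericRowData reuse = new GenericRowData(2);
   
     RowData nextRecord() {
       // Overwrite the fields of the reused row instead of allocating a new one.
       reuse.setField(0, nextId());
       reuse.setField(1, nextName());
       return reuse; // downstream serializes it before it gets mutated again
     }
   
     // Placeholder value producers, just to keep the sketch self-contained.
     private Object nextId() { return 1L; }
     private Object nextName() { return null; }
   }
   ```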
   
   In your batch-read case, the ideal approach would be: allocate a fixed-size 
array that is reused for every batch read, and when reading a given record, 
pass the corresponding element of the reused array so it can be reused.    
But reading the code, it seems `newAvroIterable`, `newParquetIterable`, and 
`newOrcIterable` return an `Iterable` whose `next()` method has no way to 
accept a `reused` instance. So we have to copy each `RowData` from the iterator 
into the fixed-size array, but we can still reuse the fixed-size array itself 
to avoid allocating too many young-generation objects (see the sketch below). 
   
   Does that make sense? 
   

