JingsongLi commented on pull request #17520:
URL: https://github.com/apache/flink/pull/17520#issuecomment-972468104


   Hi all, here is my understanding (correct me if I am wrong).
   
   ## Object ArrayList vs Lazy deserialization
   
   As long as the objects inside the `ArrayList` are not promoted to the GC old 
generation, the performance difference is not significant.
   
   If we use `ArrayList`, there is a trade-off:
   - Larger capacity: Depending on the complexity of downstream processing, 
elements may end up in the GC old generation.
   - Smaller capacity: The extreme case is 1, which is too costly for 
`BlockArrayQueue` and seriously affects throughput.
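   
   To make the small-capacity end of this trade-off concrete, here is a minimal 
sketch. It is not Flink code: `BatchHandoverSketch` and `countHandovers` are 
hypothetical names, and a plain `java.util.concurrent.ArrayBlockingQueue` stands 
in for the handover queue. It only counts queue handovers for a given batch 
capacity; it does not measure GC behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch, not Flink code: counts queue handovers per batch capacity. */
class BatchHandoverSketch {

    /** Bundles recordCount records into batches of at most capacity elements. */
    static int countHandovers(int recordCount, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        // Small bounded queue standing in for the reader-to-task handover.
        BlockingQueue<List<Integer>> queue = new ArrayBlockingQueue<>(2);

        Thread producer = new Thread(() -> {
            try {
                List<Integer> batch = new ArrayList<>(capacity);
                for (int i = 0; i < recordCount; i++) {
                    batch.add(i);
                    if (batch.size() == capacity) {
                        queue.put(batch); // one synchronized handover per batch
                        batch = new ArrayList<>(capacity);
                    }
                }
                if (!batch.isEmpty()) {
                    queue.put(batch);
                }
                queue.put(new ArrayList<>()); // empty batch marks the end
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int handovers = 0;
        try {
            while (!queue.take().isEmpty()) {
                handovers++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return handovers;
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {1, 256, 1000}) {
            System.out.println(
                    "capacity=" + capacity + " handovers=" + countHandovers(1000, capacity));
        }
    }
}
```

   With capacity 1 every record pays for its own handover. A larger capacity 
amortizes that cost but keeps more objects alive until the batch is consumed.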
   
   Since this trade-off is difficult to control, we try to avoid bundling 
records as a collection of objects. If we must bundle data, we use a structure 
similar to `BytesMap` (binary only, no objects).
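   
   As a rough illustration of "only binary, no objects", the sketch below 
bundles records into a single byte buffer and deserializes them one at a time on 
the consumer side. It is not how `BytesMap` is implemented: `BinaryBatchSketch`, 
its `Reader`, and the `(long id, String payload)` record layout are all 
assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch: a batch that holds serialized bytes instead of record objects. */
class BinaryBatchSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);
    private int count;

    /** Serializes the record straight into the shared buffer. */
    void add(long id, String payload) {
        try {
            out.writeLong(id);
            out.writeUTF(payload);
            count++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int size() {
        return count;
    }

    int sizeInBytes() {
        return bytes.size();
    }

    Reader reader() {
        return new Reader(bytes.toByteArray(), count);
    }

    /** Deserializes lazily: one record is decoded per call to advance(). */
    static final class Reader {
        private final DataInputStream in;
        private int remaining;
        private long currentId;
        private String currentPayload;

        Reader(byte[] data, int count) {
            this.in = new DataInputStream(new ByteArrayInputStream(data));
            this.remaining = count;
        }

        boolean advance() {
            if (remaining == 0) {
                return false;
            }
            try {
                currentId = in.readLong();
                currentPayload = in.readUTF();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            remaining--;
            return true;
        }

        long id() {
            return currentId;
        }

        String payload() {
            return currentPayload;
        }
    }
}
```

   The batch itself is one `byte[]` regardless of how many records it holds, so 
the number of live objects does not grow with the batch capacity.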
   
   ## Lazy deserialization in StreamFormat
   
   The key problem is that `StreamFormat` has no way to know the real 
demarcation point of the implementation, which may cause the implementation to 
hit an EOF exception.
   Is it possible for `StreamFormat` to expose a block-like interface that 
allows implementations to define the demarcation of a block, or to have each 
compressed block define its own demarcation point?
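   
   One way to picture a block that defines its own demarcation point is a 
length-prefixed frame, sketched below. This is not the `StreamFormat` API and 
not a proposal for its exact shape: `BlockFramingSketch`, `encode`, `readBlock` 
and `readAllBlocks` are hypothetical, and the 4-byte length prefix is an assumed 
layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: blocks that carry their own boundary as a length prefix. */
class BlockFramingSketch {

    /** Writes each payload as [4-byte big-endian length][payload bytes]. */
    static byte[] encode(byte[]... payloads) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (byte[] payload : payloads) {
                out.writeInt(payload.length);
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the next whole block, or null if the stream ends exactly on a boundary. */
    static byte[] readBlock(DataInputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            return null; // clean end: no partial block was started
        }
        int length = (first << 24)
                | (in.readUnsignedByte() << 16)
                | (in.readUnsignedByte() << 8)
                | in.readUnsignedByte();
        if (length < 0) {
            throw new IOException("corrupt block length: " + length);
        }
        byte[] block = new byte[length];
        in.readFully(block); // throws EOFException if the block is truncated
        return block;
    }

    /** Reads blocks until the stream ends on a block boundary. */
    static List<byte[]> readAllBlocks(byte[] data) {
        List<byte[]> blocks = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] block;
            while ((block = readBlock(in)) != null) {
                blocks.add(block);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blocks;
    }
}
```

   Because the boundary is known before any record is decoded, records inside 
a block can be deserialized lazily without reading past the end of the block.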



