kbendick commented on issue #1643:
URL: https://github.com/apache/iceberg/issues/1643#issuecomment-718371436


   > We actually enable `objectReuse` by default so that chained operators can 
avoid the serialization and deserialization cost, which is huge for 
embarrassingly parallel DAGs. That is mainly for operator chaining.
   
   So I understand why we'd want to enable object reuse, but is there any 
concern that end users who use Iceberg in their job graphs might get confused, 
since object reuse is not the default Flink behavior?
   
   I don't think this should be a blocker, but if we merge this, we should 
be sure to document it. At my work we don't enable object reuse by default 
because of scenarios like the one @stevenzwu mentioned, where users update 
fields through object references held in heap-based state backends (for 
example, assigning mutable objects directly without cloning them).
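
   To make the pitfall concrete, here is a minimal plain-Java sketch. It has 
no Flink dependency and only simulates what happens when an upstream operator 
hands the same mutable instance downstream for every record, as can happen 
once `ExecutionConfig#enableObjectReuse()` is turned on. `Row`, 
`collectByReference`, `collectByCopy` and `values` are hypothetical names made 
up for this illustration, not anything from the patch.

```java
import java.util.ArrayList;
import java.util.List;

class ObjectReuseDemo {

    /** Hypothetical mutable record standing in for a row emitted by a source. */
    static final class Row {
        int value;

        Row(int value) {
            this.value = value;
        }

        Row copy() {
            return new Row(value);
        }
    }

    /** Buffers the incoming reference directly: unsafe when the instance is reused. */
    static List<Row> collectByReference(int n) {
        Row reused = new Row(0);           // single instance recycled for every record
        List<Row> buffer = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.value = i;              // the "source" overwrites the same object
            buffer.add(reused);            // the "user code" keeps the reference
        }
        return buffer;
    }

    /** Buffers a defensive copy: safe regardless of whether the instance is reused. */
    static List<Row> collectByCopy(int n) {
        Row reused = new Row(0);
        List<Row> buffer = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.value = i;
            buffer.add(reused.copy());     // clone before holding on to the record
        }
        return buffer;
    }

    /** Extracts the integer payloads so the two buffers are easy to compare. */
    static List<Integer> values(List<Row> rows) {
        List<Integer> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values(collectByReference(3))); // prints [2, 2, 2]
        System.out.println(values(collectByCopy(3)));      // prints [0, 1, 2]
    }
}
```

   The first buffer ends up holding three references to the same object, so 
every entry shows the last value written. That is the kind of surprise users 
would hit if they hold on to records without cloning them.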
   
   The patch is pretty large and I don't mean to be a blocker, but can we 
please be sure we document where this behavior deviates from a typical Flink 
program? One could argue that, since it's only in the InputFileFormat, it's 
relatively abstracted away from users, but I think we should at least consider 
updating the docs to reflect this potentially unexpected change.

