dmkozh edited a comment on pull request #13739:
URL: https://github.com/apache/beam/pull/13739#issuecomment-775544426
> Even with the latest changes, this is still not writing the windowing
> information (including timestamps) to the cache.
That's exactly the intent of the change: we don't want to cache trivial
windowing information.
> Maybe it would be helpful to understand what the objective of this change
> is?
The objective is described in the attached ticket: we don't want to cache
redundant information at all, since it adds a large overhead of roughly 500
bytes per record. That overhead could be reduced somewhat, but it would still
be hundreds of bytes per record.
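To make the per-record cost concrete, here is a toy Python sketch that compares the pickled size of an element with and without a block of default windowing metadata attached. It is not Beam's coder or cache format: the `(timestamp, windows, pane)` tuple, `DEFAULT_WINDOWING`, and the helper names are hypothetical stand-ins, so the absolute numbers will not match the ~500 bytes/record seen with Beam's real encoding. The point is only that the overhead is paid once per record and grows linearly with the cache.

```python
import pickle

# Hypothetical stand-in for per-element windowing metadata:
# (timestamp, windows, pane). This is NOT Beam's real WindowedValue encoding.
DEFAULT_WINDOWING = (-(2**63), ("GlobalWindow",), "NO_FIRING")


def encoded_size(value, with_windowing: bool) -> int:
    """Bytes needed to pickle one element, optionally with windowing attached."""
    payload = (value, *DEFAULT_WINDOWING) if with_windowing else value
    return len(pickle.dumps(payload))


def windowing_overhead(values) -> int:
    """Extra bytes spent on (redundant) windowing across all cached values."""
    return sum(encoded_size(v, True) - encoded_size(v, False) for v in values)


# Usage: the overhead is paid per record, so it scales with the cache size.
records = [f"record-{i}" for i in range(1000)]
print(windowing_overhead(records), "extra bytes for", len(records), "records")
```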
There may be some terminology confusion here: by 'batch' pipelines I
originally meant pipelines that never care about windowing because they
process all the data at once.
If there is a better way to determine whether a pipeline cares about
windowing, I'm happy to use that instead. Also, since this is now an
environment setting, it should be hard to get unexpected results (though
users who don't care about windowing won't see an immediate benefit
either).
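A minimal sketch of the opt-in behavior described above. It assumes a hypothetical environment variable `CACHE_SKIP_WINDOWING` (the real setting's name isn't given in this thread) and uses a plain `(value, timestamp, windows, pane)` tuple in place of Beam's windowed elements. The flag is read once per cache write, so every entry in a given cache has the same shape. When windowing was skipped, reading the cache back restores default windowing.

```python
import os
import pickle
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical opt-in flag name; not the actual setting used by the PR.
SKIP_WINDOWING_ENV = "CACHE_SKIP_WINDOWING"

# Stand-in for trivial windowing: minimum timestamp, global window, no pane.
DEFAULT_WINDOWING = (-(2**63), ("GlobalWindow",), "NO_FIRING")

# Simplified element: (value, timestamp, windows, pane).
Element = Tuple[Any, int, Tuple[str, ...], str]


def skip_windowing_enabled() -> bool:
    """Off unless explicitly enabled, so existing pipelines are unaffected."""
    return os.environ.get(SKIP_WINDOWING_ENV, "").lower() in ("1", "true", "yes")


def write_cache(elements: Iterable[Element]) -> Dict[str, Any]:
    """Pickle each element, dropping its windowing when the flag is set.

    The decision is made once for the whole cache, so readers never have to
    guess per entry. Any non-default windowing is discarded when skipping.
    """
    skip = skip_windowing_enabled()
    entries = [pickle.dumps(e[0] if skip else e) for e in elements]
    return {"skip_windowing": skip, "entries": entries}


def read_cache(cache: Dict[str, Any]) -> List[Element]:
    """Decode entries, restoring default windowing if it was skipped."""
    if cache["skip_windowing"]:
        return [(pickle.loads(b), *DEFAULT_WINDOWING) for b in cache["entries"]]
    return [pickle.loads(b) for b in cache["entries"]]
```

The flag is off by default, so pipelines that rely on windowing keep their full metadata. Opting in trades that metadata for smaller cache entries, which is only safe for pipelines that never look at windowing.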
----------------------------------------------------------------
This is an automated message from the Apache Git Service.