lukecwik edited a comment on pull request #12603:
URL: https://github.com/apache/beam/pull/12603#issuecomment-704435983


   > I see, so it is the full switch from Read.Bounded/Unbounded to SDF by 
default. Can you get this one green so we can test it and then merge it? I 
would like to see if there is some perf impact, and we should probably document 
how to get the previous `Unbounded` translation in case any existing users find 
any difference.
   > 
   > If I understood correctly, you intend to tackle watermark holds in 
the 'future'? Just out of curiosity, I assume this will be done in 
`SparkProcessKeyedElements` for the Gbk/Stateful translation. Might this need 
some extra changes? I am asking because I am reading the translation of the 
Portable Streaming runner and I see that watermarks are taken into account from 
Impulse, so I was wondering if something was missing here or if this is done in 
a different place, maybe in core.
   
   I'll try to see what I can get working with the GlobalWatermarkHolder 
implementation that exists. I think we should be able to use arbitrary ids in 
it; it just might be really slow, since the readers/writers should really only 
care about their own upstream watermarks (main and side input), so having a 
global broadcast seems less than desirable.
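
   A minimal sketch of the per-upstream idea, in plain Java. This is an 
illustration only and not the Spark runner's GlobalWatermarkHolder API; the 
class name, method names, and source ids are all hypothetical. It contrasts 
what a global broadcast would ship (every id) with what a single consumer 
needs (the minimum over its own main and side inputs).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the Spark runner's GlobalWatermarkHolder API.
class UpstreamWatermarks {
    // Latest known watermark (epoch millis) per source id.
    private final Map<String, Long> bySource = new HashMap<>();

    // Watermarks only move forward, so keep the largest value seen per id.
    public void advance(String sourceId, long watermarkMillis) {
        bySource.merge(sourceId, watermarkMillis, Long::max);
    }

    // What a global broadcast would ship to every consumer: all ids.
    public Map<String, Long> snapshotAll() {
        return new HashMap<>(bySource);
    }

    // What one consumer needs: the minimum over its own upstream ids
    // (main and side inputs). An id with no watermark yet pins the result
    // at Long.MIN_VALUE; an empty id collection yields Long.MAX_VALUE.
    public long inputWatermark(Collection<String> upstreamIds) {
        long min = Long.MAX_VALUE;
        for (String id : upstreamIds) {
            min = Math.min(min, bySource.getOrDefault(id, Long.MIN_VALUE));
        }
        return min;
    }

    public static void main(String[] args) {
        UpstreamWatermarks w = new UpstreamWatermarks();
        w.advance("mainInput", 100L);
        w.advance("sideInput", 40L);
        w.advance("unrelatedSource", 5L); // irrelevant to the consumer below
        System.out.println(w.inputWatermark(Arrays.asList("mainInput", "sideInput")));
    }
}
```

   In this sketch the consumer's watermark depends only on the ids it names, 
so a slow, unrelated source does not hold it back, and nothing about that 
source would need to be sent to it.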
   
   For now let's break up this change into multiple PRs (Spark already supports 
bounded SDFs via the SplittableParDoNaiveBounded.OverrideFactory):
   1) Enable impulse (https://github.com/apache/beam/pull/13018)
   2) Swap bounded reads to use SDFs (you can test the impact of bounded read 
as an SDF)
   3) Add support for watermark holds
   4) Enable streaming SDFs using the SparkProcessKeyedElements implementation 
that is part of this PR
   5) Swap unbounded reads to use SDFs (you can test the impact of unbounded 
read as an SDF)

