I think the description of when a side input is ready vs expired is the trouble here.
- You know that W is expired only when you can be sure that no main input element could reference it.
- You know that W is ready *even if it got no data* if the input that would end up in W would be dropped (i.e., when W expires according to the *side input* watermark).

So for your scenario: you push back the elements, which holds W from being collected. When W expires according to the side input watermark, you make it ready, process the pushed-back elements with empty side input contents, and then GC the side input.

Kenn

On Thu, Mar 8, 2018 at 4:32 PM Shen Li <[email protected]> wrote:

> Hi Lukasz,
>
> Let's explain this problem using a specific example.
>
> Say I have a main input element X, which accesses side input window W.
> When X arrives at a ParDo operator, W is not ready and not expired either.
> So, in this case, the ParDo should push back X and wait for W to become
> ready. Say, after two minutes, W is still unready but is expired due to
> the advanced main input watermark. In this situation, how does Beam expect
> runners/engines to handle the pushed-back value X? Discard X or throw an
> error?
>
> Thanks,
> Shen
>
> On Thu, Mar 8, 2018 at 6:35 PM, Lukasz Cwik <[email protected]> wrote:
>
>> I believe you're missing this point: "and also to not expire the side
>> input till the main input watermark advances beyond the garbage collection
>> hold of the side input."
>>
>> On Thu, Mar 8, 2018 at 3:33 PM, Shen Li <[email protected]> wrote:
>>
>>> Hi Lukasz,
>>>
>>> Thanks again.
>>>
>>> > the runner is required to hold back the main input till the side
>>> input is ready
>>>
>>> Yes, I understand these requirements. But what if the side input expires
>>> before it becomes ready?
>>>
>>> Shen
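
For reference, the ready/expired rules stated at the top of this message can be sketched roughly as follows. This is only an illustrative toy model, not Beam runner code: `SideInputWindow`, `process`, `drain`, and the integer watermarks are hypothetical names and simplifications introduced here, and the strict `>` comparisons against the window's GC time are an assumption about how "past the window" is measured.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SideInputWindow:
    """Toy model of one side input window W (hypothetical, not a Beam class)."""
    end: int                              # end-of-window timestamp
    allowed_lateness: int = 0
    contents: Optional[list] = None       # None means no side input data arrived
    pushed_back: List[str] = field(default_factory=list)

    @property
    def gc_time(self) -> int:
        return self.end + self.allowed_lateness

    def is_ready(self, side_watermark: int) -> bool:
        # Ready if data arrived, or if any data for W would now be dropped
        # because the *side input* watermark has passed W's GC time.
        return self.contents is not None or side_watermark > self.gc_time

    def is_expired(self, main_watermark: int) -> bool:
        # Expired only when no main input element could still reference W:
        # the main watermark is past GC time and nothing is pushed back.
        return main_watermark > self.gc_time and not self.pushed_back


def process(window: SideInputWindow, element: str, side_watermark: int):
    """Process an element, or push it back if W is not ready yet."""
    if not window.is_ready(side_watermark):
        window.pushed_back.append(element)  # this holds W from being collected
        return None
    return (element, list(window.contents or []))


def drain(window: SideInputWindow, side_watermark: int):
    """Re-process pushed-back elements once W has become ready."""
    if not window.is_ready(side_watermark):
        return []
    out = [(e, list(window.contents or [])) for e in window.pushed_back]
    window.pushed_back.clear()
    return out
```

In this model, element X pushed back against an empty W keeps `is_expired` false no matter how far the main input watermark advances; once the side input watermark passes W's GC time, `drain` emits X with empty side input contents, and only then can W be collected.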
