lostluck commented on issue #21817:
URL: https://github.com/apache/beam/issues/21817#issuecomment-1227803646

   It doesn't spill to disk, but I do have an approach that decodes elements 
on demand from a GBK stream, and it has shown heap reductions in the cases 
tried so far.
   
   The main catch is that it only works for GBK value iterators that are read 
once. This covers most GBK usage, as well as Reshuffle and lifted combine 
usage. It *cannot* cover general CoGBK cases due to the current re-iterator 
requirement, and it cannot cover post-GBK PCollections that are read by more 
than one DoFn. It would also not cover GBK re-iteration, but the Go SDK 
currently doesn't support that mode for GBKs, so that's a non-issue.
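
   As a rough illustration of the read-once idea (a minimal sketch, not the 
actual Go SDK code): the iterator below decodes one value per call straight 
from the stream instead of materializing the whole group up front. The 
`lazyValues` type, the `encode` helper, and the length-prefixed wire format 
are all hypothetical and chosen only to keep the example self-contained.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// lazyValues is a hypothetical read-once iterator over a stream of
// length-prefixed values. It is NOT a Beam Go SDK type.
type lazyValues struct {
	r    io.Reader
	done bool
}

// Next decodes exactly one value from the stream per call, so only the
// current element is held on the heap. Once the stream is consumed there
// is no way to rewind, which is why re-iteration cannot be supported.
func (it *lazyValues) Next() ([]byte, bool, error) {
	if it.done {
		return nil, false, nil
	}
	var n uint32
	if err := binary.Read(it.r, binary.BigEndian, &n); err != nil {
		if err == io.EOF { // clean end of stream
			it.done = true
			return nil, false, nil
		}
		return nil, false, err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(it.r, buf); err != nil {
		return nil, false, err
	}
	return buf, true, nil
}

// encode builds an in-memory stream in the assumed wire format:
// a 4-byte big-endian length followed by the value bytes.
func encode(vals ...string) *bytes.Buffer {
	var b bytes.Buffer
	for _, v := range vals {
		binary.Write(&b, binary.BigEndian, uint32(len(v)))
		b.WriteString(v)
	}
	return &b
}

func main() {
	it := &lazyValues{r: encode("a", "bb", "ccc")}
	for {
		v, ok, err := it.Next()
		if err != nil {
			panic(err)
		}
		if !ok {
			break
		}
		fmt.Println(string(v))
	}
}
```

   A second consumer of the same values (a re-iterating CoGBK, or a second 
DoFn reading the post-GBK PCollection) would find the stream already 
drained, which is the restriction described above.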
   
   That specific work will be tracked in #22900.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
