tudorm commented on pull request #11598: URL: https://github.com/apache/beam/pull/11598#issuecomment-623554386
In general, the Dataflow UI ought to use the exact counter that GBK has to perform the sanity checking, if it doesn't do that already. However, I believe the estimate in this case is for the PCollection output one step downstream of the read from shuffle. At that point I don't know how useful any estimate is when reading from a GBK, except for accounting for the amount of data read and possibly reduced or transformed.

On Mon, May 4, 2020 at 8:38 AM Lukasz Cwik <[email protected]> wrote:

> How will this impact PCollection size estimation shown in the Dataflow UI?
>
> View it on GitHub: <https://github.com/apache/beam/pull/11598#issuecomment-623537884>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
