Fokko commented on pull request #14139: URL: https://github.com/apache/beam/pull/14139#issuecomment-796657180
@reuvenlax thanks for looking at this. Allow me to elaborate a bit.

I'm running into limits on the size of the graph, and noticed that writing to BigQuery consists of many steps. My aim was to reduce these a bit. I noticed that the surrogate key wasn't required, hence this patch. We have a streaming job that reads from many queues and writes this data to BigQuery. Reducing the number of steps makes the graph smaller and allows us to run pipelines within a single job.

Instead of combining the elements in a single operator using the ArrayListMultimap, I think it is aesthetically nicer to use the Beam primitives and combine on an actual key instead of a synthetic one.

I fully agree that we should not break update compatibility unless strictly required, but it would be great if we could combine this with #14139. I've cherry-picked the commit into my branch and will run it in our test environment.
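To illustrate what "combine on an actual key instead of a synthetic one" means, here is a minimal plain-Java sketch. It is not Beam code and not the code in this PR. It groups rows by their real destination table, which is roughly what keying each element by its destination (e.g. with `WithKeys`) and applying `GroupByKey` would do in a Beam pipeline. The `Row` type, the table names, and `groupByTable` are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration (not Beam): group rows by their real destination table. */
public class GroupByDestination {

  /** Hypothetical row type: destination table name plus a payload. */
  static final class Row {
    final String table;
    final String payload;

    Row(String table, String payload) {
      this.table = table;
      this.payload = payload;
    }
  }

  /**
   * Groups payloads by destination table. The grouping key is the table itself,
   * not a synthetic shard key, so each group maps directly to one write target.
   * Arrival order is preserved within each table.
   */
  static Map<String, List<String>> groupByTable(List<Row> rows) {
    Map<String, List<String>> byTable = new TreeMap<>();
    for (Row row : rows) {
      byTable.computeIfAbsent(row.table, k -> new ArrayList<>()).add(row.payload);
    }
    return byTable;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("dataset.orders", "o1"),
        new Row("dataset.users", "u1"),
        new Row("dataset.orders", "o2"));
    // Prints {dataset.orders=[o1, o2], dataset.users=[u1]}
    System.out.println(groupByTable(rows));
  }
}
```

Because the key carries meaning (the destination), no extra step is needed to generate and later discard a surrogate key, which is the kind of graph reduction described above.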
