Looking further into the StreamingTransformTranslator, I would like to pose a question. Why do we do the groupByKey followed by the updateStateByKey? It appears to be a giant waste in which we convert everything to bytes and back unnecessarily.
The only thing it does is gather all the values for a key into an Iterable, but the updateStateByKey would also do that if it were given the chance. If we were to update the UpdateStateByKeyFunction to expect WindowedValue<V>'s instead of Iterable<WindowedValue<V>>'s I believe we could eliminate the call to groupByKey. What is happening now is the updateStateByKey will wrap those values in a Seq and so currently we have either an empty Seq or a Seq with exactly 1 item and that item is itself an Iterable that contains multiple items. **UPDATE: I have created a separate jira to look into this. [BEAM-5519](https://jira.apache.org/jira/browse/BEAM-5519)** [ Full content available at: https://github.com/apache/beam/pull/6181 ] This message was relayed via gitbox.apache.org for [email protected]
