Looking further into the StreamingTransformTranslator, I would like to pose a 
question. Why do we do the groupByKey followed by the updateStateByKey? It 
appears to be a giant waste in which we convert everything to bytes and back 
unnecessarily.

The only thing it does is gather all the values for a key into an Iterable, but 
the updateStateByKey would also do that if it were given the chance.

If we were to update the UpdateStateByKeyFunction to expect WindowedValue<V>'s 
instead of Iterable<WindowedValue<V>>'s I believe we could eliminate the call 
to groupByKey. What is happening now is the updateStateByKey will wrap those 
values in a Seq and so currently we have either an empty Seq or a Seq with 
exactly 1 item and that item is itself an Iterable that contains multiple items.

[ Full content available at: https://github.com/apache/beam/pull/6181 ]
This message was relayed via gitbox.apache.org for [email protected]

Reply via email to