johnyangk commented on a change in pull request #135: [NEMO-230] Emit collected
data when receiving watermark in GroupByKeyAndWindowTransform
URL: https://github.com/apache/incubator-nemo/pull/135#discussion_r229178956
##########
File path:
compiler/frontend/beam/src/main/java/org/apache/nemo/compiler/frontend/beam/transform/GroupByKeyAndWindowDoFnTransform.java
##########
@@ -76,30 +77,59 @@ public GroupByKeyAndWindowDoFnTransform(final
Map<TupleTag<?>, Coder<?>> outputC
*/
@Override
protected DoFn wrapDoFn(final DoFn doFn) {
- timerInternalsFactory = new InMemoryTimerInternalsFactory();
+ this.stateAndTimerInternalsFactory = new StateAndTimerInternalsFactory();
// This function performs group by key and window operation
return
GroupAlsoByWindowViaWindowSetNewDoFn.create(
getWindowingStrategy(),
- new InMemoryStateInternalsFactory(),
- timerInternalsFactory,
+ stateAndTimerInternalsFactory.inMemoryStateInternalsFactory,
+ stateAndTimerInternalsFactory.inMemoryTimerInternalsFactory,
getSideInputReader(),
reduceFn,
getOutputManager(),
getMainOutputTag());
}
+ /**
+ * It collects data for each key.
+ * The collected data are emitted at {@link GroupByKeyAndWindowDoFnTransform#onWatermark(Watermark)}
+ * @param element data element
+ */
@Override
public void onData(final WindowedValue<KV<K, InputT>> element) {
final KV<K, InputT> kv = element.getValue();
- keyToValues.putIfAbsent(kv.getKey(), new ArrayList());
+ keyToValues.putIfAbsent(kv.getKey(), new LinkedList<>());
keyToValues.get(kv.getKey()).add(element.withValue(kv.getValue()));
Review comment:
Can you add a comment explaining why we do not call
`getDoFnRunner().processElement` here?
I'm wondering whether `GroupAlsoByWindowViaWindowSetNewDoFn` internally stores
the elements it has seen when its `processElement` is called, similar to the
`keyToValues` variable we're using in this class.
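To make the question concrete, here is a minimal sketch of the buffer-then-emit pattern that `onData` and `onWatermark` appear to follow in this diff. It is a simplified, hypothetical illustration: the class name `KeyedBufferSketch` and its method signatures are assumptions, it uses no Beam or Nemo types, and it does not model windows or timers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Nemo or Beam code): buffer values per key, emit on watermark. */
class KeyedBufferSketch<K, V> {
  // Plays the role of the keyToValues field discussed above.
  private final Map<K, List<V>> keyToValues = new LinkedHashMap<>();

  /** Only buffers the value under its key; nothing is processed or emitted here. */
  void onData(final K key, final V value) {
    keyToValues.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  /** Returns everything buffered since the last watermark and clears the buffer. */
  Map<K, List<V>> onWatermark() {
    final Map<K, List<V>> emitted = new LinkedHashMap<>(keyToValues);
    keyToValues.clear();
    return emitted;
  }
}
```

If the wrapped DoFn already keeps per-key state internally, a buffer like this would hold the same elements a second time, which is why a comment on the design choice would help.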
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services