Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1723#issuecomment-189459860
I like this, especially the natural Scala path. Some concerns, though:
1. As noted before, the fold function in the non-keyed window is not
space constant. That is quite a serious issue, if people count heavy streams in
large windows. That should be fixed.
2. The `reduce(...)` code on the `WindowedStream` creates a pass-through
window function over Iterable, instead of an pass-through internal window
function. The result is that for reduced windows, the single result value is
wrapped and unwrapped, instead of simply passed through. That should be a quite
simple fix.
3. Minor comment, more for future code, (and that applies to virtually
all Scala API code): The code creates function objects in line a lot. As a
result, we have for example at least 10 classes that implement a Flink
ReduceFunction and call a Scala Function2. I would suggest to define that class
once externally (not inline), because it reduces number of subclasses of the
reduce function and that may actually have a positive impact on the JIT
friendliness of some programs (if the number of loaded implementations of
ReduceFunction at any point becomes reasonably small)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---