johnyangk commented on a change in pull request #153: [NEMO-245,247] Handle
watermark in OutputWriter and Implement unbounded word count example
URL: https://github.com/apache/incubator-nemo/pull/153#discussion_r232158185
##########
File path:
examples/beam/src/main/java/org/apache/nemo/examples/beam/WindowedWordCount.java
##########
@@ -41,19 +43,64 @@
private WindowedWordCount() {
}
+ public static final String INPUT_TYPE_BOUNDED = "bounded";
+ public static final String INPUT_TYPE_UNBOUNDED = "unbounded";
+
+
+ private static PCollection<KV<String, Long>> getSource(
+ final Pipeline p,
+ final String[] args) {
+
+ final String inputType = args[2];
+ if (inputType.compareTo(INPUT_TYPE_BOUNDED) == 0) {
+ final String inputFilePath = args[3];
+ return GenericSourceSink.read(p, inputFilePath)
+ .apply(ParDo.of(new DoFn<String, String>() {
+ @ProcessElement
+ public void processElement(@Element final String elem,
+ final OutputReceiver<String> out) {
+ final String[] splitt = elem.split("!");
+ out.outputWithTimestamp(splitt[0], new Instant(Long.valueOf(splitt[1])));
+ }
+ }))
+ .apply(MapElements.<String, KV<String, Long>>via(new SimpleFunction<String, KV<String, Long>>() {
+ @Override
+ public KV<String, Long> apply(final String line) {
+ final String[] words = line.split(" +");
+ final String documentId = words[0] + "#" + words[1];
+ final Long count = Long.parseLong(words[2]);
+ return KV.of(documentId, count);
+ }
+ }));
+ } else if (inputType.compareTo(INPUT_TYPE_UNBOUNDED) == 0) {
+ // unbounded
+ return p.apply(GenerateSequence
+ .from(1)
+ .withRate(2, Duration.standardSeconds(1))
Review comment:
Can you mention Nemo's watermark emission rate in a code comment here?
(e.g., we set this number high enough to provide enough time for Nemo to
emit watermarks.)
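
A rough sketch of the arithmetic such a comment could cite, assuming only the
`withRate(2, Duration.standardSeconds(1))` setting from the diff. The class
name, method names, and the watermark period passed in are hypothetical;
Nemo's actual watermark emission period is not stated in this thread and would
have to be filled in by the author.

```java
import java.time.Duration;

// Hypothetical helper, not part of Nemo or Beam: relates a source's element
// rate to an assumed watermark emission period.
final class SourceRateSketch {
  private SourceRateSketch() {
  }

  // Gap between consecutive elements when the source emits
  // numElements per period, e.g. withRate(2, 1 second) gives 500 ms.
  static Duration elementInterval(final long numElements, final Duration period) {
    return period.dividedBy(numElements);
  }

  // Approximate number of elements emitted between two consecutive
  // watermarks. watermarkPeriod is an assumption supplied by the caller.
  static long elementsPerWatermark(final long numElements,
                                   final Duration period,
                                   final Duration watermarkPeriod) {
    return numElements * watermarkPeriod.toMillis() / period.toMillis();
  }
}
```

For the rate in the diff, `elementInterval(2, Duration.ofSeconds(1))` is 500 ms,
so the code comment could state that interval next to the real watermark period
once it is known.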