[
https://issues.apache.org/jira/browse/FLINK-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139005#comment-15139005
]
Shikhar Bhushan commented on FLINK-3375:
----------------------------------------
This would be super useful for me, as I currently have to unnecessarily use a
parallelism of 30 since there are 30 partitions, when even parallelism=1 would
suffice and works more efficiently.
It would be great if the existing {{TimestampExtractor}} interface can be
supported, or any other interface to allow for watermark to be determined in a
different way than simply ascending -- in my case, the timestamps on a
partition should be mostly ascending but the messages are produced from
different machines so I need to account for small inconsistencies in their
system clocks. Currently using this extractor:
https://gist.github.com/shikhar/2d9306e2ebd8ca89728c
> Allow Watermark Generation in the Kafka Source
> ----------------------------------------------
>
> Key: FLINK-3375
> URL: https://issues.apache.org/jira/browse/FLINK-3375
> Project: Flink
> Issue Type: Improvement
> Components: Kafka Connector
> Affects Versions: 1.0.0
> Reporter: Stephan Ewen
> Fix For: 1.0.0
>
>
> It is a common case that event timestamps are ascending inside one Kafka
> Partition. Ascending timestamps are easy for users, because they are handles
> by ascending timestamp extraction.
> If the Kafka source has multiple partitions per source task, then the records
> become out of order before timestamps can be extracted and watermarks can be
> generated.
> If we make the FlinkKafkaConsumer an event time source function, it can
> generate watermarks itself. It would internally implement the same logic as
> the regular operators that merge streams, keeping track of event time
> progress per partition and generating watermarks based on the current
> guaranteed event time progress.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)