[ https://issues.apache.org/jira/browse/KAFKA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018071#comment-16018071 ]
ASF GitHub Bot commented on KAFKA-4785: --------------------------------------- GitHub user jeyhunkarimov opened a pull request: https://github.com/apache/kafka/pull/3106 KAFKA-4785: Records from internal repartitioning topics should always use RecordMetadataTimestampExtractor You can merge this pull request into a Git repository by running: $ git pull https://github.com/jeyhunkarimov/kafka KAFKA-4785 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3106.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3106 ---- commit 98278baae63638efc2b750b1d3f12e936d845f18 Author: Jeyhun Karimov <je.kari...@gmail.com> Date: 2017-05-19T21:39:50Z Records from internal repartitioning topics should always use RecordMetadataTimestampExtractor ---- > Records from internal repartitioning topics should always use > RecordMetadataTimestampExtractor > ---------------------------------------------------------------------------------------------- > > Key: KAFKA-4785 > URL: https://issues.apache.org/jira/browse/KAFKA-4785 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 0.10.2.0 > Reporter: Matthias J. Sax > Assignee: Jeyhun Karimov > > Users can specify what timestamp extractor should be used to decode the > timestamp of input topic records. As long as RecordMetadataTimestamp or > WallclockTime is use this is fine. > However, for custom timestamp extractors it might be invalid to apply this > custom extractor to records received from internal repartitioning topics. The > reason is that Streams sets the current "stream time" as record metadata > timestamp explicitly before writing to intermediate repartitioning topics > because this timestamp should be use by downstream subtopologies. A custom > timestamp extractor might return something different breaking this assumption. > Thus, for reading data from intermediate repartitioning topic, the configured > timestamp extractor should not be used, but the record's metadata timestamp > should be extracted as record timestamp. > In order to leverage the same behavior for intermediate user topic (ie, used > in {{through()}}) we can leverage KAFKA-4144 and internally set an extractor > for those "intermediate sources" that returns the record's metadata timestamp > in order to overwrite the global extractor from {{StreamsConfig}} (ie, set > {{FailOnInvalidTimestampExtractor}}). -- This message was sent by Atlassian JIRA (v6.3.15#6346)