[
https://issues.apache.org/jira/browse/SAMZA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327825#comment-14327825
]
Yi Pan (Data Infrastructure) commented on SAMZA-569:
----------------------------------------------------
[~martinkl], thanks for commenting on the offset cases. The OffsetComparator
should work. The only burden is that each system we support needs some
additional code for its comparator. IMHO, the additional code is worth
considering, given the power that ordered offsets can give us.
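
To illustrate how small the per-system code could be, here is a minimal
sketch of one such comparator. It assumes a system whose offsets are decimal
longs encoded as strings; the class name NumericOffsetComparator is
hypothetical and is not an existing Samza API. Other systems would need their
own parsing logic.

```java
import java.util.Comparator;

// Hypothetical sketch, not part of the Samza API.
// Assumption: offsets for this system are decimal longs encoded as strings.
final class NumericOffsetComparator implements Comparator<String> {
    @Override
    public int compare(String left, String right) {
        // Compare numerically; plain string comparison would put "10" before "9".
        return Long.compare(Long.parseLong(left), Long.parseLong(right));
    }
}
```

Parsing before comparing matters here because lexicographic ordering of the
raw strings does not match delivery order once offsets differ in length.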
> Make message offsets ordered set within a system stream partition
> -----------------------------------------------------------------
>
> Key: SAMZA-569
> URL: https://issues.apache.org/jira/browse/SAMZA-569
> Project: Samza
> Issue Type: Improvement
> Components: container
> Reporter: Yi Pan (Data Infrastructure)
>
> It would be nice to make message offsets an ordered set within a system
> stream partition, i.e. message offsets from the same partition would be
> monotonically increasing in the order that messages are delivered.
> This would provide the following two features:
> * de-dup w/o the need to keep all message offsets
> * determinism when re-calculating the output from a buffered set of messages
> As of now, w/o ordering between message offsets, the window operator
> requires the following implementation to ensure de-dup and determinism:
> * keep all message offsets ever seen in persistent storage if we want to
> de-dup over an arbitrary length of message replay; or keep all message
> offsets within a window if we only de-dup within a window length
> * keep the insertion order of messages in the buffer, which potentially also
> requires a persistent KV store that preserves insertion order
> Both seem complicated and are not needed if we have ordering between message
> offsets.
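
The two features described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not Samza code: offsets are
modeled as longs that increase in delivery order within one partition, and
the class OrderedWindowBuffer and its methods are hypothetical names. De-dup
needs only the last processed offset, and the buffer is replayed in offset
order regardless of insertion order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch, not Samza code.
// Assumption: offsets within a partition are longs that increase in delivery order.
final class OrderedWindowBuffer {
    // Highest offset already flushed; anything at or below it is a replay.
    private long lastProcessedOffset = -1L;
    // Sorted by offset, so iteration order is deterministic.
    private final TreeMap<Long, String> buffer = new TreeMap<>();

    /** Buffers the message and returns true, or returns false for a duplicate. */
    boolean add(long offset, String message) {
        if (offset <= lastProcessedOffset || buffer.containsKey(offset)) {
            return false;
        }
        buffer.put(offset, message);
        return true;
    }

    /** Returns buffered messages in offset order and advances the high-water mark. */
    List<String> flush() {
        List<String> out = new ArrayList<>(buffer.values());
        if (!buffer.isEmpty()) {
            lastProcessedOffset = buffer.lastKey();
        }
        buffer.clear();
        return out;
    }
}
```

Only a single long has to be persisted for de-dup across replays, instead of
the full set of offsets ever seen.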