[ https://issues.apache.org/jira/browse/SAMZA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326293#comment-14326293 ]

Chris Riccomini commented on SAMZA-569:
---------------------------------------

An alternative to enforcing this would be to allow SystemAdmins to provide an
OffsetComparator. If the system can provide ordered offsets, the comparator
would work. If it can't, Samza would have to fall back to not supporting
ordered offsets for that system.
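A minimal sketch of the comparator idea, in Java since Samza is a JVM project. `OffsetComparator`, `NumericOffsetComparator`, `UnorderedOffsetComparator`, and the helper methods are hypothetical names for illustration, not Samza's API. The sketch assumes offsets travel as strings and that a system unable to order its offsets signals this by returning an empty result, so callers know to degrade.

```java
import java.util.Optional;

// Hypothetical interface: a system that can order its offsets returns a
// comparison result; one that cannot returns Optional.empty().
interface OffsetComparator {
    Optional<Integer> compare(String offset1, String offset2);
}

// Hypothetical comparator for numeric offsets that increase monotonically
// within a partition (Kafka-style log offsets). Compares numerically, not
// lexicographically, so "9" sorts before "10".
class NumericOffsetComparator implements OffsetComparator {
    @Override
    public Optional<Integer> compare(String offset1, String offset2) {
        return Optional.of(Long.compare(Long.parseLong(offset1), Long.parseLong(offset2)));
    }
}

// Hypothetical comparator for a system with opaque, unordered offsets:
// it always declines to compare.
class UnorderedOffsetComparator implements OffsetComparator {
    @Override
    public Optional<Integer> compare(String offset1, String offset2) {
        return Optional.empty();
    }
}

public class OffsetComparatorDemo {
    // True only when the comparator can show `offset` comes after `lastSeen`;
    // an empty result means the caller must use unordered handling instead.
    static boolean isAfter(OffsetComparator cmp, String offset, String lastSeen) {
        return cmp.compare(offset, lastSeen).map(c -> c > 0).orElse(false);
    }

    // Whether this system's offsets can be ordered at all.
    static boolean supportsOrdering(OffsetComparator cmp, String a, String b) {
        return cmp.compare(a, b).isPresent();
    }

    public static void main(String[] args) {
        OffsetComparator kafkaLike = new NumericOffsetComparator();
        OffsetComparator opaque = new UnorderedOffsetComparator();
        System.out.println(supportsOrdering(kafkaLike, "9", "10")); // true
        System.out.println(isAfter(kafkaLike, "10", "9"));          // true
        System.out.println(supportsOrdering(opaque, "a", "b"));     // false: degrade
    }
}
```

Returning an empty result rather than throwing keeps the "degrade" path explicit at every call site.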

A second way around this for [~nickpan47]'s immediate use case is to always
punctuate offsets along with the current timestamp in a window. If this is
done, you don't need to do any de-duplication, since you always pick up
reading at a window boundary.
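A rough simulation of the window-boundary idea, under stated assumptions: a tumbling window of 10 time units that sums values, a partition modeled as a list whose index is the resume position, and a hypothetical `Checkpoint` that records that position together with the window start. All names are illustrative, not Samza APIs; records need Java 16+. Because a checkpoint is only written when a window closes, a crash loses only the open window's in-memory state, and replaying from the checkpoint recomputes that window once.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowBoundaryCheckpointDemo {
    // A message's position is its index in the log; the sketch only needs to
    // resume from a checkpointed position, never to compare offsets.
    record Message(long timestamp, int value) {}

    // Hypothetical checkpoint: resume position plus the start timestamp of the
    // window that begins at that position.
    record Checkpoint(int resumePosition, long windowStart) {}

    static final long WINDOW_SIZE = 10;

    // Sums values per tumbling window and emits a sum when its window closes.
    // Checkpoints are taken only at window boundaries. If crashAfter >= 0, the
    // run stops after that many messages, losing the open window's state.
    static Checkpoint run(List<Message> log, Checkpoint start, List<Integer> output, int crashAfter) {
        Checkpoint checkpoint = start;
        long windowStart = start.windowStart();
        int sum = 0;
        int processed = 0;
        for (int i = start.resumePosition(); i < log.size(); i++) {
            if (processed == crashAfter) {
                return checkpoint; // simulated crash: the partial window sum is gone
            }
            Message m = log.get(i);
            if (m.timestamp() >= windowStart + WINDOW_SIZE) {
                // Close the window. The sketch treats emit + checkpoint as one
                // step; in a real job a crash between them could still duplicate output.
                output.add(sum);
                windowStart = m.timestamp() - (m.timestamp() % WINDOW_SIZE);
                checkpoint = new Checkpoint(i, windowStart);
                sum = 0;
            }
            sum += m.value();
            processed++;
        }
        return checkpoint; // the still-open window is not emitted
    }

    static List<Message> sampleLog() {
        return List.of(
            new Message(1, 1), new Message(3, 2), new Message(12, 3), new Message(15, 4),
            new Message(21, 5), new Message(25, 6), new Message(31, 7));
    }

    public static void main(String[] args) {
        List<Integer> clean = new ArrayList<>();
        run(sampleLog(), new Checkpoint(0, 0), clean, -1);

        List<Integer> recovered = new ArrayList<>();
        Checkpoint cp = run(sampleLog(), new Checkpoint(0, 0), recovered, 4); // crash mid-window
        run(sampleLog(), cp, recovered, -1); // resume from the window-boundary checkpoint

        System.out.println(clean);     // [3, 7, 11]
        System.out.println(recovered); // [3, 7, 11]
    }
}
```

The recovered run matches the clean run without any de-dup state, which is the point of the suggestion; it relies on the emit and the checkpoint not being separated by a crash.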

A third way around this is if Kafka actually had [atomic/transactional 
commits|https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka],
which have been worked on off and on for the past year or so.

> Make message offsets ordered set within a system stream partition
> -----------------------------------------------------------------
>
>                 Key: SAMZA-569
>                 URL: https://issues.apache.org/jira/browse/SAMZA-569
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Yi Pan (Data Infrastructure)
>
> It would be nice to make message offsets an ordered set within a system 
> stream partition, i.e., message offsets from the same partition should 
> increase monotonically in the order in which messages are delivered.
> This would provide the following two features:
> * de-dup without the need to keep all message offsets
> * determinism when re-calculating the output from a buffered set of messages
> Currently, without an ordering between message offsets, the window operator 
> would need the following to guarantee de-dup and determinism:
> * keep all message offsets ever seen in persistent storage if we want to 
> de-dup over an arbitrarily long replay of messages, or keep all message 
> offsets within a window if de-dup only needs to cover one window length
> * keep the insertion order of messages in the buffer, which potentially 
> also requires a persistent KV store that preserves insertion order
> Both seem complicated and are unnecessary if message offsets are ordered.
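A small sketch of the two features the issue lists, assuming numeric offsets that increase monotonically in delivery order within one partition (the issue's premise). `HighWaterMarkDeduper`, `SeenSetDeduper`, and `replayInOffsetOrder` are hypothetical names, not Samza code; records need Java 16+, and a real task would also have to persist the high-water mark alongside its checkpoint. With ordering, de-dup needs one long per partition; without it, every seen offset must be kept. Sorting a buffer by offset makes recomputation independent of the order the buffer was stored in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrderedOffsetDedupDemo {
    record Envelope(long offset, String payload) {}

    // With ordered offsets, one long is enough de-dup state: any offset at or
    // below the high-water mark has already been processed.
    static final class HighWaterMarkDeduper {
        private long highWaterMark = -1;

        boolean accept(long offset) {
            if (offset <= highWaterMark) {
                return false; // redelivered message, drop it
            }
            highWaterMark = offset;
            return true;
        }
    }

    // Without ordering, every offset seen (ever, or within a window) must be kept.
    static final class SeenSetDeduper {
        private final Set<Long> seen = new HashSet<>();

        boolean accept(long offset) {
            return seen.add(offset); // false if already present
        }

        int stateSize() {
            return seen.size();
        }
    }

    // Determinism: replaying a buffer sorted by offset yields the same order
    // regardless of how the buffer was filled or stored.
    static List<String> replayInOffsetOrder(List<Envelope> buffer) {
        List<Envelope> sorted = new ArrayList<>(buffer);
        sorted.sort(Comparator.comparingLong(Envelope::offset));
        List<String> out = new ArrayList<>();
        for (Envelope e : sorted) {
            out.add(e.payload());
        }
        return out;
    }

    public static void main(String[] args) {
        long[] delivered = {1, 2, 3, 4, 3, 4, 5}; // offsets 3 and 4 redelivered after a replay
        HighWaterMarkDeduper hwm = new HighWaterMarkDeduper();
        SeenSetDeduper set = new SeenSetDeduper();
        List<Long> kept = new ArrayList<>();
        for (long offset : delivered) {
            set.accept(offset);
            if (hwm.accept(offset)) {
                kept.add(offset);
            }
        }
        System.out.println(kept);            // [1, 2, 3, 4, 5]
        System.out.println(set.stateSize()); // 5 offsets kept, versus one long for the high-water mark

        List<Envelope> buffer = List.of(new Envelope(7, "c"), new Envelope(5, "a"), new Envelope(6, "b"));
        System.out.println(replayInOffsetOrder(buffer)); // [a, b, c]
    }
}
```

Both deduplicators drop the same redeliveries here; the difference is that the set grows with every offset while the high-water mark stays constant in size.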



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
