Yi Pan (Data Infrastructure) created SAMZA-569:
--------------------------------------------------

             Summary: Make message offsets ordered set within a system stream partition
                 Key: SAMZA-569
                 URL: https://issues.apache.org/jira/browse/SAMZA-569
             Project: Samza
          Issue Type: Improvement
            Reporter: Yi Pan (Data Infrastructure)


It would be nice to make message offsets an ordered set within a system 
stream partition, i.e. message offsets from the same partition would be 
monotonically increasing in the order in which messages are delivered.

It would provide the following two features:
* de-dup w/o the need to keep all message offsets
* determinism when re-calculating the output from a buffered set of messages
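
To illustrate the first point, here is a minimal sketch of de-dup that keeps 
only the highest offset seen per partition. It assumes offsets can be 
represented as long values that increase in delivery order. The class and 
method names are hypothetical and are not part of Samza.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Samza class. Assumes offsets within a partition
// are longs that increase monotonically in delivery order.
final class WatermarkDeduplicator {
    // Key: partition identifier; value: highest offset accepted so far.
    private final Map<String, Long> highWatermarks = new HashMap<>();

    // Returns true if the message is new, false if it is a replayed duplicate.
    boolean accept(String partition, long offset) {
        Long last = highWatermarks.get(partition);
        if (last != null && offset <= last) {
            return false; // at or below the watermark: already seen
        }
        highWatermarks.put(partition, offset);
        return true;
    }

    // Number of stored entries: one per partition, not one per message.
    int trackedEntries() {
        return highWatermarks.size();
    }
}
```

The state grows with the number of partitions rather than with the number of 
messages, which is what removes the need to keep every offset.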

As of now, without an ordering between message offsets, the window operator 
would need the following to guarantee de-dup and determinism:
* keep all message offsets ever seen in persistent storage to de-dup across a 
replay of arbitrary length, or keep all message offsets within a window to 
de-dup only within a window length
* keep the insertion order of messages in the buffer, which potentially also 
requires a persistent KV store that preserves insertion order

Both seem complicated and are not needed if we have ordering between message 
offsets.
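
As a rough sketch of the determinism point, a per-partition buffer keyed by 
offset can replay messages in delivery order no matter in what order they 
were inserted. This again assumes offsets are longs that increase in 
delivery order. OffsetOrderedBuffer is a hypothetical name, not an existing 
Samza class, and the in-memory TreeMap only stands in for a sorted store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical window buffer for a single partition, not a Samza class.
// Assumes offsets are longs that increase in delivery order.
final class OffsetOrderedBuffer {
    // Sorted by offset, so iteration order does not depend on insertion order.
    private final TreeMap<Long, String> messagesByOffset = new TreeMap<>();

    // Adding an offset that is already buffered is a no-op (replayed message).
    void add(long offset, String payload) {
        messagesByOffset.putIfAbsent(offset, payload);
    }

    // Returns the buffered payloads in ascending offset order.
    List<String> replay() {
        return new ArrayList<>(messagesByOffset.values());
    }
}
```

For example, adding offsets 12, 10 and 11 in that order and then calling 
replay() yields the payloads for 10, 11 and 12, so a re-calculation over the 
buffer sees the same sequence every time.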



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
