Ben Kirwin created SAMZA-459:
--------------------------------

             Summary: Explicit flush for individual output streams
                 Key: SAMZA-459
                 URL: https://issues.apache.org/jira/browse/SAMZA-459
             Project: Samza
          Issue Type: Improvement
            Reporter: Ben Kirwin
            Priority: Minor


From the mailing list:

http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3CCACuX-D8-CS7867ob47fqytCAdvGURc4owv82Rhg2oEJYmr8hpg%40mail.gmail.com%3E

At the moment, the only way to trigger a flush of the output streams is to call 
TaskCoordinator.commit, which also flushes the state and saves the checkpoints. 
There are a few cases where more granularity would be useful: writing out a 
single stream can be much faster than doing a full commit, and if a user cares 
about the order in which messages are published, they can disable the 
autocommit and trigger flushes manually.

I'd anticipate this to look something like 
TaskCoordinator.flush(systemStream). It looks like the TaskCoordinator normally 
only queues up work, instead of doing it synchronously -- if that's the case, 
it should be enough to buffer up all the requested flushes, then perform them 
in order when the moment comes.
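
A rough sketch of the buffering idea, to make the proposal concrete. Everything in it is hypothetical: FlushRequestSketch, BufferingCoordinator, StreamFlusher and drainTo are stand-ins invented for illustration, not the real Samza classes, and stream names are plain strings rather than SystemStream objects. It assumes the run loop drains the queue once the task's process call has returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types are real Samza classes.
final class FlushRequestSketch {

    /** Stand-in for whatever component can flush one named output stream. */
    interface StreamFlusher {
        void flush(String systemStream);
    }

    /** Stand-in coordinator that records flush requests instead of acting on them. */
    static final class BufferingCoordinator {
        // Requests are kept in the order the task made them.
        private final List<String> pendingFlushes = new ArrayList<>();

        /** Called from task code: queues the request, writes nothing yet. */
        void flush(String systemStream) {
            pendingFlushes.add(systemStream);
        }

        /** Called by the (assumed) run loop: performs queued flushes in order. */
        List<String> drainTo(StreamFlusher flusher) {
            List<String> requested = new ArrayList<>(pendingFlushes);
            pendingFlushes.clear();
            for (String stream : requested) {
                flusher.flush(stream);
            }
            return requested;
        }
    }

    public static void main(String[] args) {
        BufferingCoordinator coordinator = new BufferingCoordinator();

        // Inside the task: request flushes in the order that matters.
        coordinator.flush("kafka.orders");
        coordinator.flush("kafka.audit");

        // Later, in the run loop: carry the requests out in that same order.
        coordinator.drainTo(stream -> System.out.println("flushing " + stream));
    }
}
```

Repeated requests for the same stream are kept as-is here; whether to collapse them is a design choice the proposal leaves open.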

Note: you could get *almost* the same effect by switching to a synchronous 
system and letting the user send a batch of messages all at once, much as the 
underlying Kafka client does. This wouldn't let you flush a changelog stream, 
though.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
