[jira] [Commented] (SAMZA-459) Explicit flush for individual output streams

Chris Riccomini (JIRA) Thu, 06 Nov 2014 07:59:36 -0800

    [ https://issues.apache.org/jira/browse/SAMZA-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200359#comment-14200359 ]


Chris Riccomini commented on SAMZA-459:
---------------------------------------

bq. TaskCoordinator.flush(systemStream)

I like this API.

bq. It looks like the TaskCoordinator normally only queues up work, instead of 
doing it synchronously – if that's the case, it should be enough to buffer up 
all the requested flushes, then perform them in order when the moment comes.

I think we should make the flush() call synchronous. We switched to this 
paradigm with collector.send() (and TaskInstanceCollector) as well. My 
reasoning is that there might be logic within a single process call that is 
dependent on synchronously flushing (e.g. forcibly flushing to a stream before 
writing to a remote DB).
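A minimal sketch of the synchronous semantics argued for above. It uses hypothetical stand-in types (`StreamId`, `SyncFlushingCollector`, `SyncFlushExample`) rather than Samza's real `SystemStream` or collector classes, and simulates the broker with an in-memory list. Because `flush(stream)` only returns once that stream's buffered messages are published, code later in the same process call (here a stand-in for a remote DB write) can rely on them being out. Flushing one stream leaves the others buffered. Requires Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Samza's SystemStream: a (system, stream) pair.
record StreamId(String system, String stream) {}

// Hypothetical collector: send() buffers per stream; flush(stream) publishes
// that stream's buffer before returning and leaves other streams untouched.
class SyncFlushingCollector {
    private final Map<StreamId, List<String>> pending = new HashMap<>();
    // Simulated broker: messages that have actually been published, in order.
    final List<String> published = new ArrayList<>();

    void send(StreamId stream, String message) {
        pending.computeIfAbsent(stream, s -> new ArrayList<>()).add(message);
    }

    void flush(StreamId stream) {
        List<String> buffered = pending.remove(stream);
        if (buffered != null) {
            published.addAll(buffered);
        }
    }

    int pendingCount(StreamId stream) {
        List<String> buffered = pending.get(stream);
        return buffered == null ? 0 : buffered.size();
    }
}

class SyncFlushExample {
    // Simulates one process() call and returns its side effects in the order they happened.
    static List<String> process(SyncFlushingCollector collector) {
        StreamId orders = new StreamId("kafka", "orders");
        StreamId audit = new StreamId("kafka", "audit");
        List<String> events = new ArrayList<>();

        collector.send(orders, "order-42");
        collector.send(audit, "audit-42");
        collector.flush(orders); // only "orders" is flushed; "audit" stays buffered
        events.add("published=" + collector.published);
        events.add("db-write"); // stand-in for a remote DB write that depends on the flush
        return events;
    }
}
```

The DB write is recorded only after `order-42` shows up in `published`, which is the ordering guarantee a queued flush could not give inside a single process call.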

> Explicit flush for individual output streams
> --------------------------------------------
>
>                 Key: SAMZA-459
>                 URL: https://issues.apache.org/jira/browse/SAMZA-459
>             Project: Samza
>          Issue Type: Improvement
>          Components: container
>    Affects Versions: 0.9.0
>            Reporter: Ben Kirwin
>            Priority: Minor
>
> From the mailing list:
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3CCACuX-D8-CS7867ob47fqytCAdvGURc4owv82Rhg2oEJYmr8hpg%40mail.gmail.com%3E
> At the moment, the only way to trigger a flush of the output streams is to 
> call TaskCoordinator.commit, which also flushes the state and saves the 
> checkpoints. There are a few cases where more granularity would be useful: 
> writing out a single stream can be much faster than doing a full commit, and 
> if a user cares about the order in which messages are published, they can 
> disable the autocommit and trigger flushes manually.
>  I'd anticipate this to look something like 
> TaskCoordinator.flush(systemStream). It looks like the TaskCoordinator 
> normally only queues up work, instead of doing it synchronously -- if that's 
> the case, it should be enough to buffer up all the requested flushes, then 
> perform them in order when the moment comes.
> Note: you could get *almost* the same effect by switching to a synchronous 
> system and letting the user send a batch of messages all at once, much as the 
> underlying Kafka client does. This wouldn't let you flush a changelog stream, 
> though.
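For contrast, the deferred design described in the issue can be sketched the same way: the coordinator only records flush requests, and a container loop performs them in request order after the task's process call returns. All names here (`StreamId`, `DeferredFlushCoordinator`, `DeferredFlushContainer`, `afterProcess`) are hypothetical illustrations, not part of Samza, and publishing is again simulated with an in-memory list. Requires Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Samza's SystemStream: a (system, stream) pair.
record StreamId(String system, String stream) {}

// Hypothetical deferred-flush coordinator: flush() only records the request.
class DeferredFlushCoordinator {
    private final List<StreamId> requested = new ArrayList<>();

    void flush(StreamId stream) {
        requested.add(stream);
    }

    // Returns the requests in the order they were made and clears the queue.
    List<StreamId> drain() {
        List<StreamId> out = new ArrayList<>(requested);
        requested.clear();
        return out;
    }
}

// Hypothetical container: buffers sends and, once process() has returned,
// performs the queued flushes in the order they were requested.
class DeferredFlushContainer {
    private final Map<StreamId, List<String>> pending = new HashMap<>();
    final List<String> published = new ArrayList<>();
    final DeferredFlushCoordinator coordinator = new DeferredFlushCoordinator();

    void send(StreamId stream, String message) {
        pending.computeIfAbsent(stream, s -> new ArrayList<>()).add(message);
    }

    void afterProcess() {
        for (StreamId stream : coordinator.drain()) {
            List<String> buffered = pending.remove(stream);
            if (buffered != null) {
                published.addAll(buffered);
            }
        }
    }

    static List<String> demo() {
        DeferredFlushContainer c = new DeferredFlushContainer();
        StreamId a = new StreamId("kafka", "a");
        StreamId b = new StreamId("kafka", "b");
        StreamId notFlushed = new StreamId("kafka", "c");
        // Simulated process() body: send to three streams, request flushes of b, then a.
        c.send(a, "a1");
        c.send(b, "b1");
        c.send(notFlushed, "c1");
        c.coordinator.flush(b);
        c.coordinator.flush(a);
        // Nothing is published yet; the container acts only after process() returns.
        c.afterProcess();
        return c.published; // b1 before a1; c1 is still buffered
    }
}
```

In this model nothing is published while the process call is still running, so a DB write made inside it cannot assume the flushed messages are visible; that is the gap the synchronous variant closes.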



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
