[
https://issues.apache.org/jira/browse/SAMZA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488250#comment-15488250
]
Hai commented on SAMZA-967:
---------------------------
> In this case there is no ordering among these files. Let's imagine, instead
> of writing to HDFS, we write to Kafka, then you also have no ordering within
> the same topic partition when the events are coming from different upstream
> producers.
>> Ok. Let's say my repartitioner writes to a partition directory. If there is
>> no implicit ordering defined in the output itself, how does a downstream
>> HDFS consumer guarantee deterministic consumption? That is what I am not
>> clear about.
>>> You brought up a good point. There is no guarantee of deterministic
>>> consumption if repartitioning happens. But my point is that we are not
>>> able to solve this problem for Kafka either. Let's say we do
>>> repartitioning for a job that reads from Kafka and writes to Kafka: how do
>>> you guarantee a consistent result now? Well, you could argue that a
>>> deterministic repartitioning result is not needed in the case of Kafka
>>> (a stream processing job), but is relevant for HDFS (essentially a batch
>>> processing job). I have to admit that I don't have a good solution to your
>>> question as of now :(
> Add HDFS system consumer to Samza
> ---------------------------------
>
> Key: SAMZA-967
> URL: https://issues.apache.org/jira/browse/SAMZA-967
> Project: Samza
> Issue Type: Sub-task
> Reporter: Hai
> Assignee: Hai
> Fix For: 0.12.0
>
> Attachments: HDFSSystemConsumer.pdf
>
>