[ https://issues.apache.org/jira/browse/SAMZA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447060#comment-15447060 ]

Navina Ramesh commented on SAMZA-967:
-------------------------------------

[~lhaiesp] Sorry for the delay in my review. I strongly urge you to post a 
\[DISCUSS\] or \[RFC\] email to the dev mailing list to get more eyes on your 
work and, potentially, more feedback. 

Overall, the design document looks good. I have a couple of questions:
* Is the “End of Stream” feature a prerequisite for the HDFS consumer? If yes, 
link the corresponding JIRA and design document. Providing a high-level 
description of how that feature will be leveraged to solve this problem will 
lay more groundwork for readers who are not familiar with it.
* One of the goals and one of the non-goals slightly overlap: "(Goal) The system 
consumer should support a variety of folder structures and filename 
conventions" and "(Non-Goal) Support ALL kinds of HDFS folder structures and 
filename formats". Can you specifically call out which structures and 
conventions you are supporting, or which ones you are not supporting? 
That would add more clarity to the document.
* Along with the 3rd point under Assumptions, you can call out "write-once, 
read-many" as the underlying usage pattern. 
* What do the whitelist and blacklist consist of here? Why do we need both? 
Can you provide an example of what this config will look like?
* In the repartitioner case, multiple Samza tasks cannot write to the same file, 
so each task can write to a separate file within the partition directory. What 
defines the ordering among these files when the downstream job consumes them? 
Is it based on timestamp?
* When does the HDFSSystemAdmin write the PartitionDescriptor to HDFS? Is it 
done by the job coordinator or by each container? 
* Is the PartitionDescriptor file expected to follow any convention? Or is it 
simply going to contain a map? 
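For reference, here is the kind of example I have in mind for the whitelist/blacklist question. It is a hypothetical Java sketch, assuming both lists are regular expressions matched against file names (the class name `FileNameFilter` and the patterns are mine, not from the design doc). A file is selected if it matches the whitelist and does not match the blacklist. That is one reason to have both: the blacklist can carve exceptions out of a broad whitelist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: whitelist/blacklist as regexes over HDFS file names.
public class FileNameFilter {
    private final Pattern whitelist;
    private final Pattern blacklist; // null means "exclude nothing"

    public FileNameFilter(String whitelistRegex, String blacklistRegex) {
        this.whitelist = Pattern.compile(whitelistRegex);
        this.blacklist = blacklistRegex.isEmpty() ? null : Pattern.compile(blacklistRegex);
    }

    // Accept a name only if it fully matches the whitelist and not the blacklist.
    public boolean accept(String fileName) {
        if (!whitelist.matcher(fileName).matches()) {
            return false;
        }
        return blacklist == null || !blacklist.matcher(fileName).matches();
    }

    public List<String> filter(List<String> fileNames) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (accept(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Whitelist: all Avro files. Blacklist: the backfill files among them.
        FileNameFilter filter = new FileNameFilter(".*\\.avro", "backfill-.*");
        List<String> files = Arrays.asList(
                "events-0.avro", "events-1.avro", "events-1.avro.tmp",
                "_SUCCESS", "backfill-0.avro");
        System.out.println(filter.filter(files)); // [events-0.avro, events-1.avro]
    }
}
```

If the design only needs one of the two lists, this is the kind of case where it would be good to explain why the other is redundant.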

Cheers! 

PS: I am looking at your RB now :) 





> Add HDFS system consumer to Samza
> ---------------------------------
>
>                 Key: SAMZA-967
>                 URL: https://issues.apache.org/jira/browse/SAMZA-967
>             Project: Samza
>          Issue Type: Sub-task
>            Reporter: Hai
>            Assignee: Hai
>             Fix For: 0.12.0
>
>         Attachments: HDFSSystemConsumer.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
