[ 
https://issues.apache.org/jira/browse/SAMZA-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388711#comment-14388711
 ] 

Vinoth Chandar commented on SAMZA-622:
--------------------------------------

[~cpsoman] can you route the ticket? I made the mistake of cloning Jakob's 
ticket and am unable to change the assignee. 

> Persisting Samza State on HDFS
> ------------------------------
>
>                 Key: SAMZA-622
>                 URL: https://issues.apache.org/jira/browse/SAMZA-622
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Vinoth Chandar
>            Assignee: Jakob Homan
>
> Samza's state currently lives in Kafka as a (compacted) changelog and in a 
> local RocksDB key-value store. 
> It would be nice to save this state onto HDFS directly, for the following reasons:
> - HDFS is a fault-tolerant file system, so a Samza task could be restarted 
> on a node that holds another copy of its state.
> - HDFS virtualizes storage, so one would not have to worry explicitly 
> about balancing disk usage across different tiers (I don't know what the 
> right word is) in a data flow graph.
> - Storing the state in HDFS makes it easier to share it with other 
> processing systems in the Hadoop ecosystem. 
> RocksDB seems to have an option to store its files on HDFS 
> (https://github.com/facebook/rocksdb/tree/master/hdfs). Has anyone played 
> with this? 
> Context: I am working on producing compacted DB snapshots on HDFS for 
> Spark/MapReduce jobs to use, so I am very interested in this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
