[ 
https://issues.apache.org/jira/browse/SAMZA-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487591#comment-14487591
 ] 

Vinoth Chandar commented on SAMZA-622:
--------------------------------------

Sorry for the delayed response. While I empathize with not shoehorning things 
into a framework, the practical reality of building systems is that sometimes 
they offer different guarantees when set up differently.

>> This approach is fine for a shot term hacky approach.
I feel these discussions are premature. Let me do more groundwork and put up a 
proposal. Then we can discuss in detail which guarantees are met and which 
aren't, and make a call. I think most of what you mentioned is captured in 
https://issues.apache.org/jira/browse/SAMZA-72 (and its outgoing links). I will 
take these into account.

> Persisting Samza State on HDFS
> ------------------------------
>
>                 Key: SAMZA-622
>                 URL: https://issues.apache.org/jira/browse/SAMZA-622
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>
> Samza's state currently lives in Kafka as a compacted changelog and in a 
> local RocksDB KV store. 
> It would be nice to save this onto HDFS directly, for the following reasons: 
> - HDFS is a fault-tolerant FS. Thus, a Samza task can be restarted by 
> relocating it to a node where the other copies of its state live.
> - HDFS virtualizes storage and thus, one would not have to worry explicitly 
> about balancing disk usage across different tiers (I don't know what the 
> right word is) in a data flow graph
> - Storing the state in HDFS makes it easier to share it with other 
> processing systems in Hadoop land. 
> RocksDB seems to have an option to store files on HDFS: 
> https://github.com/facebook/rocksdb/tree/master/hdfs (has anyone played with 
> this?). 
> Context: I am working on producing compacted DB snapshots on HDFS for 
> Spark/MR jobs to use, and am thus very interested in this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
