[ https://issues.apache.org/jira/browse/SAMZA-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391947#comment-14391947 ]
Vinoth Chandar commented on SAMZA-622:
--------------------------------------
That would defeat the purpose of integrating with HDFS to begin with. The real
benefit of using HDFS as the state store is that we would no longer replicate
the state to Kafka. You can move your Samza task to one of the data nodes that
has a copy of the data. If that data node fails, you can simply move the task
to another available data node, and the state is already there (since HDFS does
block-level replication automatically). This is what Spark is doing and
what MR has been doing for years. Nothing new.
Whether the indexing structure should be an LSM tree, or whether its
implementation should be RocksDB, is an orthogonal issue.
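A minimal sketch of the failover placement described above: prefer a live
host that already holds a replica of the task's state, and fall back to any
live host otherwise. All names here (pick_host_for_task, replica_hosts,
live_hosts, load) are hypothetical and are not part of Samza or HDFS; in
practice the replica hosts would come from HDFS block-location metadata, and
the load figures from whatever the scheduler tracks.

```python
from typing import Dict, List, Optional, Set


def pick_host_for_task(
    replica_hosts: List[str],
    live_hosts: Set[str],
    load: Dict[str, int],
) -> Optional[str]:
    """Pick a host on which to (re)start a task.

    Prefers the least-loaded live host that holds a replica of the task's
    state. If no replica holder is alive, falls back to the least-loaded
    live host (the state would then be read remotely). Returns None when
    no host is alive at all.
    """
    # Hosts that are alive AND already have the state on local disk.
    local = [h for h in replica_hosts if h in live_hosts]
    candidates = local or sorted(live_hosts)
    if not candidates:
        return None
    # Break load ties by host name so the choice is deterministic.
    return min(candidates, key=lambda h: (load.get(h, 0), h))


# Hypothetical cluster: dn1 has failed, dn2 and dn3 still hold replicas.
replicas = ["dn1", "dn2", "dn3"]
live = {"dn2", "dn3", "dn4"}
load = {"dn2": 5, "dn3": 2, "dn4": 0}
chosen = pick_host_for_task(replicas, live, load)
```

Here dn4 is the least-loaded host overall, but it holds no replica, so the
sketch still picks one of the replica holders.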
> Persisting Samza State on HDFS
> ------------------------------
>
> Key: SAMZA-622
> URL: https://issues.apache.org/jira/browse/SAMZA-622
> Project: Samza
> Issue Type: Improvement
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
>
> Samza's state currently lives in Kafka as a (compacted) changelog and in a
> local RocksDB KV store.
> It would be nice to save this onto HDFS directly, for the following reasons:
> - HDFS is a fault-tolerant FS. Thus, restarting Samza tasks can be achieved
> by relocating the task to a node where another copy of the data lives.
> - HDFS virtualizes storage and thus, one would not have to worry explicitly
> about balancing disk usage across different tiers (I don't know what the
> right word is) in a data flow graph
> - Storing the state in HDFS makes it easier to share it with other
> processing systems in Hadoop land.
> RocksDB seems to have an option to store files on HDFS:
> https://github.com/facebook/rocksdb/tree/master/hdfs (Has anyone played with
> this?)
> Context: I am working on producing compacted DB snapshots on HDFS for
> spark/MR jobs to use and thus super interested in this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)