[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597424#comment-14597424
]
Bogdan Ghit commented on SPARK-6112:
------------------------------------
Thanks for your comments. I use SPARK-6112 and hadoop-2.7.
$ hdfs storagepolicies -getStoragePolicy -path /tmp/spark-dfs
BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK],
creationFallbacks=[DISK], replicationFallbacks=[DISK]}
I am trying to read some data from HDFS (on disk) and then write it back to a
path that should be stored on the RAM disk.
To reproduce, run ./bin/spark-shell and then try:
> val s = sc.textFile("input-path-in-hdfs")
> s.saveAsTextFile("/tmp/spark-dfs/output-path-in-ram")
The data goes to /local/bghit/myhdfs instead of [RAM_DISK]/dev/shm/ramdisk (see
my previous comment).
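
For context, a DataNode only treats a volume as RAM disk if the directory is
tagged with [RAM_DISK] in dfs.datanode.data.dir. The fragment below is a minimal
sketch of such an hdfs-site.xml entry, not the configuration actually used here
(that is not shown in this thread). It assumes the two paths mentioned above are
the RAM-disk and disk volumes of the same DataNode.

```xml
<!-- Sketch only: the paths are taken from the comment above, and their
     roles as RAM_DISK and DISK volumes are an assumption. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- Untagged directories default to DISK; the [RAM_DISK] tag marks the tmpfs volume. -->
  <value>[RAM_DISK]/dev/shm/ramdisk,[DISK]/local/bghit/myhdfs</value>
</property>
```

If the tag is missing or the RAM disk cannot take the block, the LAZY_PERSIST
policy shown above falls back to DISK (creationFallbacks=[DISK]), which would
match the behaviour described here.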
> Provide external block store support through HDFS RAM_DISK
> ----------------------------------------------------------
>
> Key: SPARK-6112
> URL: https://issues.apache.org/jira/browse/SPARK-6112
> Project: Spark
> Issue Type: New Feature
> Components: Block Manager
> Reporter: Zhan Zhang
> Attachments: SparkOffheapsupportbyHDFS.pdf
>
>
> The HDFS LAZY_PERSIST policy makes it possible to cache RDDs off-heap in
> HDFS. We may want to provide a capability similar to Tachyon's by leveraging
> the HDFS RAM_DISK feature when the user environment does not have Tachyon
> deployed. This feature could make it possible to share RDDs in memory
> across different jobs, and even with jobs other than Spark, and to avoid
> RDD recomputation if executors crash.