[ 
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596695#comment-14596695
 ] 

Arpit Agarwal commented on SPARK-6112:
--------------------------------------

bq. The amount of memory I can lock is set in /etc/security/limits.conf to 
unlimited, so ulimit -l outputs "unlimited". However, I get the exception 
"Cannot start datanode because the configured max locked memory size 
(dfs.datanode.max.locked.memory) is greater than zero and native code is not 
available." Any ideas why?
Hi [~bghit], which platform and Hadoop distribution are you using? Could you 
check whether native I/O is enabled with {{hadoop checknative}}? You should see 
something like this:
{code}
$ bin/hadoop checknative
15/06/22 14:28:24 INFO bzip2.Bzip2Factory: Successfully loaded & initialized 
native-bzip2 library system-native
15/06/22 14:28:24 INFO zlib.ZlibFactory: Successfully loaded & initialized 
native-zlib library
Native library checking:
hadoop:  true /var/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
snappy:  true /usr/lib/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so
{code}

Prior to Apache Hadoop 2.8.0, {{dfs.datanode.max.locked.memory}} is not 
enforced for Lazy Persist writes, so you can skip setting it for 
testing/validation. It's still a good idea to set it now so you don't run into 
failures when you upgrade Hadoop later.
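For reference, a minimal sketch of that setting in {{hdfs-site.xml}} on each DataNode (the 64 MB value below is only illustrative; size it to your workload, and keep it at or below the memlock limit reported by {{ulimit -l}}):

{code}
<!-- hdfs-site.xml on each DataNode.
     Value is in bytes and must not exceed the OS memlock limit
     ("ulimit -l"); 67108864 = 64 MB is just an example. -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>67108864</value>
</property>
{code}

The DataNode must be restarted for the change to take effect.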

bq. When I write the output of an application in Spark with 
saveAsTextFile("/tmp/spark-dfs/output"), the data goes to disk.
If the storage policy is correct on /tmp/spark-dfs, writes will go to the RAM 
disk. HDFS temporarily falls back to disk writes when there is unsaved data on 
the RAM disk, to avoid data loss. I know little about using Spark, but if you 
can share repro steps for the {{saveAsTextFile}} issue, I can try it out. Thanks.
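To verify that the policy is in place, the {{hdfs storagepolicies}} tool can check and set it (the path below is taken from your example; {{LAZY_PERSIST}} is the policy RAM disk writes require):

{code}
# Check which storage policy is in effect on the output directory
hdfs storagepolicies -getStoragePolicy -path /tmp/spark-dfs

# Set LAZY_PERSIST so new writes under this path target RAM_DISK first
hdfs storagepolicies -setStoragePolicy -path /tmp/spark-dfs -policy LAZY_PERSIST
{code}

Note the policy only applies to files created after it is set; existing files keep their original placement.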

> Provide external block store support through HDFS RAM_DISK
> ----------------------------------------------------------
>
>                 Key: SPARK-6112
>                 URL: https://issues.apache.org/jira/browse/SPARK-6112
>             Project: Spark
>          Issue Type: New Feature
>          Components: Block Manager
>            Reporter: Zhan Zhang
>         Attachments: SparkOffheapsupportbyHDFS.pdf
>
>
> The HDFS Lazy_Persist policy makes it possible to cache RDDs off-heap in 
> HDFS. We may want to provide a capability similar to Tachyon by leveraging the 
> HDFS RAM_DISK feature, for user environments that do not have Tachyon deployed. 
> This feature potentially makes it possible to share RDDs in memory across 
> different jobs, and even with jobs other than Spark, and to avoid RDD 
> recomputation if executors crash. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
