[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596695#comment-14596695
]
Arpit Agarwal commented on SPARK-6112:
--------------------------------------
bq. The amount of memory I can lock is set in /etc/security/limits.conf to
unlimited, so ulimit -l outputs "unlimited". However, I get the exception
"Cannot start datanode because the configured max locked memory size
(dfs.datanode.max.locked.memory) is greater than zero and native code is not
available." Any ideas why?
Hi [~bghit], which platform and Hadoop distribution are you using? Could you
check whether native IO is enabled with {{hadoop checknative}}? You should see
something like this:
{code}
$ bin/hadoop checknative
15/06/22 14:28:24 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/06/22 14:28:24 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /var/lib/native/libhadoop.so.1.0.0
zlib: true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so
{code}
Before Apache Hadoop 2.8.0, {{dfs.datanode.max.locked.memory}} is not enforced
for Lazy Persist writes, so you can skip it for testing/validation. But it's a
good idea to set it now so you don't run into failures when you upgrade Hadoop
later.
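For reference, a sketch of how the limit could be set in hdfs-site.xml (the value below is an arbitrary example, not a recommendation; it must not exceed the {{ulimit -l}} of the DataNode process):
{code}
<!-- hdfs-site.xml: maximum number of bytes the DataNode may lock into
     memory for caching and Lazy Persist writes -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <!-- example value: 2 GB -->
  <value>2147483648</value>
</property>
{code}
The DataNode must be restarted for the change to take effect, and native code must be available (see {{hadoop checknative}} above) for a non-zero value to work.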
bq. When I write the output of an application in Spark with
saveAsTextFile("/tmp/spark-dfs/output"), the data goes to disk.
If the storage policy on /tmp/spark-dfs is set correctly, writes will go to the
RAM disk. HDFS will temporarily fall back to disk writes if there is unsaved
data on the RAM disk, to avoid data loss. I know little about using Spark, but
if you can share repro steps for the {{saveAsTextFile}} issue I can try it out.
Thanks.
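For completeness, a sketch of how the storage policy on the directory could be checked and set with the {{hdfs storagepolicies}} CLI (the path is the example from the question; requires a running HDFS cluster on Hadoop 2.6 or later):
{code}
# Set the LAZY_PERSIST policy so new writes under the path go to RAM disk first
hdfs storagepolicies -setStoragePolicy -path /tmp/spark-dfs -policy LAZY_PERSIST

# Verify the policy took effect
hdfs storagepolicies -getStoragePolicy -path /tmp/spark-dfs
{code}
Note that the policy applies to files written after it is set; existing files keep their previous placement.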
> Provide external block store support through HDFS RAM_DISK
> ----------------------------------------------------------
>
> Key: SPARK-6112
> URL: https://issues.apache.org/jira/browse/SPARK-6112
> Project: Spark
> Issue Type: New Feature
> Components: Block Manager
> Reporter: Zhan Zhang
> Attachments: SparkOffheapsupportbyHDFS.pdf
>
>
> The HDFS Lazy_Persist policy provides the possibility of caching RDDs
> off_heap in HDFS. We may want to provide a capability similar to Tachyon by
> leveraging the HDFS RAM_DISK feature, if the user environment does not have
> Tachyon deployed. This feature could potentially allow sharing RDDs in memory
> across different jobs, and even with jobs other than Spark, and avoid RDD
> recomputation if executors crash.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)