Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder
Hi Michael,

I think that is what I am trying to show here, as the documentation mentions: "NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager." So, in a way, I am supporting your statement :)

Regards,
Gourav

On Wed, Mar 28, 2018 at 10:00 AM, Michael Shtelma wrote:
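The override behavior the documentation describes can be sketched as a small precedence function. This is only an illustration of the documented rule, not Spark's actual implementation; the function name and directory paths are hypothetical:

```python
def resolve_local_dirs(conf, env, cluster_manager):
    """Illustrative precedence per the Spark docs: environment variables
    set by the cluster manager override spark.local.dir."""
    if cluster_manager == "yarn" and "LOCAL_DIRS" in env:
        return env["LOCAL_DIRS"].split(",")
    if cluster_manager in ("standalone", "mesos") and "SPARK_LOCAL_DIRS" in env:
        return env["SPARK_LOCAL_DIRS"].split(",")
    # Fall back to the Spark conf, which itself defaults to /tmp
    return conf.get("spark.local.dir", "/tmp").split(",")

# Even with spark.local.dir set, YARN's LOCAL_DIRS wins on an executor:
dirs = resolve_local_dirs({"spark.local.dir": "/fast/disk"},
                          {"LOCAL_DIRS": "/data1/nm,/data2/nm"},
                          cluster_manager="yarn")
print(dirs)  # ['/data1/nm', '/data2/nm']
```

This is why redefining yarn.nodemanager.local-dirs (which YARN exposes to executors as LOCAL_DIRS) is the effective knob in YARN mode.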
Hi,

This property will be used in YARN mode only by the driver. Executors will use the properties coming from YARN for storing temporary files.

Best,
Michael

On Wed, Mar 28, 2018 at 7:37 AM, Gourav Sengupta wrote:
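A submission along these lines (the scratch path and script name are hypothetical) illustrates the split: in YARN client mode, spark.local.dir only moves the driver's scratch space, while executors still use the directories YARN hands them via LOCAL_DIRS:

```shell
# spark.local.dir here affects the driver JVM only; executors on YARN
# get their scratch dirs from yarn.nodemanager.local-dirs (LOCAL_DIRS).
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.local.dir=/fast/scratch \
  my_job.py
```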
Hi,

As per the documentation at https://spark.apache.org/docs/latest/configuration.html:

spark.local.dir (default: /tmp): Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager.

Regards,
Gourav Sengupta

On Mon, Mar 26, 2018 at 8:28 PM, Michael Shtelma wrote:
Hi Keith,

Thanks for the suggestion! I have solved this already. The problem was that the YARN process was not responding to start/stop commands and had not applied my configuration changes. I killed it and restarted my cluster, and after that YARN started using the yarn.nodemanager.local-dirs parameter defined in yarn-site.xml. After this change, -Djava.io.tmpdir for the Spark executor was set correctly, according to the yarn.nodemanager.local-dirs parameter.

Best,
Michael

On Mon, Mar 26, 2018 at 9:15 PM, Keith Chapman wrote:
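For reference, the yarn-site.xml entry that fixed this would look roughly as follows (the /data paths are hypothetical examples; a comma-separated list spreads container scratch space across disks):

```xml
<!-- yarn-site.xml: where NodeManagers place container working dirs,
     including Spark executor scratch space (paths are examples) -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
```

Note that the NodeManager must actually be restarted for this to take effect, which was the root cause here.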
Hi Michael,

Sorry for the late reply. I guess you may have to set it through the HDFS core-site.xml file. The property you need to set is "hadoop.tmp.dir", which defaults to "/tmp/hadoop-${user.name}".

Regards,
Keith.

http://keith-chapman.com

On Mon, Mar 19, 2018 at 1:05 PM, Michael Shtelma wrote:
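A minimal core-site.xml fragment for that suggestion would look something like this (the target path is a hypothetical example):

```xml
<!-- core-site.xml: base directory for Hadoop's temporary files;
     defaults to /tmp/hadoop-${user.name} (path below is an example) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data1/hadoop-tmp</value>
</property>
```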
Hi Keith,

Thank you for the idea! I have tried it, so now the executor command looks like this:

/bin/bash -c /usr/java/latest//bin/java -server -Xmx51200m '-Djava.io.tmpdir=my_prefered_path' -Djava.io.tmpdir=/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_1521110306769_0041/container_1521110306769_0041_01_04/tmp

The JVM is using the second -Djava.io.tmpdir parameter and writing everything to the same directory as before.

Best,
Michael

Sincerely,
Michael Shtelma

On Mon, Mar 19, 2018 at 6:38 PM, Keith Chapman wrote:
Can you try setting spark.executor.extraJavaOptions to have -Djava.io.tmpdir=someValue?

Regards,
Keith.

http://keith-chapman.com

On Mon, Mar 19, 2018 at 10:29 AM, Michael Shtelma wrote:
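Concretely, that suggestion would look something like this (path and script name are hypothetical). As Michael's reply above shows, however, YARN appends its own -Djava.io.tmpdir after this one on the executor command line, and the JVM lets the last occurrence win, so this alone does not move the executor temp directory:

```shell
# Hypothetical submission applying the extraJavaOptions suggestion;
# YARN's own -Djava.io.tmpdir, added later on the command line,
# overrides this value on the executors.
spark-submit \
  --master yarn \
  --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/fast/scratch/tmp" \
  my_job.py
```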
Hi Keith,

Thank you for your answer! I have done this, and it is working for the Spark driver. I would like to do something like this for the executors as well, so that the setting will be used on all the nodes where I have executors running.

Best,
Michael

On Mon, Mar 19, 2018 at 6:07 PM, Keith Chapman wrote:
Hi Michael,

You could either set spark.local.dir through the Spark conf or the java.io.tmpdir system property.

Regards,
Keith.

http://keith-chapman.com

On Mon, Mar 19, 2018 at 9:59 AM, Michael Shtelma wrote:

> Hi everybody,
>
> I am running a Spark job on YARN, and my problem is that the blockmgr-* folders are being created under /tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*. The size of this folder can grow significantly and does not really fit into the /tmp file system for one job, which is a real problem for my installation.
> I have redefined hadoop.tmp.dir in core-site.xml and yarn.nodemanager.local-dirs in yarn-site.xml, pointing to another location, and expected that the block manager would create the files there and not under /tmp, but this is not the case. The files are created under /tmp.
>
> I am wondering if there is a way to make Spark not use /tmp at all and configure it to create all the files somewhere else?
>
> Any assistance would be greatly appreciated!
>
> Best,
> Michael
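Keith's two options can be sketched as follows (the paths are hypothetical). Note the SPARK_LOCAL_DIRS/LOCAL_DIRS override discussed earlier in the thread: on YARN, option 1 affects the driver but not the executors, which is why the conversation moves on to the YARN-side settings:

```
# spark-defaults.conf (option 1): scratch space for shuffle files and
# disk-spilled RDD blocks; overridden on YARN executors by LOCAL_DIRS
spark.local.dir    /data1/spark-scratch,/data2/spark-scratch

# Driver JVM flag (option 2), e.g. via spark-submit:
#   --driver-java-options "-Djava.io.tmpdir=/data1/spark-tmp"
```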