[ https://issues.apache.org/jira/browse/SPARK-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123167#comment-14123167 ]
Kousuke Saruta commented on SPARK-3399:
---------------------------------------

Some PySpark tests, for instance rdd.py, use NamedTemporaryFile to create input data. NamedTemporaryFile creates the temporary file under /tmp on the local filesystem.

rdd.py is launched by the pyspark script from python/run-tests. If the environment variable HADOOP_CONF_DIR or YARN_CONF_DIR is set in spark-env.sh before testing, the pyspark command loads the values of those variables. After loading them, Spark expects the input data to be on the filesystem configured through those variables, even though the files are actually on the local filesystem.

> Test for PySpark should ignore HADOOP_CONF_DIR and YARN_CONF_DIR
> ----------------------------------------------------------------
>
>                 Key: SPARK-3399
>                 URL: https://issues.apache.org/jira/browse/SPARK-3399
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Kousuke Saruta
>
> Some PySpark tests create temporary files under /tmp on the local filesystem, but
> if the environment variable HADOOP_CONF_DIR or YARN_CONF_DIR is set in
> spark-env.sh, the tests expect the temporary files to be on the FileSystem configured in
> core-site.xml, even though the actual files are on the local filesystem.
> I think we should ignore HADOOP_CONF_DIR and YARN_CONF_DIR.
> If we need those variables in some tests, we should set them explicitly in
> those tests.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
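To make the mismatch described above concrete, here is a minimal Python sketch. It is not the fix proposed in the issue and it does not call Spark; it only checks whether the two variables are visible to the process and shows how a temporary file written the way the tests write it can be named with an explicit file:// scheme. The helper names conf_dirs_set and local_uri are hypothetical and exist only for this illustration.

```python
import os
from pathlib import Path
from tempfile import NamedTemporaryFile

# The two variables named in this issue.
CONF_DIR_VARS = ("HADOOP_CONF_DIR", "YARN_CONF_DIR")


def conf_dirs_set(env=None):
    """Return the names of the conf-dir variables that are set and non-empty.

    Hypothetical helper: `env` defaults to this process's environment.
    """
    env = os.environ if env is None else env
    return [name for name in CONF_DIR_VARS if env.get(name)]


def local_uri(path):
    """Turn a local path into an explicit file:// URI (hypothetical helper)."""
    return Path(path).resolve().as_uri()


# Same pattern as the tests: write input data to a local temporary file.
with NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello world!\n")

# A non-empty list means one of the variables is set in this environment.
print(conf_dirs_set())
# The scheme states explicitly that the file lives on the local filesystem.
print(local_uri(tmp.name))

os.unlink(tmp.name)
```

Stripping the variables from the parent environment alone would probably not be enough for the case in this issue, because spark-env.sh can set them again when the pyspark script runs; the sketch is meant only to show where the local path and the configured filesystem diverge.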