[ https://issues.apache.org/jira/browse/SPARK-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566754#comment-14566754 ]
Josh Rosen commented on SPARK-4099:
-----------------------------------
If this is still an issue, can you submit a pull request and include
documentation on how to reproduce this problem?
> env var HOME not set correctly
> ------------------------------
>
> Key: SPARK-4099
> URL: https://issues.apache.org/jira/browse/SPARK-4099
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.1.0
> Reporter: Radim Rehurek
> Priority: Minor
>
> The HOME environment variable is not set properly in PySpark jobs. For
> example, on a Spark cluster set up on AWS, `os.environ["HOME"]` returns
> "/home" rather than the correct "/home/hadoop".
> One consequence is that some Python packages (including NLTK) don't work,
> because they rely on HOME to locate the internal data they store there.
> I assume this problem has to do with the way Spark launches the job
> processes (no shell).
> The fix is simple: users can manually set `os.environ["HOME"]` before
> importing the affected packages (as sketched below).
> But this is non-intuitive and may be hard for some users to figure out. I
> think it's better to set HOME directly on the Spark side. This would make
> NLTK (and others) work out of the box.
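>
> A minimal sketch of the workaround, assuming a Unix worker where the
> standard library `pwd` module is available; the `pwd`-based lookup and the
> `nltk` import are illustrative, not part of the original report:
>
>     import os
>     import pwd
>
>     # Make sure HOME points at the real home directory before importing
>     # packages (such as NLTK) that read it at import time.
>     expected = pwd.getpwuid(os.getuid()).pw_dir  # e.g. "/home/hadoop"
>     if os.environ.get("HOME") != expected:
>         os.environ["HOME"] = expected
>
>     import nltk  # safe now: NLTK resolves its data path via HOME
>
> On the Spark side, an operator could instead set the variable cluster-wide
> through the documented `spark.executorEnv.[EnvironmentVariableName]`
> configuration, e.g. `spark.executorEnv.HOME=/home/hadoop` in
> conf/spark-defaults.conf.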