[ https://issues.apache.org/jira/browse/SPARK-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566754#comment-14566754 ]

Josh Rosen commented on SPARK-4099:
-----------------------------------

If this is still an issue, can you submit a pull request and include 
documentation on how to reproduce this problem?

> env var HOME not set correctly
> ------------------------------
>
>                 Key: SPARK-4099
>                 URL: https://issues.apache.org/jira/browse/SPARK-4099
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.1.0
>            Reporter: Radim Rehurek
>            Priority: Minor
>
> The HOME environment variable is not set properly in PySpark jobs. For 
> example, when setting up a Spark cluster on AWS, `os.environ["HOME"]` 
> returns "/home" rather than the correct "/home/hadoop".
> One consequence is that some Python packages, including NLTK, don't work, 
> because they rely on HOME to store and locate their internal data.
> I assume this problem has to do with the way Spark launches the job 
> processes (no shell).
> The fix is simple: users have to manually set `os.environ["HOME"]` before 
> importing the affected packages (see the sketch after this quote).
> But that is non-intuitive and may be hard for some users to figure out. I 
> think it's better to set HOME directly on the Spark side. That would make 
> NLTK (and others) work out of the box.
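
For anyone hitting the same problem, here is a minimal sketch of the
workaround described in the report. The path "/home/hadoop" follows the AWS
example above, and the function name and input path are hypothetical; adjust
both for your own cluster and data:

    import os

    def tokenize(line):
        # This runs on the worker process, where HOME is set incorrectly.
        # Fix HOME before importing NLTK, which reads $HOME to locate its
        # nltk_data directory.
        os.environ["HOME"] = "/home/hadoop"
        import nltk
        return nltk.word_tokenize(line)

    # Usage, assuming an existing SparkContext `sc` and a real input path:
    # tokens = sc.textFile("hdfs:///path/to/corpus.txt").map(tokenize)

Setting HOME inside the function (rather than at module top level) matters
because the assignment has to happen in the worker process, not only in the
driver.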


