[ https://issues.apache.org/jira/browse/ARROW-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kouhei Sutou updated ARROW-5049:
--------------------------------
    Fix Version/s:     (was: 0.13.0)
                   0.14.0

> [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
> ----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5049
>                 URL: https://issues.apache.org/jira/browse/ARROW-5049
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0, 0.12.1
>            Reporter: Tiger068
>            Assignee: Tiger068
>            Priority: Major
>             Fix For: 0.14.0
>
>
> When I initialize a pyarrow filesystem to connect to an HDFS cluster in Spark, libhdfs throws this error:
> {code:java}
> org/apache/hadoop/fs/FileSystem class not found
> {code}
> I printed the CLASSPATH; its value is in wildcard mode:
> {code:java}
> ../share/hadoop/hdfs;spark/spark-2.0.2-bin-hadoop2.7/jars...
> {code}
> The value is set by Spark, but libhdfs must load classes from jar files.
>
> Root cause:
> In hdfs.py we only check whether the string 'hadoop' appears in CLASSPATH, not whether CLASSPATH actually contains jar files.
> {code:python}
> def _maybe_set_hadoop_classpath():
>     if 'hadoop' in os.environ.get('CLASSPATH', ''):
>         return
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
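
To illustrate the root cause described in the issue, here is a minimal sketch of a stricter check. It is not necessarily the fix that was merged for ARROW-5049. The helper names classpath_has_hadoop_jars and maybe_set_hadoop_classpath, the choice of hadoop-common as the jar to look for, and the assumption that a hadoop executable is on PATH are all illustrative assumptions, not taken from the report.

```python
import os
import re
import subprocess

# Assumption: an explicit hadoop-common jar entry (for example
# .../hadoop-common-2.7.3.jar) is a reasonable sign that CLASSPATH
# already lists jar files rather than bare directories.
_HADOOP_COMMON_JAR = re.compile(r'hadoop-common[^/\\:;]*\.jar')


def classpath_has_hadoop_jars(classpath):
    """Return True only if the classpath names a hadoop-common jar file."""
    return _HADOOP_COMMON_JAR.search(classpath) is not None


def maybe_set_hadoop_classpath(environ=os.environ):
    """Set CLASSPATH to expanded jar paths unless it already contains them."""
    if classpath_has_hadoop_jars(environ.get('CLASSPATH', '')):
        return
    # Assumption: a `hadoop` executable is on PATH. `hadoop classpath --glob`
    # prints the classpath with wildcard entries expanded to jar files.
    expanded = subprocess.check_output(['hadoop', 'classpath', '--glob'])
    environ['CLASSPATH'] = expanded.decode('utf-8').strip()
```

With this check, the Spark-provided value quoted in the report no longer counts as a usable classpath, because it contains the substring 'hadoop' but no jar file, so the function goes on to ask Hadoop for an expanded classpath instead of returning early.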