James Porritt created ARROW-1445:
------------------------------------

             Summary: Python: Segfault when using libhdfs3 in pyarrow using latest API
                 Key: ARROW-1445
                 URL: https://issues.apache.org/jira/browse/ARROW-1445
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.6.0
            Reporter: James Porritt

I'm encountering a segfault when using libhdfs3 with pyarrow. My script is:

{code}
import pyarrow

def main():
    hdfs = pyarrow.hdfs.connect("<host>", <port>, "<username>", driver='libhdfs')
    print hdfs.ls('<my path>')

    hdfs3a = pyarrow.HdfsClient("<host>", <port>, "<username>", driver='libhdfs3')
    print hdfs3a.ls('<my path>')

    hdfs3b = pyarrow.hdfs.connect("<host>", <port>, "<username>", driver='libhdfs3')
    print hdfs3b.ls('<my path>')

main()
{code}

The first two HDFS connections return the correct listing. The third yields:

{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f69c0c8b57f, pid=88070, tid=140092200666880
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x13357f]  __strlen_sse42+0xf
{noformat}

It also writes an error report file.

I created my conda environment with:

{noformat}
conda create -n parquet
source activate parquet
conda install pyarrow libhdfs3 -c conda-forge
{noformat}

The packages used are:

{noformat}
arrow-cpp         0.6.0        np113py27_1  conda-forge
boost-cpp         1.64.0       1            conda-forge
bzip2             1.0.6        1            conda-forge
ca-certificates   2017.7.27.1  0            conda-forge
certifi           2017.7.27.1  py27_0       conda-forge
curl              7.54.1       0            conda-forge
icu               58.1         1            conda-forge
krb5              1.14.2       0            conda-forge
libgcrypt         1.8.0        0            conda-forge
libgpg-error      1.27         0            conda-forge
libgsasl          1.8.0        1            conda-forge
libhdfs3          2.3          0            conda-forge
libiconv          1.14         4            conda-forge
libntlm           1.4          0            conda-forge
libssh2           1.8.0        1            conda-forge
libuuid           1.0.3        1            conda-forge
libxml2           2.9.4        4            conda-forge
mkl               2017.0.3     0
ncurses           5.9          10           conda-forge
numpy             1.13.1       py27_0
openssl           1.0.2l       0            conda-forge
pandas            0.20.3       py27_1       conda-forge
parquet-cpp       1.3.0.pre    1            conda-forge
pip               9.0.1        py27_0       conda-forge
protobuf          3.3.2        py27_0       conda-forge
pyarrow           0.6.0        np113py27_1  conda-forge
python            2.7.13       1            conda-forge
python-dateutil   2.6.1        py27_0       conda-forge
pytz              2017.2       py27_0       conda-forge
readline          6.2          0            conda-forge
setuptools        36.2.2       py27_0       conda-forge
six               1.10.0       py27_1       conda-forge
sqlite            3.13.0       1            conda-forge
tk                8.5.19       2            conda-forge
wheel             0.29.0       py27_0       conda-forge
xz                5.2.3        0            conda-forge
zlib              1.2.11       0            conda-forge
{noformat}

I've set ARROW_LIBHDFS_DIR to point at the directory containing the libhdfs3.so file, and I've populated CLASSPATH as described in the documentation.

Please advise.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
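
To check whether the crash depends on the earlier connections made in the same process, each driver could be exercised in its own interpreter. The following is a minimal sketch under that assumption: `run_isolated`, `describe`, and `check_drivers` are hypothetical helper names, and the host, port, user, and path arguments are placeholders to fill in. Only `pyarrow.hdfs.connect(..., driver=...)` and `ls` are taken from the script in this report. Interpreting a negative exit status as a signal number is POSIX behaviour.

```python
import signal
import subprocess
import sys

# Template for one connection attempt; the values are filled in by check_drivers().
SNIPPET = (
    "import pyarrow\n"
    "fs = pyarrow.hdfs.connect({host!r}, {port!r}, {user!r}, driver={driver!r})\n"
    "print(fs.ls({path!r}))\n"
)


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and return its exit status."""
    return subprocess.call([sys.executable, "-c", snippet])


def describe(returncode):
    """Turn an exit status into text; on POSIX a negative value means 'killed by signal'."""
    if returncode >= 0:
        return "exited with status %d" % returncode
    for name in dir(signal):
        # Skip SIG_DFL, SIG_IGN and similar constants that are not signal numbers.
        if name.startswith("SIG") and not name.startswith("SIG_"):
            if getattr(signal, name) == -returncode:
                return "killed by %s" % name
    return "killed by signal %d" % -returncode


def check_drivers(host, port, user, path):
    """Try each driver in a separate process so one crash cannot affect the other."""
    for driver in ("libhdfs", "libhdfs3"):
        snippet = SNIPPET.format(host=host, port=port, user=user,
                                 path=path, driver=driver)
        print("%s: %s" % (driver, describe(run_isolated(snippet))))


# Only runs when invoked as: python script.py <host> <port> <username> <path>
if __name__ == "__main__" and len(sys.argv) == 5:
    check_drivers(sys.argv[1], int(sys.argv[2]), sys.argv[3], sys.argv[4])
```

If the `libhdfs3` line reports a signal even when run alone, the crash would not be a side effect of the earlier `libhdfs` connection; if it only fails after `libhdfs` has been loaded in the same process, the interaction between the two drivers becomes the more likely suspect.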