Aditi Breed created ARROW-1247:
----------------------------------

Summary: pyarrow causes python to crash errors on parquet.dll
Key: ARROW-1247
URL: https://issues.apache.org/jira/browse/ARROW-1247
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.4.1
Environment: Python Version: 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
Windows Edition: Windows Server 2012 R2
Reporter: Aditi Breed

Hello,

I have a script that fetches data and stores it in a pandas DataFrame. I compute three aggregations of the data (MEAN, STDEV, MAX), each of which is converted to an Arrow table and saved to disk as a Parquet file.

This code works fine for 100-500 records but fails on larger volumes. The dataset I am trying to save is on the order of millions of records. I know the code itself works because another developer runs the same code on a mirrored machine (in terms of hardware) and it works there.

The failure occurs at the line pq.write_table(arrowTable, filePath). Here is the code:

    arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)
    begintime = datetime.now()
    begintime_str = begintime.strftime("%Y%m%d%I%M%S")
    filePath = SaveFileLoc + "\\Raw\\" + agg + "Data" + begintime_str + ".parq"
    print('Begin Saving File')
    pq.write_table(arrowTable, filePath)
    print('Done Saving File')
    print('Appending FilePath to List')
    self.listspDF.append(filePath)
    print('Done Appending FilePath to List')

Python crashes with a "python has to close" error. The detailed error follows:

------------------
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 3.5.2150.1013
Application Timestamp: 577be340
Fault Module Name: parquet.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 59403662
Exception Code: c0000005
Exception Offset: 000000000005f990
OS Version: 6.3.9600.2.0.0.400.8
Locale ID: 1033

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=280262

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
--------------------------------------------

I have tried updating Python and pyarrow, with no luck.
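The APPCRASH dialog only names the faulting module, so it may help to capture the Python-side traceback at the moment of the crash. The following is a minimal sketch using the standard library faulthandler module. The helper name call_with_fault_log and the log file name are hypothetical and not part of the original script. Note that the handler for Windows exceptions was only added to faulthandler in Python 3.6, so on Python 3.5 an access violation (c0000005) inside parquet.dll may not always be caught.

```python
import faulthandler


def call_with_fault_log(func, *args, log_path="pyarrow_crash.log", **kwargs):
    """Call func(*args, **kwargs) while faulthandler writes to log_path.

    If native code triggers a fatal error during the call, faulthandler
    dumps the Python traceback of every thread to the log file before
    the process dies. The helper name and default log path are
    illustrative assumptions.
    """
    # The file has to stay open while faulthandler is enabled, because
    # the handler writes directly to its file descriptor.
    with open(log_path, "w") as log_file:
        faulthandler.enable(file=log_file, all_threads=True)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore the default state once the call has returned.
            faulthandler.disable()
```

In the script above, the write could then be wrapped as call_with_fault_log(pq.write_table, arrowTable, filePath). If the process still dies, any traceback that was captured is in the log file.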
Following is the version of python:

    import sys
    print (sys.version)
    3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Following are results of pip freeze:

alabaster==0.7.9
anaconda-clean==1.0
anaconda-client==1.5.1
anaconda-navigator==1.3.1
argcomplete==1.0.0
astroid==1.4.7
astropy==2.0
Babel==2.3.4
backports.shutil-get-terminal-size==1.0.0
beautifulsoup4==4.5.1
bitarray==0.8.1
blaze==0.10.1
bokeh==0.12.2
boto==2.42.0
Bottleneck==1.2.1
cffi==1.7.0
chest==0.2.3
click==6.6
cloudpickle==0.2.1
clyent==1.2.2
colorama==0.3.7
comtypes==1.1.2
conda==4.3.22
conda-build==2.0.2
configobj==5.0.6
contextlib2==0.5.3
cryptography==1.5
cycler==0.10.0
Cython==0.24.1
cytoolz==0.8.0
dask==0.11.0
datashape==0.5.2
decorator==4.0.10
dill==0.2.5
docutils==0.12
dynd===c328ab7
et-xmlfile==1.0.1
fastcache==1.0.2
filelock==2.0.6
Flask==0.11.1
Flask-Cors==2.1.2
gevent==1.1.2
greenlet==0.4.10
h5py==2.7.0
HeapDict==1.0.0
idna==2.1
imageio==2.2.0
imagesize==0.7.1
ipykernel==4.5.0
ipython==5.1.0
ipython-genutils==0.1.0
ipywidgets==5.2.2
itsdangerous==0.24
jdcal==1.2
jedi==0.9.0
Jinja2==2.8
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.4.0
jupyter-console==5.0.0
jupyter-core==4.2.0
lazy-object-proxy==1.2.1
llvmlite==0.19.0
locket==0.2.0
lxml==3.6.4
MarkupSafe==0.23
matplotlib==2.0.2
menuinst==1.4.1
mistune==0.7.3
mpmath==0.19
multipledispatch==0.4.8
nb-anacondacloud==1.2.0
nb-conda==2.0.0
nb-conda-kernels==2.0.0
nbconvert==4.2.0
nbformat==4.1.0
nbpresent==3.0.2
networkx==1.11
nltk==3.2.1
nose==1.3.7
notebook==4.2.3
numba==0.34.0
numexpr==2.6.2
numpy==1.13.1
odo==0.5.0
openpyxl==2.3.2
pandas==0.20.2
partd==0.3.6
path.py==0.0.0
pathlib2==2.1.0
patsy==0.4.1
pep8==1.7.0
pickleshare==0.7.4
Pillow==3.3.1
pkginfo==1.3.2
ply==3.9
prompt-toolkit==1.0.3
psutil==4.3.1
py==1.4.31
py4j==0.10.4
pyarrow==0.4.1
pyasn1==0.1.9
pycosat==0.6.1
pycparser==2.14
pycrypto==2.6.1
pycurl==7.43.0
pyflakes==1.3.0
Pygments==2.1.3
pyidealdata==0.7.0
pylint==1.5.4
pyodbc==4.0.17
pyOpenSSL==16.2.0
pyparsing==2.1.4
pyspark==2.1.0+hadoop2.7
pytest==2.9.2
python-dateutil==2.5.3
pytz==2016.6.1
PyUber==1.4.4
PyWavelets==0.5.2
pywin32==220
PyYAML==3.12
pyzmq==15.4.0
QtAwesome==0.3.3
qtconsole==4.2.1
QtPy==1.1.2
requests==2.14.2
rope-py3k==0.9.4.post1
ruamel-yaml===-VERSION
scikit-image==0.13.0
scikit-learn==0.18.2
scipy==0.19.1
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
snowballstemmer==1.2.1
sockjs-tornado==1.0.3
sphinx==1.4.6
spyder==3.0.0
SQLAlchemy==1.0.13
statsmodels==0.8.0
sympy==1.0
tables==3.2.2
toolz==0.8.0
tornado==4.4.1
traitlets==4.3.0
unicodecsv==0.14.1
wcwidth==0.1.7
Werkzeug==0.11.11
widgetsnbextension==1.2.6
win-unicode-console==0.5
wrapt==1.10.6
xlrd==1.0.0
XlsxWriter==0.9.3
xlwings==0.10.0
xlwt==1.1.2

I was wondering if someone could shed light on why pyarrow would not work on a particular machine?

Thanks,
Adu

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)