Aditi Breed created ARROW-1247:
----------------------------------

             Summary: pyarrow causes Python to crash with an error in parquet.dll
                 Key: ARROW-1247
                 URL: https://issues.apache.org/jira/browse/ARROW-1247
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.4.1
         Environment: Python Version: 3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
                      Windows Edition: Windows Server 2012 R2
            Reporter: Aditi Breed


Hello,

I have a script that fetches data and stores it in a pandas DataFrame.

I compute three aggregations of the data (MEAN, STDEV and MAX); each one is
converted to an Arrow table and saved to disk as a Parquet file.

This code works fine for 100-500 records but fails for larger volumes; the
dataset I am trying to save has millions of rows. I also know the code itself
works, because another developer runs the same code on a machine with identical
hardware without problems.
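Because the failure depends on data volume and kills the interpreter instead of raising a Python exception, one way to narrow it down is to run each write in a separate interpreter and record its exit code. A minimal sketch, assuming numpy, pandas and pyarrow are installed; the two-column DataFrame, the sizes and the output path are hypothetical stand-ins for the real aggregated data. On Windows, a process terminated by an unhandled access violation typically exits with code 0xC0000005 (3221225477).

```python
import subprocess
import sys

# Child program: build an n-row DataFrame and write it to Parquet.
# The columns are hypothetical stand-ins for the real data.
CHILD = r"""
import sys
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

n_rows, path = int(sys.argv[1]), sys.argv[2]
df = pd.DataFrame({"key": np.arange(n_rows) % 1000,
                   "value": np.random.rand(n_rows)})
pq.write_table(pa.Table.from_pandas(df), path)
"""


def write_in_child(n_rows, path):
    """Run one Parquet write in a fresh interpreter and return its exit code.

    A native crash (e.g. an access violation in parquet.dll) terminates
    only the child process, so the caller can record it and continue.
    """
    result = subprocess.run([sys.executable, "-c", CHILD, str(n_rows), path])
    return result.returncode


def first_failing_size(sizes, run):
    """Return the first size for which run(size) exits non-zero, else None."""
    for size in sizes:
        if run(size) != 0:
            return size
    return None
```

For example, first_failing_size([10**3, 10**4, 10**5, 10**6, 10**7], lambda n: write_in_child(n, "test.parq")) returns the smallest tested size whose write did not exit cleanly, or None if all of them succeeded.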

The code crashes at the line pq.write_table(arrowTable, filePath).



Here is the code:

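    from datetime import datetime

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Runs inside a method of the script's class; SaveFileLoc and agg
    # are defined elsewhere in the script.
    arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)

    begintime = datetime.now()
    begintime_str = begintime.strftime("%Y%m%d%I%M%S")

    filePath = SaveFileLoc + "\\Raw\\" + agg + "Data" + begintime_str + ".parq"

    print('Begin Saving File')
    pq.write_table(arrowTable, filePath)  # Python crashes inside this call
    print('Done Saving File')

    print('Appending FilePath to List')
    self.listspDF.append(filePath)
    print('Done Appending FilePath to List')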
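Separately from the crash, the timestamp in the snippet above is built with %I, which is the 12-hour clock, so two runs twelve hours apart (for example 01:30 and 13:30 on the same day) produce the same file name for the same aggregation, and the later write could replace the earlier file. %H (24-hour clock) keeps them distinct; the dates below are arbitrary.

```python
from datetime import datetime

am = datetime(2017, 7, 20, 1, 30, 0)
pm = datetime(2017, 7, 20, 13, 30, 0)

# %I is the 12-hour clock: 01:30 and 13:30 format identically.
print(am.strftime("%Y%m%d%I%M%S"), pm.strftime("%Y%m%d%I%M%S"))
# -> 20170720013000 20170720013000

# %H is the 24-hour clock: the two times stay distinct.
print(am.strftime("%Y%m%d%H%M%S"), pm.strftime("%Y%m%d%H%M%S"))
# -> 20170720013000 20170720133000
```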



Python crashes with a "Python has to close" error.

Here is the detailed error report:
------------------
  Problem Event Name:       APPCRASH
  Application Name:         python.exe
  Application Version:      3.5.2150.1013
  Application Timestamp:    577be340
  Fault Module Name:        parquet.dll
  Fault Module Version:     0.0.0.0
  Fault Module Timestamp:   59403662
  Exception Code:           c0000005
  Exception Offset:         000000000005f990
  OS Version:               6.3.9600.2.0.0.400.8
  Locale ID:                1033

--------------------------------------------
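Exception code c0000005 is a Windows access violation raised inside the native parquet.dll, so Python never gets to print a traceback. As a diagnostic sketch (not a fix), the standard-library faulthandler module can dump the Python-level stack of every thread when such a fatal error occurs. Handling of Windows exceptions like this one was added to faulthandler in Python 3.6, so on the 3.5.2 interpreter above it may not fire. The same handler can be enabled without code changes via python -X faulthandler or the PYTHONFAULTHANDLER environment variable; the log file name below is hypothetical.

```python
import faulthandler

# faulthandler writes to the file descriptor at crash time, so the file
# object must stay open (and referenced) for the whole run.
crash_log = open("faulthandler.log", "w")
faulthandler.enable(file=crash_log, all_threads=True)

# ... rest of the script, including the pq.write_table(...) call ...
```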

I have tried updating Python and pyarrow, with no luck.

Here is my Python version:

    >>> import sys
    >>> print(sys.version)
    3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Here is the output of pip freeze:

        alabaster==0.7.9
        anaconda-clean==1.0
        anaconda-client==1.5.1
        anaconda-navigator==1.3.1
        argcomplete==1.0.0
        astroid==1.4.7
        astropy==2.0
        Babel==2.3.4
        backports.shutil-get-terminal-size==1.0.0
        beautifulsoup4==4.5.1
        bitarray==0.8.1
        blaze==0.10.1
        bokeh==0.12.2
        boto==2.42.0
        Bottleneck==1.2.1
        cffi==1.7.0
        chest==0.2.3
        click==6.6
        cloudpickle==0.2.1
        clyent==1.2.2
        colorama==0.3.7
        comtypes==1.1.2
        conda==4.3.22
        conda-build==2.0.2
        configobj==5.0.6
        contextlib2==0.5.3
        cryptography==1.5
        cycler==0.10.0
        Cython==0.24.1
        cytoolz==0.8.0
        dask==0.11.0
        datashape==0.5.2
        decorator==4.0.10
        dill==0.2.5
        docutils==0.12
        dynd===c328ab7
        et-xmlfile==1.0.1
        fastcache==1.0.2
        filelock==2.0.6
        Flask==0.11.1
        Flask-Cors==2.1.2
        gevent==1.1.2
        greenlet==0.4.10
        h5py==2.7.0
        HeapDict==1.0.0
        idna==2.1
        imageio==2.2.0
        imagesize==0.7.1
        ipykernel==4.5.0
        ipython==5.1.0
        ipython-genutils==0.1.0
        ipywidgets==5.2.2
        itsdangerous==0.24
        jdcal==1.2
        jedi==0.9.0
        Jinja2==2.8
        jsonschema==2.5.1
        jupyter==1.0.0
        jupyter-client==4.4.0
        jupyter-console==5.0.0
        jupyter-core==4.2.0
        lazy-object-proxy==1.2.1
        llvmlite==0.19.0
        locket==0.2.0
        lxml==3.6.4
        MarkupSafe==0.23
        matplotlib==2.0.2
        menuinst==1.4.1
        mistune==0.7.3
        mpmath==0.19
        multipledispatch==0.4.8
        nb-anacondacloud==1.2.0
        nb-conda==2.0.0
        nb-conda-kernels==2.0.0
        nbconvert==4.2.0
        nbformat==4.1.0
        nbpresent==3.0.2
        networkx==1.11
        nltk==3.2.1
        nose==1.3.7
        notebook==4.2.3
        numba==0.34.0
        numexpr==2.6.2
        numpy==1.13.1
        odo==0.5.0
        openpyxl==2.3.2
        pandas==0.20.2
        partd==0.3.6
        path.py==0.0.0
        pathlib2==2.1.0
        patsy==0.4.1
        pep8==1.7.0
        pickleshare==0.7.4
        Pillow==3.3.1
        pkginfo==1.3.2
        ply==3.9
        prompt-toolkit==1.0.3
        psutil==4.3.1
        py==1.4.31
        py4j==0.10.4
        pyarrow==0.4.1
        pyasn1==0.1.9
        pycosat==0.6.1
        pycparser==2.14
        pycrypto==2.6.1
        pycurl==7.43.0
        pyflakes==1.3.0
        Pygments==2.1.3
        pyidealdata==0.7.0
        pylint==1.5.4
        pyodbc==4.0.17
        pyOpenSSL==16.2.0
        pyparsing==2.1.4
        pyspark==2.1.0+hadoop2.7
        pytest==2.9.2
        python-dateutil==2.5.3
        pytz==2016.6.1
        PyUber==1.4.4
        PyWavelets==0.5.2
        pywin32==220
        PyYAML==3.12
        pyzmq==15.4.0
        QtAwesome==0.3.3
        qtconsole==4.2.1
        QtPy==1.1.2
        requests==2.14.2
        rope-py3k==0.9.4.post1
        ruamel-yaml===-VERSION
        scikit-image==0.13.0
        scikit-learn==0.18.2
        scipy==0.19.1
        simplegeneric==0.8.1
        singledispatch==3.4.0.3
        six==1.10.0
        snowballstemmer==1.2.1
        sockjs-tornado==1.0.3
        sphinx==1.4.6
        spyder==3.0.0
        SQLAlchemy==1.0.13
        statsmodels==0.8.0
        sympy==1.0
        tables==3.2.2
        toolz==0.8.0
        tornado==4.4.1
        traitlets==4.3.0
        unicodecsv==0.14.1
        wcwidth==0.1.7
        Werkzeug==0.11.11
        widgetsnbextension==1.2.6
        win-unicode-console==0.5
        wrapt==1.10.6
        xlrd==1.0.0
        XlsxWriter==0.9.3
        xlwings==0.10.0
        xlwt==1.1.2
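pip freeze lists what is installed in the environment that pip belongs to, which is not necessarily the one the crashing python.exe imports from (for example when several Python installations are present). A small standard-library sketch that prints the interpreter path plus the version and file location of each module as the running interpreter sees them; the default module list is an assumption.

```python
import importlib
import sys


def report_versions(names=("numpy", "pandas", "pyarrow")):
    """Map each module name to (version, file) as the running interpreter sees it."""
    info = {"python": (sys.version, sys.executable)}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            info[name] = (None, None)  # not importable from this interpreter
            continue
        info[name] = (getattr(module, "__version__", "unknown"),
                      getattr(module, "__file__", None))
    return info


for name, (version, location) in report_versions().items():
    print(name, version, location)
```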

Could someone shed light on why pyarrow would fail on one particular
machine?

Thanks,
Adu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
