[ https://issues.apache.org/jira/browse/ARROW-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097841#comment-16097841 ]

Wes McKinney commented on ARROW-1247:
-------------------------------------

Is it possible to link to a pickle file of the DataFrame causing the problem so 
we can take a look (if the data is not sensitive)? 
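A minimal sketch of producing such a pickle follows. It assumes the frame in question is one of the aggregated DataFrames (for example self.grpByMeanDS2 in the reporter's code). The helper name save_repro_pickle, the example data and the file path are hypothetical.

```python
import os
import tempfile

import pandas as pd


def save_repro_pickle(df, path):
    """Pickle a DataFrame for sharing and confirm it reads back unchanged."""
    df.to_pickle(path)
    restored = pd.read_pickle(path)
    # Raises AssertionError if values, dtypes or index differ after the round trip
    pd.testing.assert_frame_equal(df, restored)
    return path


# Hypothetical stand-in for the aggregated frame that crashes write_table
example = pd.DataFrame({"key": ["a", "b", "c"], "mean": [1.0, 2.5, 3.0]})
pickle_path = save_repro_pickle(
    example, os.path.join(tempfile.gettempdir(), "repro_frame.pkl")
)
```

A pickle, unlike a CSV export, preserves column dtypes and the index. That matters if the crash depends on the column types and not only on the row count.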

One other thing I'd like to rule out: can you remove the pyarrow installed
with pip and install it with conda instead, to see whether that also fails?

{{conda install pyarrow=0.4.1 -c conda-forge}}
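After the switch, you can confirm that Python imports the conda package and not a leftover pip install by printing the imported module's version and file location. This is a sketch. describe_module is a hypothetical helper name, and the check assumes the module sets the conventional __version__ attribute.

```python
import importlib


def describe_module(name):
    """Return (version, path) for an importable module, so you can see which
    installation (pip site-packages or conda) Python is actually loading."""
    module = importlib.import_module(name)
    return getattr(module, "__version__", None), getattr(module, "__file__", None)


if __name__ == "__main__":
    try:
        print(describe_module("pyarrow"))
    except ImportError:
        print("pyarrow is not importable in this environment")
```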

Thanks

> pyarrow causes python to crash errors on parquet.dll
> ----------------------------------------------------
>
>                 Key: ARROW-1247
>                 URL: https://issues.apache.org/jira/browse/ARROW-1247
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.4.1
>         Environment: Python Version:
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 
> 64 bit (AMD64)]
> Windows Edition: Windows Server 2012 R2
>            Reporter: Aditi Breed
>
> Hello,
> I have a script that fetches data and stores it in a pandas DataFrame.
> I compute three aggregations of the data (MEAN, STDEV and MAX), each of which
> is converted to an Arrow table and saved to disk as a Parquet file.
> This code works fine for 100-500 records but fails for larger volumes. I also
> know the code itself works, because another developer runs the same code on a
> hardware-mirrored machine without problems.
> The dataset I am trying to save is on the order of millions of records.
> The failure occurs at the line pq.write_table(arrowTable, filePath).
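> Here is the code:
>     from datetime import datetime
>
>     import pyarrow as pa
>     import pyarrow.parquet as pq
>
>     # self, SaveFileLoc and agg come from the enclosing method (not shown)
>     arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)
>
>     begintime = datetime.now()
>     # %H (24-hour clock) instead of %I avoids AM/PM filename collisions
>     begintime_str = begintime.strftime("%Y%m%d%H%M%S")
>
>     filePath = SaveFileLoc + "\\Raw\\" + agg + "Data" + begintime_str + ".parq"
>     print('Begin Saving File')
>     pq.write_table(arrowTable, filePath)  # Python crashes here
>     print('Done Saving File')
>
>     print('Appending FilePath to List')
>     self.listspDF.append(filePath)
>     print('Done Appending FilePath to List')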
>       
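The failure is an access violation that takes down the whole interpreter. One way to find the row count at which it starts (it works at 100-500 records but not at millions) is to attempt each write in a child process and binary-search on the size. This is a sketch under assumptions. The helper names, the pickle path and the output path are hypothetical. It also assumes that failure is monotonic in row count, which this crash may not actually be.

```python
import subprocess
import sys


def first_failing_size(works, low, high):
    """Binary-search the smallest size in (low, high] for which works() is False.

    Assumes works(low) is True, works(high) is False, and that every size
    above the threshold also fails (i.e. failure is monotonic in size).
    """
    while high - low > 1:
        mid = (low + high) // 2
        if works(mid):
            low = mid
        else:
            high = mid
    return high


# Child-process script: write the first N rows of a pickled frame to Parquet
_CHILD_SCRIPT = (
    "import sys\n"
    "import pandas as pd\n"
    "import pyarrow as pa\n"
    "import pyarrow.parquet as pq\n"
    "df = pd.read_pickle(sys.argv[1]).head(int(sys.argv[2]))\n"
    "pq.write_table(pa.Table.from_pandas(df), sys.argv[3])\n"
)


def write_succeeds_in_child(n_rows, pickle_path, out_path):
    """Run one write attempt in a separate interpreter so that an access
    violation kills only the child; a non-zero exit code counts as failure."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_SCRIPT, pickle_path, str(n_rows), out_path]
    )
    return result.returncode == 0


# Hypothetical usage with a pickled copy of the failing frame:
# threshold = first_failing_size(
#     lambda n: write_succeeds_in_child(n, "repro_frame.pkl", "probe.parq"),
#     low=500, high=row_count_of_full_frame)
```

On Windows, the crash dialog may keep each failing child alive until someone dismisses it. To run the search unattended, you may need to turn off the Windows Error Reporting UI.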
> Python crashes with a "python has to close" error.
> Here are the error details:
> ------------------
>   Problem Event Name:      APPCRASH
>   Application Name:        python.exe
>   Application Version:     3.5.2150.1013
>   Application Timestamp:   577be340
>   Fault Module Name:       parquet.dll
>   Fault Module Version:    0.0.0.0
>   Fault Module Timestamp:  59403662
>   Exception Code:          c0000005
>   Exception Offset:        000000000005f990
>   OS Version:              6.3.9600.2.0.0.400.8
>   Locale ID:               1033
> --------------------------------------------
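Exception code c0000005 is an access violation inside parquet.dll, so Python never gets to raise an exception or print a traceback. The standard-library faulthandler module, enabled before calling write_table, is one way to capture at least the Python-side stack at the moment of the crash. The sketch below uses a hypothetical log path. Note that faulthandler only installs a handler for Windows exceptions from Python 3.6 onward, so on the reporter's 3.5.2 it may not catch this particular crash.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location. Keep the file object open for the life of the
# process: faulthandler writes to its file descriptor at crash time.
CRASH_LOG_PATH = os.path.join(tempfile.gettempdir(), "pyarrow_crash_traceback.log")
crash_log = open(CRASH_LOG_PATH, "w")

# On a fatal error (e.g. an access violation in a native DLL), dump the
# Python traceback of every thread into the log file.
faulthandler.enable(file=crash_log, all_threads=True)

# ... then run the failing code, e.g. pq.write_table(arrowTable, filePath)
```

You can get the same effect without changing the code by running the script as python -X faulthandler script.py. That writes the traceback to stderr instead of to a file.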
> I have tried updating Python and pyarrow, with no luck.
> Here is the Python version:
>     import sys
>     print(sys.version)
>     3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
> Here is the output of pip freeze:
>       alabaster==0.7.9
>       anaconda-clean==1.0
>       anaconda-client==1.5.1
>       anaconda-navigator==1.3.1
>       argcomplete==1.0.0
>       astroid==1.4.7
>       astropy==2.0
>       Babel==2.3.4
>       backports.shutil-get-terminal-size==1.0.0
>       beautifulsoup4==4.5.1
>       bitarray==0.8.1
>       blaze==0.10.1
>       bokeh==0.12.2
>       boto==2.42.0
>       Bottleneck==1.2.1
>       cffi==1.7.0
>       chest==0.2.3
>       click==6.6
>       cloudpickle==0.2.1
>       clyent==1.2.2
>       colorama==0.3.7
>       comtypes==1.1.2
>       conda==4.3.22
>       conda-build==2.0.2
>       configobj==5.0.6
>       contextlib2==0.5.3
>       cryptography==1.5
>       cycler==0.10.0
>       Cython==0.24.1
>       cytoolz==0.8.0
>       dask==0.11.0
>       datashape==0.5.2
>       decorator==4.0.10
>       dill==0.2.5
>       docutils==0.12
>       dynd===c328ab7
>       et-xmlfile==1.0.1
>       fastcache==1.0.2
>       filelock==2.0.6
>       Flask==0.11.1
>       Flask-Cors==2.1.2
>       gevent==1.1.2
>       greenlet==0.4.10
>       h5py==2.7.0
>       HeapDict==1.0.0
>       idna==2.1
>       imageio==2.2.0
>       imagesize==0.7.1
>       ipykernel==4.5.0
>       ipython==5.1.0
>       ipython-genutils==0.1.0
>       ipywidgets==5.2.2
>       itsdangerous==0.24
>       jdcal==1.2
>       jedi==0.9.0
>       Jinja2==2.8
>       jsonschema==2.5.1
>       jupyter==1.0.0
>       jupyter-client==4.4.0
>       jupyter-console==5.0.0
>       jupyter-core==4.2.0
>       lazy-object-proxy==1.2.1
>       llvmlite==0.19.0
>       locket==0.2.0
>       lxml==3.6.4
>       MarkupSafe==0.23
>       matplotlib==2.0.2
>       menuinst==1.4.1
>       mistune==0.7.3
>       mpmath==0.19
>       multipledispatch==0.4.8
>       nb-anacondacloud==1.2.0
>       nb-conda==2.0.0
>       nb-conda-kernels==2.0.0
>       nbconvert==4.2.0
>       nbformat==4.1.0
>       nbpresent==3.0.2
>       networkx==1.11
>       nltk==3.2.1
>       nose==1.3.7
>       notebook==4.2.3
>       numba==0.34.0
>       numexpr==2.6.2
>       numpy==1.13.1
>       odo==0.5.0
>       openpyxl==2.3.2
>       pandas==0.20.2
>       partd==0.3.6
>       path.py==0.0.0
>       pathlib2==2.1.0
>       patsy==0.4.1
>       pep8==1.7.0
>       pickleshare==0.7.4
>       Pillow==3.3.1
>       pkginfo==1.3.2
>       ply==3.9
>       prompt-toolkit==1.0.3
>       psutil==4.3.1
>       py==1.4.31
>       py4j==0.10.4
>       pyarrow==0.4.1
>       pyasn1==0.1.9
>       pycosat==0.6.1
>       pycparser==2.14
>       pycrypto==2.6.1
>       pycurl==7.43.0
>       pyflakes==1.3.0
>       Pygments==2.1.3
>       pyidealdata==0.7.0
>       pylint==1.5.4
>       pyodbc==4.0.17
>       pyOpenSSL==16.2.0
>       pyparsing==2.1.4
>       pyspark==2.1.0+hadoop2.7
>       pytest==2.9.2
>       python-dateutil==2.5.3
>       pytz==2016.6.1
>       PyUber==1.4.4
>       PyWavelets==0.5.2
>       pywin32==220
>       PyYAML==3.12
>       pyzmq==15.4.0
>       QtAwesome==0.3.3
>       qtconsole==4.2.1
>       QtPy==1.1.2
>       requests==2.14.2
>       rope-py3k==0.9.4.post1
>       ruamel-yaml===-VERSION
>       scikit-image==0.13.0
>       scikit-learn==0.18.2
>       scipy==0.19.1
>       simplegeneric==0.8.1
>       singledispatch==3.4.0.3
>       six==1.10.0
>       snowballstemmer==1.2.1
>       sockjs-tornado==1.0.3
>       sphinx==1.4.6
>       spyder==3.0.0
>       SQLAlchemy==1.0.13
>       statsmodels==0.8.0
>       sympy==1.0
>       tables==3.2.2
>       toolz==0.8.0
>       tornado==4.4.1
>       traitlets==4.3.0
>       unicodecsv==0.14.1
>       wcwidth==0.1.7
>       Werkzeug==0.11.11
>       widgetsnbextension==1.2.6
>       win-unicode-console==0.5
>       wrapt==1.10.6
>       xlrd==1.0.0
>       XlsxWriter==0.9.3
>       xlwings==0.10.0
>       xlwt==1.1.2
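The same code reportedly works on a mirrored machine, so one cheap check is to diff the pip freeze output above against the output from the working machine. Below is a minimal sketch. parse_freeze and diff_freeze are hypothetical helper names. The sketch only handles pinned name==version lines, plus the name===version form seen above (dynd, ruamel-yaml), and it skips any other line format.

```python
import re

# Matches "name==version" and the arbitrary-equality form "name===version"
_PIN = re.compile(r"^\s*([A-Za-z0-9_.+-]+)\s*===?\s*(\S+)\s*$")


def parse_freeze(text):
    """Map lower-cased package name -> version string from pip freeze output."""
    packages = {}
    for line in text.splitlines():
        match = _PIN.match(line)
        if match:  # lines in other formats (e.g. "-e ..." or URLs) are skipped
            packages[match.group(1).lower()] = match.group(2)
    return packages


def diff_freeze(mine, theirs):
    """Return {name: (my_version, their_version)} for every package that
    differs; None means the package is missing on that side."""
    a, b = parse_freeze(mine), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

pip freeze does not list native DLLs, so an empty diff would not rule out differences in the non-Python libraries that parquet.dll loads.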
> Could someone shed light on why pyarrow would not work on one particular
> machine?
> Thanks,
> Adu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
