[ https://issues.apache.org/jira/browse/ARROW-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney closed ARROW-6791.
-------------------------------
    Resolution: Duplicate

Duplicate of ARROW-6060. Upgrade to 0.15.0 (just published to conda-forge/PyPI).
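One quick way to tell whether an environment is on an affected release is to check the installed pyarrow version. The sketch below assumes the affected range is exactly 0.14.x, taken from the Affects Versions field (0.14.0, 0.14.1), with 0.15.0 as the upgrade target per the resolution comment. The helper name `is_affected_by_arrow_6060` is hypothetical and is not part of pyarrow.

```python
def is_affected_by_arrow_6060(version: str) -> bool:
    """Return True for pyarrow 0.14.x, the releases listed as affected.

    Assumption: only the Affects Versions of this ticket (0.14.0, 0.14.1)
    are treated as affected; 0.13.x and 0.15.0+ are treated as unaffected.
    """
    # Compare only the numeric major.minor prefix, e.g. "0.14.1" -> (0, 14).
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) == (0, 14)
```

In an interactive session this could be called as `is_affected_by_arrow_6060(pyarrow.__version__)` after `import pyarrow`.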

> Memory Leak 
> ------------
>
>                 Key: ARROW-6791
>                 URL: https://issues.apache.org/jira/browse/ARROW-6791
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0, 0.14.1
>         Environment: Ubuntu 18.04, 32GB ram, conda-forge installation
>            Reporter: George Prichard
>            Priority: Major
>
> Memory leak with large string columns crashes the program. This only seems to
> affect 0.14.x; it works fine for me in 0.13.0. It might be related to
> earlier similar issues, e.g. [https://github.com/apache/arrow/issues/2624].
> Below is a reproducible example which works in earlier versions but crashes
> on read (writing is fine) in this one. In the real-life version of the data,
> the strings are URLs.
> Weirdly, it crashes my 32GB Ubuntu 18.04 machine, but runs (if very slowly for
> the read) on my 16GB MacBook.
> Thanks so much for the excellent tools!
>  
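> {code:python}
> import numpy as np
> import pandas as pd
>
> n_rows = int(1e6)
> n_cols = 10
> col_length = 100  # characters per string
>
> # pd.util.testing.rands_array() is not available in current pandas releases;
> # this builds the same kind of array: n_rows random alphanumeric strings.
> chars = np.array(list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'))
> def rands_array(nchars, size):
>     return np.random.choice(chars, size=nchars * size).view(f'U{nchars}')
>
> df = pd.DataFrame()
> for i in range(n_cols):
>     df[f'col_{i}'] = rands_array(col_length, n_rows)
> print('Generated df', df.shape)
>
> filename = 'tmp.parquet'
> print('Writing parquet')
> df.to_parquet(filename)  # writing succeeds on 0.14.x
> print('Reading parquet')
> pd.read_parquet(filename)  # reading crashes on 0.14.x
> {code}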
>  
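To compare how much memory the write and read steps use on different machines or pyarrow versions (as with the Ubuntu and MacBook runs in the report), the process's peak resident set size can be logged after each step. This is a minimal sketch, assuming a Unix system where the standard-library `resource` module is available; the helper names `peak_rss_mb` and `run_and_report_peak` are hypothetical and not part of pandas or pyarrow.

```python
import resource
import sys
from typing import Any, Callable, Tuple


def peak_rss_mb() -> float:
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def run_and_report_peak(label: str, fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call fn(), print the peak RSS afterwards, and return (result, peak_mb).

    ru_maxrss is a high-water mark, so the value covers everything the
    process has done so far, not only fn().
    """
    result = fn()
    peak = peak_rss_mb()
    print(f"{label}: peak RSS so far {peak:.1f} MiB")
    return result, peak
```

For example, wrapping the last step of the reproducible example as `run_and_report_peak('Reading parquet', lambda: pd.read_parquet(filename))` would show whether memory climbs during the read. Normalizing the units matters here, since the same raw `ru_maxrss` number means kilobytes on Ubuntu but bytes on a Mac.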



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
