[ https://issues.apache.org/jira/browse/ARROW-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney closed ARROW-6791.
-------------------------------
Resolution: Duplicate
Duplicate of ARROW-6060. Upgrade to 0.15.0 (just published to conda-forge/PyPI).
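For anyone hitting this, a minimal sketch of a sanity check that the environment actually picked up the fixed release before re-running the reproducer (it only assumes the standard pyarrow version string):

{code:python}
# Confirm that the installed pyarrow is at least 0.15.0, where ARROW-6060 is fixed.
import pyarrow as pa

print(pa.__version__)  # expect "0.15.0" or later
assert tuple(int(x) for x in pa.__version__.split('.')[:2]) >= (0, 15), \
    "pyarrow is older than 0.15.0; the ARROW-6060 fix is not present"
{code}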
> Memory Leak
> ------------
>
> Key: ARROW-6791
> URL: https://issues.apache.org/jira/browse/ARROW-6791
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.0, 0.14.1
> Environment: Ubuntu 18.04, 32GB ram, conda-forge installation
> Reporter: George Prichard
> Priority: Major
>
> A memory leak with large string columns crashes the program. This only seems to
> affect 0.14.x - it works fine for me in 0.13.0. It might be related to earlier
> similar issues, e.g. [https://github.com/apache/arrow/issues/2624].
> Below is a reprex that works in earlier versions but crashes on read (writing is
> fine) in this one. The real-life version of the data is full of URLs as the
> strings.
> Oddly, it crashes on my 32GB Ubuntu 18.04 machine, but runs (if very slowly on
> the read) on my 16GB MacBook.
> Thanks so much for the excellent tools!
>
>
> {code:python}
> import pandas as pd
>
> n_rows = int(1e6)
> n_cols = 10
> col_length = 100
>
> # Build a DataFrame of random fixed-length strings (10 columns x 1e6 rows).
> df = pd.DataFrame()
> for i in range(n_cols):
>     df[f'col_{i}'] = pd.util.testing.rands_array(col_length, n_rows)
> print('Generated df', df.shape)
>
> filename = 'tmp.parquet'
> print('Writing parquet')
> df.to_parquet(filename)
>
> # Reading the file back is where the crash occurs on 0.14.x.
> print('Reading parquet')
> pd.read_parquet(filename)
> {code}
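For anyone comparing 0.13.0, 0.14.x and 0.15.0 on this reprex, a minimal sketch of how peak memory around the failing read could be measured (a sketch only; it assumes Linux, where resource.getrusage reports ru_maxrss in kilobytes, and that 'tmp.parquet' was written by the reprex above):

{code:python}
# Print peak RSS before and after the read that crashes on 0.14.x.
# Uses only the standard library plus pandas; assumes Linux (ru_maxrss in KB).
import resource
import pandas as pd

def peak_rss_mb():
    # ru_maxrss is kilobytes on Linux, bytes on macOS; this assumes Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

print(f'Peak RSS before read: {peak_rss_mb():.0f} MB')
df = pd.read_parquet('tmp.parquet')  # file produced by the reprex above
print(f'Peak RSS after read:  {peak_rss_mb():.0f} MB')
{code}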
--
This message was sent by Atlassian Jira
(v8.3.4#803005)