[
https://issues.apache.org/jira/browse/ARROW-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krisztian Szucs resolved ARROW-12983.
-------------------------------------
Fix Version/s: 5.0.0
Resolution: Fixed
Issue resolved by pull request 10556
[https://github.com/apache/arrow/pull/10556]
> [C++][Python] Converter::Extend gets stuck in infinite loop causing OOM if
> values don't fit in single chunk
> -----------------------------------------------------------------------------------------------------------
>
> Key: ARROW-12983
> URL: https://issues.apache.org/jira/browse/ARROW-12983
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 4.0.0, 4.0.1
> Reporter: Laurent Mazare
> Assignee: David Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 5.0.0
>
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> _Apologies if this is a duplicate; I haven't found anything related._
> When creating an Arrow table via the Python API, the following code runs out
> of memory after exhausting all the available resources on a box with 512 GB
> of RAM. This happens with pyarrow 4.0.0 and 4.0.1. However, when running the
> same code with pyarrow 3.0.0, memory usage only reaches 5 GB (which seems
> like the appropriate ballpark for the table size).
> The code generates a table with a single string column of 1M rows, each
> string being 3000 characters long.
> I'm not sure whether the issue is Python-specific; I haven't tried
> replicating it with the C++ API.
>
> {code:python}
> import string
> import numpy as np
> import pyarrow as pa
> print(pa.__version__)
> np.random.seed(42)
> alphabet = list(string.ascii_uppercase)
> # Build 1,000,000 rows: 1000 random 3000-character strings, each repeated 1000 times.
> _col = []
> for _n in range(1000):
>     k = ''.join(np.random.choice(alphabet, 3000))
>     _col += [k] * 1000
> table = pa.Table.from_pydict({'col': _col})
> {code}
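> For scale, the column holds roughly 1,000,000 x 3,000 = 3 GB of character
> data, which is more than the 2 GiB that a 32-bit-offset string array can
> hold in a single chunk, hence the "don't fit in single chunk" condition in
> the title. Below is a minimal sketch (not part of the original report) of
> how one might confirm that the conversion chunks the column instead of
> looping, assuming pyarrow >= 5.0.0 with the fix; the expected chunk count
> is an assumption.
> {code:python}
> # Sketch, assuming the table above was built with a fixed pyarrow (>= 5.0.0):
> # the converted column should come back as a ChunkedArray with more than one
> # chunk rather than spinning in Converter::Extend.
> col = table.column('col')
> print(col.num_chunks)   # expected > 1: ~3 GB exceeds the 2 GiB per-chunk offset limit
> print(table.num_rows)   # 1000000
> {code}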
--
This message was sent by Atlassian Jira
(v8.3.4#803005)