[ 
https://issues.apache.org/jira/browse/ARROW-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114616#comment-16114616
 ] 

Uwe L. Korn commented on ARROW-1282:
------------------------------------

Notes while digging on this issue:

* I can reproduce this in the environment in 
https://github.com/xhochy/arrow-dockerfiles/tree/master/ARROW-1282 with the 
NYC Taxi trip data CSV from January 2016, reading only VendorID and the columns 
that would be {{object}} when loaded with plain {{pd.read_csv}}.
* The script {{hanger.py}} in there always hangs on the third 
iteration.
* Building {{jemalloc==4.5.0}} using {{--with-malloc-conf=dss:disabled}} avoids 
the issue.
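
For readers without the Docker environment, the column selection described above can be sketched roughly as follows. This is not the actual {{hanger.py}}; the helper names, the sample size, and the assumption that the hang is hit while repeatedly converting the DataFrame with {{pa.Table.from_pandas}} are illustrative guesses only. The sample column names are modelled on the taxi data but are likewise just for demonstration.

```python
import io

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def columns_to_load(csv_source, always_keep=("VendorID",), sample_rows=1000):
    """Return the always_keep columns plus every column pandas reads as text.

    Only the first `sample_rows` rows are inspected, so a column that turns
    into text further down the file would be missed (sketch-level shortcut).
    """
    sample = pd.read_csv(csv_source, nrows=sample_rows)
    return [
        name
        for name in sample.columns
        if name in always_keep
        # Text columns show up as object dtype; some pandas versions or
        # configurations use a dedicated string dtype, so accept both.
        or is_object_dtype(sample[name])
        or is_string_dtype(sample[name])
    ]


def convert_repeatedly(csv_path, iterations=5):
    """Hypothetical stand-in for the reproduction loop (an assumption)."""
    import pyarrow as pa  # imported lazily so the helper above works without it

    df = pd.read_csv(csv_path, usecols=columns_to_load(csv_path))
    for i in range(iterations):
        table = pa.Table.from_pandas(df)
        print(f"iteration {i}: {table.num_rows} rows")


# Tiny in-memory stand-in for the real CSV, just to show the selection.
SAMPLE = (
    "VendorID,tpep_pickup_datetime,store_and_fwd_flag,fare_amount\n"
    "1,2016-01-01 00:00:00,N,7.5\n"
    "2,2016-01-01 00:00:02,Y,9.0\n"
)
print(columns_to_load(io.StringIO(SAMPLE)))
```

Calling {{convert_repeatedly}} on the full January 2016 file would be the part expected to exercise the large reallocations, assuming the hang is triggered on that conversion path.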

> Large memory reallocation by Arrow causes hang in jemalloc
> ----------------------------------------------------------
>
>                 Key: ARROW-1282
>                 URL: https://issues.apache.org/jira/browse/ARROW-1282
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Jeff Knupp
>             Fix For: 0.7.0
>
>
> When reallocating a large amount of memory, Arrow is either triggering a bug 
> in jemalloc or has a bug itself in the memory manager (many different 
> applications reporting same issue but not clear from jemalloc issue 
> description if they're sure it's in jemalloc or caused by other issues like 
> using multiple memory allocation libraries in the same process, multithreaded 
> access, etc).
> Link to stack trace is here: 
> https://gist.github.com/jeffknupp/73879feacf9c560afd4f1a20213dc6ef
> Link to issue in jemalloc GitHub is here: 
> https://github.com/jemalloc/jemalloc/issues/802
> Originally observed in redis, discussed with jemalloc maintainer here: 
> https://github.com/antirez/redis/issues/3799
> *This is entirely reproducible on Ubuntu 16.04 xenial, which uses version 
> 3.6.0 according to `apt` metadata.*



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
