[
https://issues.apache.org/jira/browse/ARROW-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114616#comment-16114616
]
Uwe L. Korn commented on ARROW-1282:
------------------------------------
Notes while digging into this issue:
* I can reproduce this in the environment at
https://github.com/xhochy/arrow-dockerfiles/tree/master/ARROW-1282 using the
NYC Taxi trip data CSV for January 2016, keeping only {{VendorID}} and the
columns that a plain {{pd.read_csv}} loads as {{object}} dtype.
* With the {{hanger.py}} script from that directory, it always hangs on the
third iteration.
* Building {{jemalloc==4.5.0}} with {{--with-malloc-conf=dss:disabled}}, which
disables jemalloc's sbrk-based (DSS) allocation, avoids the issue.
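A minimal sketch of the reproduction steps above, assuming pandas (and, for the conversion loop, pyarrow) is installed. The comment does not show what {{hanger.py}} does, so {{convert_repeatedly}} is a hypothetical guess that converts the selected frame with {{pa.Table.from_pandas}} in a loop. The function names and the sample rows (modeled on the trip data header) are illustrative only.

```python
import io

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def select_repro_columns(csv_source):
    """Load the CSV and keep VendorID plus every text-like column.

    A plain pd.read_csv (no dtype or parse_dates arguments) leaves text
    columns, including datetime strings, as object dtype. Newer pandas
    versions may use their string dtype instead, so both are accepted.
    """
    df = pd.read_csv(csv_source)
    keep = [
        col
        for col in df.columns
        if col == "VendorID" or is_object_dtype(df[col]) or is_string_dtype(df[col])
    ]
    return df[keep]


def convert_repeatedly(df, iterations=3):
    """Hypothetical stand-in for hanger.py: convert the frame to Arrow in a loop.

    pyarrow is imported inside the function so that select_repro_columns
    can be used without it.
    """
    import pyarrow as pa

    for i in range(iterations):
        table = pa.Table.from_pandas(df)
        print(f"iteration {i + 1}: {table.num_rows} rows")


if __name__ == "__main__":
    # Tiny illustrative sample; the real input is the January 2016 CSV.
    sample = io.StringIO(
        "VendorID,tpep_pickup_datetime,trip_distance,store_and_fwd_flag\n"
        "1,2016-01-01 00:00:00,1.1,N\n"
        "2,2016-01-01 00:05:00,2.2,Y\n"
    )
    print(list(select_repro_columns(sample).columns))
```

Passing the January 2016 CSV path to {{select_repro_columns}} and the result to {{convert_repeatedly}} approximates the setup described above; it is not the actual {{hanger.py}}.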
> Large memory reallocation by Arrow causes hang in jemalloc
> ----------------------------------------------------------
>
> Key: ARROW-1282
> URL: https://issues.apache.org/jira/browse/ARROW-1282
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Jeff Knupp
> Fix For: 0.7.0
>
>
> When reallocating a large amount of memory, Arrow either triggers a bug in
> jemalloc or has a bug in its own memory manager. Many different applications
> report the same issue, but the jemalloc issue description does not make clear
> whether the problem is in jemalloc itself or is caused by something else, such
> as using multiple memory allocation libraries in the same process or
> multithreaded access.
> Link to stack trace is here:
> https://gist.github.com/jeffknupp/73879feacf9c560afd4f1a20213dc6ef
> Link to issue in jemalloc GitHub is here:
> https://github.com/jemalloc/jemalloc/issues/802
> Originally observed in Redis and discussed with the jemalloc maintainer here:
> https://github.com/antirez/redis/issues/3799
> *This is entirely reproducible on Ubuntu 16.04 (xenial), which ships jemalloc
> 3.6.0 according to {{apt}} metadata.*