[ 
https://issues.apache.org/jira/browse/ARROW-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906332#comment-16906332
 ] 

Micah Kornfield commented on ARROW-6202:
----------------------------------------

Thanks for the update.  The code on master might still have memory issues, 
because it tries to load all records into one contiguous batch, but with the 
memory leak fixed it might allow you to move forward.  I filed 
https://issues.apache.org/jira/browse/ARROW-6219 to track improving the API so 
that result sets can be read in batches.
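
To illustrate what batching a result set means, here is a minimal sketch in plain Java. It is not the Arrow JDBC adapter API and not what ARROW-6219 will necessarily look like; the class name BatchedRead and the method forEachBatch are hypothetical. It only shows the general idea of handing rows downstream in bounded chunks, so that no single chunk has to hold the whole result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only (not part of Arrow).
final class BatchedRead {
    /**
     * Drains rows in chunks of at most batchSize and hands each chunk to sink.
     * Returns the number of chunks emitted.
     */
    static <T> int forEachBatch(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);                  // e.g. write this chunk to disk
                batch = new ArrayList<>(batchSize);  // start a fresh chunk
                batches++;
            }
        }
        if (!batch.isEmpty()) {                      // trailing partial chunk
            sink.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With a JDBC source, the iterator would wrap the ResultSet and the sink would persist each chunk before the next one is filled, so peak memory is bounded by the batch size rather than by the size of the query result.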

 

"for the record we have pre-tensorflow column counts of about 14000 one-hot 
attributes. we are seeing numpy RAM requirements of 160 gigs"

The current Java implementation is purely off-heap, and the current allocator 
supports at most Integer.MAX_VALUE bytes. Technically, a separate allocator per 
column could be used, but I don't think that is a great solution here.
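
As a rough back-of-the-envelope check of that limit, the sketch below uses the numbers from the exception (a current allocation of 2147483646 bytes and a 4-byte request) and the 14000 columns mentioned above. The class AllocatorBudget and its methods are hypothetical. The 4 bytes per value is an assumption, and the calculation ignores validity buffers and any rounding the allocator does, so the real row count per batch would be lower.

```java
// Hypothetical arithmetic helper, for illustration only (not part of Arrow).
final class AllocatorBudget {
    /** True if a further request still fits under the allocator limit (simplified model). */
    static boolean fits(long currentAllocation, long requestBytes, long limitBytes) {
        return currentAllocation + requestBytes <= limitBytes;
    }

    /** Rows that fit in one batch, counting only fixed-width data buffers. */
    static long maxRows(long limitBytes, int columns, int bytesPerValue) {
        return limitBytes / ((long) columns * bytesPerValue);
    }

    public static void main(String[] args) {
        long limit = Integer.MAX_VALUE;                   // 2147483647 bytes
        System.out.println(fits(2147483646L, 4, limit));  // false
        System.out.println(maxRows(limit, 14000, 4));     // 38347
    }
}
```

Under these assumptions, a single allocator capped at Integer.MAX_VALUE bytes holds fewer than forty thousand rows of a 14000-column table, which is why loading everything into one contiguous batch runs into the limit.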

There is a discussion on the mailing list about supporting 64-bit address 
spaces 
([https://lists.apache.org/thread.html/dcd1ae4d943b40568e6b178b91cb7ed012168711f9eb2bbf3c3cbd2d@%3Cdev.arrow.apache.org%3E]).

14000 columns might stretch the Arrow format a little bit.  There were previous 
discussions on the mailing list about tensor modeling 
([https://lists.apache.org/thread.html/8e48930c39bf9b70f28f9a83d1b7e9e71fa45d85d711e25dedc598f9@%3Cdev.arrow.apache.org%3E])
 that might be worth looking at.

 

> [Java] Exception in thread "main" 
> org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of 
> size 4 due to memory limit. Current allocation: 2147483646
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6202
>                 URL: https://issues.apache.org/jira/browse/ARROW-6202
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.14.1
>            Reporter: Jim Northrup
>            Priority: Major
>              Labels: jdbc
>
> jdbc query results exceed native heap when using generous -Xmx settings. 
> for roughly 800 megabytes of csv/flatfile resultset, arrow is unable to house 
> the contents in RAM long enough to persist to disk, without explicit 
> knowledge beyond unit test sample code.
> source:
> https://github.com/jnorthrup/jdbc2json/blob/master/src/main/java/com/fnreport/QueryToFeather.kt#L83
> {code:java}
> Exception in thread "main" org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 4 due to memory limit. Current allocation: 2147483646
>         at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:307)
>         at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:277)
>         at org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.updateVector(JdbcToArrowUtils.java:610)
>         at org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.jdbcToFieldVector(JdbcToArrowUtils.java:462)
>         at org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.jdbcToArrowVectors(JdbcToArrowUtils.java:396)
>         at org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow(JdbcToArrow.java:225)
>         at org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow(JdbcToArrow.java:187)
>         at org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow(JdbcToArrow.java:156)
>         at com.fnreport.QueryToFeather$Companion.go(QueryToFeather.kt:83)
>         at com.fnreport.QueryToFeather$Companion$main$1.invokeSuspend(QueryToFeather.kt:95)
>         at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
>         at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
>         at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:270)
>         at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:79)
>         at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:54)
>         at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
>         at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:36)
>         at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
>         at com.fnreport.QueryToFeather$Companion.main(QueryToFeather.kt:93)
>         at com.fnreport.QueryToFeather.main(QueryToFeather.kt)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
