[ https://issues.apache.org/jira/browse/ARROW-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905982#comment-16905982 ]

Jim Northrup commented on ARROW-6202:
-------------------------------------

For the record, we have pre-TensorFlow column counts of about 14,000 one-hot attributes. We are seeing NumPy RAM requirements of 160 GB.

> [Java] Exception in thread "main" org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 4 due to memory limit. Current allocation: 2147483646
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6202
>                 URL: https://issues.apache.org/jira/browse/ARROW-6202
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.14.1
>            Reporter: Jim Northrup
>            Priority: Major
>              Labels: jdbc
>
> jdbc query results exceed native heap when using generous -Xmx settings.
> for roughly 800 megabytes of csv/flatfile resultset, arrow is unable to house the contents in RAM long enough to persist to disk, without explicit knowledge beyond unit test sample code.
> source: https://github.com/jnorthrup/jdbc2json/blob/master/src/main/java/com/fnreport/QueryToFeather.kt#L83
> {code:java}
> Exception in thread "main" org.apache.arrow.memory.OutOfMemoryException: Unable to allocate buffer of size 4 due to memory limit. Current allocation: 2147483646
>     at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:307)
>     at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:277)
>     at org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.updateVector(JdbcToArrowUtils.java:610)
>     at org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.jdbcToFieldVector(JdbcToArrowUtils.java:462)
>     at org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.jdbcToArrowVectors(JdbcToArrowUtils.java:396)
>     at org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow(JdbcToArrow.java:225)
>     at org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow(JdbcToArrow.java:187)
>     at org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow(JdbcToArrow.java:156)
>     at com.fnreport.QueryToFeather$Companion.go(QueryToFeather.kt:83)
>     at com.fnreport.QueryToFeather$Companion$main$1.invokeSuspend(QueryToFeather.kt:95)
>     at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
>     at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
>     at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:270)
>     at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:79)
>     at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:54)
>     at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
>     at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:36)
>     at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
>     at com.fnreport.QueryToFeather$Companion.main(QueryToFeather.kt:93)
>     at com.fnreport.QueryToFeather.main(QueryToFeather.kt)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
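
The numbers in the exception can be checked with a little arithmetic. The following is only a sketch of limit-based allocation accounting, not Arrow's actual implementation: the class name `AllocationLimitSketch` and its methods are hypothetical, and the limit of `Integer.MAX_VALUE` bytes is an assumption, since the report gives only the current allocation (2147483646) and the requested size (4), not the configured limit.

```java
// Hypothetical sketch of limit-based allocation accounting.
// This is NOT Arrow's code; it only reproduces the arithmetic in the report.
final class AllocationLimitSketch {

    /** True when granting requestedBytes more would push usage past limitBytes. */
    static boolean exceedsLimit(long currentBytes, long requestedBytes, long limitBytes) {
        return currentBytes + requestedBytes > limitBytes;
    }

    /** Bytes still available under the limit (never negative). */
    static long headroom(long currentBytes, long limitBytes) {
        return Math.max(0L, limitBytes - currentBytes);
    }

    public static void main(String[] args) {
        long current = 2_147_483_646L;          // "Current allocation" from the exception
        long requested = 4L;                    // "buffer of size 4" from the exception
        long assumedLimit = Integer.MAX_VALUE;  // ASSUMPTION: the limit is not stated in the report

        System.out.println("headroom bytes: " + headroom(current, assumedLimit));
        System.out.println("exceeds limit: " + exceedsLimit(current, requested, assumedLimit));
    }
}
```

Under that assumed limit, a single byte of headroom remains, so even a 4-byte request is refused. If the allocator in the linked source was in fact created with a limit near 2 GiB, that would explain why raising -Xmx alone makes no difference here.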