Re: OutOfDirectMemoryError for Spark 2.2
I believe jmap is only showing you the Java heap, but the program is running out of direct memory; they are two different pools. I haven't had to diagnose a direct-memory problem before, but this blog post has some suggestions on how to do it: https://jkutner.github.io/2017/04/28/oh-the-places-your-java-memory-goes.html

On Thu, Mar 8, 2018 at 1:57 AM, Chawla, Sumit wrote:
> Hi
>
> Anybody got any pointers on this one?
>
> Regards
> Sumit Chawla
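If it helps, the JDK exposes the direct-buffer pool through JMX, separate from the heap numbers jmap reports. Below is a minimal sketch (plain Java, no Spark required). One caveat: the allocateDirectNoCleaner frame in your trace means Netty allocates through Unsafe against its own counter, so those bytes may not appear in this pool at all.

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.List;

    public class DirectMemoryProbe {
        public static void main(String[] args) {
            // The "direct" pool tracks ByteBuffer.allocateDirect usage;
            // the heap histogram from jmap is a different pool entirely.
            List<BufferPoolMXBean> pools =
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
            for (BufferPoolMXBean pool : pools) {
                System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }

The same beans are also visible in JConsole/VisualVM under java.nio:type=BufferPool, which may be easier than attaching code to a running driver.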
Re: OutOfDirectMemoryError for Spark 2.2
Hi

Anybody got any pointers on this one?

Regards
Sumit Chawla

On Tue, Mar 6, 2018 at 8:58 AM, Chawla, Sumit wrote:
> No, this is the only stack trace I get. I have tried DEBUG but didn't
> notice much of a log change.
>
> Yes, I have tried bumping MaxDirectMemorySize to get rid of this error.
> It does work if I throw 4G+ memory at it. However, I am trying to
> understand this behavior so that I can set this number to an appropriate
> value.
>
> Regards
> Sumit Chawla
Re: OutOfDirectMemoryError for Spark 2.2
No, this is the only stack trace I get. I have tried DEBUG but didn't notice much of a log change.

Yes, I have tried bumping MaxDirectMemorySize to get rid of this error. It does work if I throw 4G+ memory at it. However, I am trying to understand this behavior so that I can set this number to an appropriate value.

Regards
Sumit Chawla

On Tue, Mar 6, 2018 at 8:07 AM, Vadim Semenov wrote:
> Do you have a trace? i.e. what's the source of the `io.netty.*` calls?
>
> And have you tried bumping `-XX:MaxDirectMemorySize`?
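Since where to set `-XX:MaxDirectMemorySize` keeps coming up in this thread, here is a sketch of one way to pass it through Spark. The 2g value is a placeholder for experimentation, not a recommendation, and the leak-detection property is optional (it has a noticeable CPU cost):

    import org.apache.spark.SparkConf;

    public class DirectMemoryConf {
        public static SparkConf build() {
            return new SparkConf()
                    .setAppName("chunked-job")
                    // Raise the direct-memory ceiling on the executors.
                    .set("spark.executor.extraJavaOptions",
                         "-XX:MaxDirectMemorySize=2g")
                    // Same for the driver, plus Netty's allocation-site
                    // tracking to see who is holding the buffers.
                    .set("spark.driver.extraJavaOptions",
                         "-XX:MaxDirectMemorySize=2g "
                         + "-Dio.netty.leakDetectionLevel=advanced");
        }
    }

Note that in client mode the driver JVM is already running by the time SparkConf is read, so the driver-side flags have to be passed to spark-submit as --driver-java-options (or set in spark-defaults.conf) instead.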
Re: OutOfDirectMemoryError for Spark 2.2
Do you have a trace? i.e. what's the source of the `io.netty.*` calls?

And have you tried bumping `-XX:MaxDirectMemorySize`?

On Tue, Mar 6, 2018 at 12:45 AM, Chawla, Sumit wrote:
> Hi All
>
> I have a job which processes a large dataset. All items in the dataset
> are unrelated. To save on cluster resources, I process these items in
> chunks. Since chunks are independent of each other, I start and shut down
> the Spark context for each chunk. This allows me to keep the DAG smaller
> and avoid retrying the entire DAG in case of failures. This mechanism
> used to work fine with Spark 1.6. Now that we have moved to 2.2, the job
> has started failing with an OutOfDirectMemoryError.
>
> 2018-03-03 22:00:59,687 WARN [rpc-server-48-1] server.TransportChannelHandler (TransportChannelHandler.java:exceptionCaught(78)) - Exception in connection from /10.66.73.27:60374
>
> io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 8388608 byte(s) of direct memory (used: 1023410176, max: 1029177344)
>   at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:506)
>   at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:460)
>   at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:701)
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:690)
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:237)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:213)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:141)
>   at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>   at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>   at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>   at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:129)
>   at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564)
>
> I got some clue on what is causing this from
> https://github.com/netty/netty/issues/6343, but I am not able to make
> the numbers add up to explain what is filling 1 GB of direct memory.
> Output from jmap (rank: #instances #bytes class name):
>
>     7:  22230  1422720  io.netty.buffer.PoolSubpage
>    12:   1370   804640  io.netty.buffer.PoolSubpage[]
>    41:   3600   144000  io.netty.buffer.PoolChunkList
>    98:   1440    46080  io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
>   113:    300    40800  io.netty.buffer.PoolArena$HeapArena
>   114:    300    40800  io.netty.buffer.PoolArena$DirectArena
>   192:    198    15840  io.netty.buffer.PoolChunk
>   274:    120     8320  io.netty.buffer.PoolThreadCache$MemoryRegionCache[]
>   406:    120     3840  io.netty.buffer.PoolThreadCache$NormalMemoryRegionCache
>   422:     72     3552  io.netty.buffer.PoolArena[]
>   458:     30     2640  io.netty.buffer.PooledUnsafeDirectByteBuf
>   500:     36     2016  io.netty.buffer.PooledByteBufAllocator
>   529:     32     1792  io.netty.buffer.UnpooledUnsafeHeapByteBuf
>   589:     20     1440  io.netty.buffer.PoolThreadCache
>   630:     37     1184  io.netty.buffer.EmptyByteBuf
>   703:     36      864  io.netty.buffer.PooledByteBufAllocator$PoolThreadLocalCache
>   852:     22      528  io.netty.buffer.AdvancedLeakAwareByteBuf
>   889:     10      480  io.netty.buffer.SlicedAbstractByteBuf
>   917:      8      448  io.netty.buffer.UnpooledHeapByteBuf
>  1018:     20      320  io.netty.buffer.PoolThreadCache$1
>  1305:      4      128  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
>  1404:      1       80  io.netty.buffer.PooledUnsafeHeapByteBuf
>  1473:      3       72  io.netty.buffer.PoolArena$SizeClass
>  1529:      1       64  io.netty.buffer.AdvancedLeakAwareCompositeByteBuf
>  1541:      2       64  io.netty.buffer.CompositeByteBuf$Component
>  1568:      1       56  io.netty.buffer.CompositeByteBuf
>  1896:      1       32  io.netty.buffer.PoolArena$SizeClass[]
>  2042:      1       24  io.netty.buffer.PooledUnsafeDirectByteBuf$1
>  2046:      1       24  io.netty.buffer.UnpooledByteBufAllocator
>  2051:      1       24  io.netty.buffer.PoolThreadCache$MemoryRegionCache$1
>  2078:      1       24  io.netty.buffer.PooledHeapByteBuf$1
>  2135:      1       24  io.netty.buffer.PooledUnsafeHeapByteBuf$1
>  2302:      1       16  io.netty.buffer.ByteBufUtil$1
>  2769:      1       16  io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
>
> My driver machine has 32 CPUs, and as of now I have 15 machines in the
> cluster. Currently the error happens while processing the 5th or 6th
> chunk. I suspect the error depends on the number of executors and would
> happen earlier if we added more executors.
>
> I am trying to come up with an explanation of what is filling up the
> direct memory and how to quantify it as a factor of the number of
> executors. Ours is a shared cluster, and we need to understand how much
> driver memory to allocate for most of the jobs.
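One hedged back-of-the-envelope for why usage piles up toward the ~1 GB cap, assuming Netty's stock allocator defaults (Spark and the io.netty.allocator.* system properties can change all of these, so treat it as a model, not a diagnosis): the pooled allocator creates on the order of two direct arenas per available processor, and each arena grabs direct memory in chunks of pageSize * 2^maxOrder = 8 KiB * 2^11 = 16 MiB. On a 32-CPU driver that is 64 arenas, so once every arena has touched a single chunk you are at 1 GiB:

    public class NettyDirectMemoryEstimate {
        public static void main(String[] args) {
            // Netty defaults; overridable via io.netty.allocator.* properties.
            long pageSize = 8 * 1024;               // io.netty.allocator.pageSize
            int maxOrder = 11;                      // io.netty.allocator.maxOrder
            long chunkSize = pageSize << maxOrder;  // 16 MiB per chunk

            // Default direct-arena count is roughly 2x available processors.
            int cores = 32;                         // the driver machine above
            int directArenas = 2 * cores;

            // Worst case: every arena holds at least one chunk.
            long worstCase = directArenas * chunkSize;
            System.out.printf("chunk=%d MiB, arenas=%d, worst case=%d MiB%n",
                    chunkSize >> 20, directArenas, worstCase >> 20);
            // -> 16 MiB * 64 = 1024 MiB, right at the ~1 GB cap in the error.
        }
    }

If that model is roughly right, the pressure scales with network threads and connections rather than with executors per se, but each added executor opens more connections to the driver's RPC server, which would be consistent with the failure arriving earlier as executors are added. Spark also documents spark.shuffle.io.preferDirectBufs, which can push Netty's shuffle allocations on-heap where off-heap memory is tight.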