cshannon commented on issue #4157:
URL: https://github.com/apache/accumulo/issues/4157#issuecomment-1889632647

   I started looking into this and grabbed a heap dump to analyze. The JVM was set to 256 MB, and the majority of the heap appears to be byte buffers, most of them Hadoop byte buffers from decompressing and scanning. There are 185 MB of unreachable objects; I'm not 100% sure, but I think this indicates there is not a memory leak, just that the GC simply couldn't keep up fast enough.
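
   In case it helps anyone reproduce this, here is a minimal sketch of how a comparable dump can be grabbed from inside the JVM (the helper class and file name are just for illustration; this is not something the test does today). Passing `false` for the live flag keeps unreachable objects in the dump, which is where the "Unreachable (discarded) Heap" numbers below come from.

   ```java
   import java.io.IOException;
   import java.lang.management.ManagementFactory;

   import com.sun.management.HotSpotDiagnosticMXBean;

   // Hypothetical helper, only to show the mechanism: dump the heap of the
   // current JVM to an hprof file that MAT or similar tools can open.
   public class HeapDumpHelper {

     public static void dump(String path) throws IOException {
       HotSpotDiagnosticMXBean diag =
           ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
       // live=false includes unreachable (not-yet-collected) objects in the
       // dump, which is what makes the discarded-heap figure visible in MAT.
       diag.dumpHeap(path, false);
     }
   }
   ```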
   
   #### Heap Dump Info
   
   ```
   Property                                    |File       |Baseline
   -----------------------------------------------------------------
   Statistic information                       |           |
   |- Heap                                     |61,937,696 |
   |- Number of Objects                        |286,159    |
   |- Number of Classes                        |7,779      |
   |- Number of Class Loaders                  |28         |
   |- Number of GCRoots                        |2,660      |
   |- Unreachable (discarded) Heap             |185,335,736|
   '- Number of Unreachable (discarded) Objects|71,324     |
   -----------------------------------------------------------------
   ```
   
   ```
   Class Name                                                                    | Shallow Heap | Retained Heap | Percentage
   ------------------------------------------------------------------------------------------------------------------------
   java.util.zip.ZipFile$Source @ 0xf0ea4bf0                                     |           80 |     2,894,544 |      4.67%
   java.util.zip.ZipFile$Source @ 0xf0da1e38                                     |           80 |     2,802,800 |      4.53%
   org.apache.accumulo.server.fs.FileManager @ 0xf10c8270                        |           56 |     2,194,248 |      3.54%
   jdk.internal.loader.ClassLoaders$AppClassLoader @ 0xffe56328 JNI Global       |           96 |     1,275,512 |      2.06%
   java.util.zip.ZipFile$Source @ 0xf0b65060                                     |           80 |     1,218,088 |      1.97%
   java.util.zip.ZipFile$Source @ 0xf06858b8                                     |           80 |       680,656 |      1.10%
   java.util.zip.ZipFile$Source @ 0xf0ac7220                                     |           80 |       467,640 |      0.76%
   java.util.zip.ZipFile$Source @ 0xf0685600                                     |           80 |       437,272 |      0.71%
   java.util.zip.ZipFile$Source @ 0xf0b67be0                                     |           80 |       305,208 |      0.49%
   org.apache.accumulo.core.file.blockfile.cache.lru.LruBlockCache @ 0xf15777e0  |           64 |       301,912 |      0.49%
   ------------------------------------------------------------------------------------------------------------------------
   ```
   
   
   Here is part of the stack trace from the tablet server, with the full trace below it:
   
   #### Partial Stack Trace
   ```
   java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:64)
        at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:71)
        at org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:92)
        at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:170)
        at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:153)
        at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader$RBlockState.<init>(BCFile.java:485)
        at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.createReader(BCFile.java:742)
        at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.getDataBlock(BCFile.java:728)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getDataBlock(CachableBlockFile.java:459)
   ...
   ```
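
   The allocation that fails is the working buffer created in `DecompressorStream`'s constructor. Roughly (this is a paraphrase for illustration, not the actual Hadoop source), each new decompression stream does something like the sketch below, so every data block opened during a scan adds another buffer for the GC to reclaim:

   ```java
   // Paraphrase of the failing allocation seen at DecompressorStream.<init>
   // in the trace above (illustrative only, not the Hadoop implementation).
   class DecompressorStreamSketch {

     private final byte[] buffer;

     DecompressorStreamSketch(int bufferSize) {
       // One buffer per open data block per scan; with many concurrent scans
       // on a 256 MB heap these pile up faster than the GC can reclaim them.
       this.buffer = new byte[bufferSize];
     }
   }
   ```
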
   #### Full Stack Trace
   <details>
   <summary>Full Stack Trace</summary>
   
   ```
   java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:64)
        at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:71)
        at org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:92)
        at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:170)
        at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:153)
        at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader$RBlockState.<init>(BCFile.java:485)
        at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.createReader(BCFile.java:742)
        at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.getDataBlock(BCFile.java:728)
        at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getDataBlock(CachableBlockFile.java:459)
        at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.getDataBlock(RFile.java:899)
        at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:1050)
        at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:922)
        at org.apache.accumulo.core.iteratorsImpl.system.LocalityGroupIterator.seek(LocalityGroupIterator.java:269)
        at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1479)
        at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:105)
        at org.apache.accumulo.core.iteratorsImpl.system.MultiIterator.seek(MultiIterator.java:108)
        at org.apache.accumulo.core.iteratorsImpl.system.StatsIterator.seek(StatsIterator.java:69)
        at org.apache.accumulo.core.iteratorsImpl.system.DeletingIterator.seek(DeletingIterator.java:76)
        at org.apache.accumulo.core.iterators.ServerSkippingIterator.seek(ServerSkippingIterator.java:54)
        at org.apache.accumulo.core.iteratorsImpl.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:130)
        at org.apache.accumulo.core.iterators.ServerFilter.seek(ServerFilter.java:58)
        at org.apache.accumulo.core.iterators.SynchronizedServerFilter.seek(SynchronizedServerFilter.java:58)
        at org.apache.accumulo.core.iteratorsImpl.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:165)
        at org.apache.accumulo.core.iteratorsImpl.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:237)
        at org.apache.accumulo.tserver.tablet.TabletBase.nextBatch(TabletBase.java:279)
        at org.apache.accumulo.tserver.tablet.Scanner.read(Scanner.java:120)
        at org.apache.accumulo.tserver.scan.NextBatchTask.run(NextBatchTask.java:78)
        at org.apache.accumulo.tserver.session.ScanSession$ScanMeasurer.run(ScanSession.java:62)
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
   Error thrown in thread: Thread[scan-default-Worker-14,5,main], halting VM.
   ```
   </details>
   
   #### JUnit Output
   
   The OOM occurs after ingesting and verifying 100 MB of data and then running the concurrent splits:
   
   
   ```
   2024-01-12T11:29:31,918 41 [clientImpl.ThriftTransportPool] DEBUG: Set thrift transport pool idle time to 3000ms
   2024-01-12T11:29:32,280 47 [functional.SplitIT] DEBUG: Creating table SplitIT_concurrentSplit0
   2024-01-12T11:29:32,291 47 [zookeeper.ZooSession] DEBUG: Connecting to localhost:44231 with timeout 30000 with auth
   2024-01-12T11:29:32,402 47 [clientImpl.ThriftTransportPool] DEBUG: Set thrift transport pool idle time to 3000ms
   2024-01-12T11:29:34,626 47 [functional.SplitIT] DEBUG: Ingesting 100000 rows into SplitIT_concurrentSplit0
   2024-01-12T11:29:34,829 47 [logging.InternalLoggerFactory] DEBUG: Using SLF4J as the default logging framework
   2024-01-12T11:29:45,339 56 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting hosting for 1 ondemand tablets for table id 1.
        100,000 records written |    7,246 records/sec |  102,900,000 bytes written | 7,456,521 bytes/sec | 13.800 secs
   2024-01-12T11:29:48,693 47 [functional.SplitIT] DEBUG: Verifying 100000 rows ingested into SplitIT_concurrentSplit0
        100,000 records read |  173,913 records/sec |  102,900,000 bytes read | 178,956,521 bytes/sec |  0.575 secs
   2024-01-12T11:29:49,279 47 [functional.SplitIT] DEBUG: Creating futures that add random splits to the table
   2024-01-12T11:29:49,284 47 [functional.SplitIT] DEBUG: Submitting futures
   2024-01-12T11:29:49,290 47 [functional.SplitIT] DEBUG: Waiting for futures to complete
   ```
   
   #### Other Info:
   
   I checked the WAL directory and there is about 100 MB used. The tables directory only had about 1.3 MB used. So far it seems like this might just be too much data being loaded into memory for the test; the GC can't keep up, but if it could, it would clean everything up.
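
   If we want to confirm the "GC can't keep up" theory rather than a leak, one option (a sketch only; the probe class and where it would be called from are my assumptions, not part of the test) is to poll the GC beans while the splits run and see whether collections are running back-to-back with the heap staying near its max right before the OOM:

   ```java
   import java.lang.management.GarbageCollectorMXBean;
   import java.lang.management.ManagementFactory;
   import java.lang.management.MemoryMXBean;

   // Hypothetical probe: print cumulative GC counts/times and heap occupancy.
   // Calling this periodically during the test would show whether the heap is
   // mostly reclaimable garbage that the collector just can't get to in time.
   public class GcPressureProbe {

     public static void logOnce() {
       MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
       for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
         System.out.printf("%s: collections=%d, time=%dms, heap used=%d of %d bytes%n",
             gc.getName(), gc.getCollectionCount(), gc.getCollectionTime(),
             mem.getHeapMemoryUsage().getUsed(), mem.getHeapMemoryUsage().getMax());
       }
     }
   }
   ```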

