cshannon commented on issue #4157:
URL: https://github.com/apache/accumulo/issues/4157#issuecomment-1889632647
I started looking into this and grabbed a heap dump. The JVM heap was capped at 256 MB, and the majority of the heap is byte buffers, most of which appear to be Hadoop byte buffers from decompressing and scanning. There are also 185 MB of unreachable objects. I'm not 100% sure, but I think this indicates there is not a memory leak; rather, the GC process simply couldn't keep up fast enough.
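To double-check that this is steady-state allocation pressure rather than a leak, one option is to log heap usage periodically while the test runs. Here is a minimal sketch using the standard `java.lang.management` API (the thread name and interval are my own choices, not anything from the test):
```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapLogger {

  // Starts a daemon thread that prints heap usage once a second.
  // A leak shows up as a steadily rising floor after each GC;
  // pure allocation pressure drops back down after each collection.
  public static void start() {
    MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
    Thread t = new Thread(() -> {
      while (true) {
        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        System.out.printf("heap used=%,d max=%,d%n", heap.getUsed(), heap.getMax());
        try {
          Thread.sleep(1000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "heap-logger");
    t.setDaemon(true);
    t.start();
  }
}
```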
#### Heap Dump Info
```
Property                                     |Value
-------------------------------------------------------
Statistic information                        |
|- Heap                                      |61,937,696
|- Number of Objects                         |286,159
|- Number of Classes                         |7,779
|- Number of Class Loaders                   |28
|- Number of GCRoots                         |2,660
|- Unreachable (discarded) Heap              |185,335,736
'- Number of Unreachable (discarded) Objects |71,324
-------------------------------------------------------
```
```
Class Name                                                                   | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------------------------------------
java.util.zip.ZipFile$Source @ 0xf0ea4bf0                                    |           80 |     2,894,544 |      4.67%
java.util.zip.ZipFile$Source @ 0xf0da1e38                                    |           80 |     2,802,800 |      4.53%
org.apache.accumulo.server.fs.FileManager @ 0xf10c8270                       |           56 |     2,194,248 |      3.54%
jdk.internal.loader.ClassLoaders$AppClassLoader @ 0xffe56328 JNI Global      |           96 |     1,275,512 |      2.06%
java.util.zip.ZipFile$Source @ 0xf0b65060                                    |           80 |     1,218,088 |      1.97%
java.util.zip.ZipFile$Source @ 0xf06858b8                                    |           80 |       680,656 |      1.10%
java.util.zip.ZipFile$Source @ 0xf0ac7220                                    |           80 |       467,640 |      0.76%
java.util.zip.ZipFile$Source @ 0xf0685600                                    |           80 |       437,272 |      0.71%
java.util.zip.ZipFile$Source @ 0xf0b67be0                                    |           80 |       305,208 |      0.49%
org.apache.accumulo.core.file.blockfile.cache.lru.LruBlockCache @ 0xf15777e0 |           64 |       301,912 |      0.49%
------------------------------------------------------------------------------------------------------------------------
```
Here is part of the stack trace from the tablet server, with the full trace below it:
#### Partial Stack Trace
```
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:64)
    at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:71)
    at org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:92)
    at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:170)
    at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:153)
    at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader$RBlockState.<init>(BCFile.java:485)
    at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.createReader(BCFile.java:742)
    at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.getDataBlock(BCFile.java:728)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getDataBlock(CachableBlockFile.java:459)
    ...
```
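The OOM is thrown in the `DecompressorStream` constructor, which is where the per-stream decompression buffer gets allocated. Roughly speaking (this is a hand-wavy simplification for illustration, not Hadoop's actual code), every data block a scan touches does something like:
```java
// Hypothetical simplification of why each opened block costs heap:
// each decompression stream allocates its own byte[] buffer up front,
// so concurrent scans pile up buffers faster than the GC reclaims them.
class SimplifiedDecompressorStream {
  private final byte[] buffer;

  SimplifiedDecompressorStream(int bufferSize) {
    // With the heap nearly full, this allocation is where the
    // OutOfMemoryError surfaces, matching the stack trace above.
    this.buffer = new byte[bufferSize];
  }
}
```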
#### Full Stack Trace
<details>
<summary>Full Stack Trace</summary>

```
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:64)
    at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:71)
    at org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:92)
    at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:170)
    at org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm.createDecompressionStream(CompressionAlgorithm.java:153)
    at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader$RBlockState.<init>(BCFile.java:485)
    at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.createReader(BCFile.java:742)
    at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.getDataBlock(BCFile.java:728)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getDataBlock(CachableBlockFile.java:459)
    at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.getDataBlock(RFile.java:899)
    at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:1050)
    at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:922)
    at org.apache.accumulo.core.iteratorsImpl.system.LocalityGroupIterator.seek(LocalityGroupIterator.java:269)
    at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1479)
    at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:105)
    at org.apache.accumulo.core.iteratorsImpl.system.MultiIterator.seek(MultiIterator.java:108)
    at org.apache.accumulo.core.iteratorsImpl.system.StatsIterator.seek(StatsIterator.java:69)
    at org.apache.accumulo.core.iteratorsImpl.system.DeletingIterator.seek(DeletingIterator.java:76)
    at org.apache.accumulo.core.iterators.ServerSkippingIterator.seek(ServerSkippingIterator.java:54)
    at org.apache.accumulo.core.iteratorsImpl.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:130)
    at org.apache.accumulo.core.iterators.ServerFilter.seek(ServerFilter.java:58)
    at org.apache.accumulo.core.iterators.SynchronizedServerFilter.seek(SynchronizedServerFilter.java:58)
    at org.apache.accumulo.core.iteratorsImpl.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:165)
    at org.apache.accumulo.core.iteratorsImpl.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:237)
    at org.apache.accumulo.tserver.tablet.TabletBase.nextBatch(TabletBase.java:279)
    at org.apache.accumulo.tserver.tablet.Scanner.read(Scanner.java:120)
    at org.apache.accumulo.tserver.scan.NextBatchTask.run(NextBatchTask.java:78)
    at org.apache.accumulo.tserver.session.ScanSession$ScanMeasurer.run(ScanSession.java:62)
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
Error thrown in thread: Thread[scan-default-Worker-14,5,main], halting VM.
```
</details>
#### JUnit Output
This occurs after ingesting and verifying 100 MB of data and then running the concurrent splits:
```
2024-01-12T11:29:31,918 41 [clientImpl.ThriftTransportPool] DEBUG: Set thrift transport pool idle time to 3000ms
2024-01-12T11:29:32,280 47 [functional.SplitIT] DEBUG: Creating table SplitIT_concurrentSplit0
2024-01-12T11:29:32,291 47 [zookeeper.ZooSession] DEBUG: Connecting to localhost:44231 with timeout 30000 with auth
2024-01-12T11:29:32,402 47 [clientImpl.ThriftTransportPool] DEBUG: Set thrift transport pool idle time to 3000ms
2024-01-12T11:29:34,626 47 [functional.SplitIT] DEBUG: Ingesting 100000 rows into SplitIT_concurrentSplit0
2024-01-12T11:29:34,829 47 [logging.InternalLoggerFactory] DEBUG: Using SLF4J as the default logging framework
2024-01-12T11:29:45,339 56 [clientImpl.ClientTabletCacheImpl] DEBUG: Requesting hosting for 1 ondemand tablets for table id 1.
100,000 records written | 7,246 records/sec | 102,900,000 bytes written | 7,456,521 bytes/sec | 13.800 secs
2024-01-12T11:29:48,693 47 [functional.SplitIT] DEBUG: Verifying 100000 rows ingested into SplitIT_concurrentSplit0
100,000 records read | 173,913 records/sec | 102,900,000 bytes read | 178,956,521 bytes/sec | 0.575 secs
2024-01-12T11:29:49,279 47 [functional.SplitIT] DEBUG: Creating futures that add random splits to the table
2024-01-12T11:29:49,284 47 [functional.SplitIT] DEBUG: Submitting futures
2024-01-12T11:29:49,290 47 [functional.SplitIT] DEBUG: Waiting for futures to complete
```
#### Other Info
I checked the WAL directory, which holds about 100 MB; the tables directory holds only about 1.3 MB. So far it seems like the test simply loads more data into memory than the GC can reclaim in time; there doesn't appear to be anything that wouldn't be cleaned up if the GC could keep up.
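If that's the case, one possible mitigation would be to give the tablet server more heap in the mini cluster config. A sketch only (I haven't verified how SplitIT actually configures its cluster; the directory and password below are placeholders):
```java
import java.io.File;

import org.apache.accumulo.minicluster.MemoryUnit;
import org.apache.accumulo.minicluster.MiniAccumuloConfig;
import org.apache.accumulo.minicluster.ServerType;

public class MiniClusterHeapExample {
  public static void main(String[] args) {
    // Sketch: raise the tablet server heap from 256 MB so the GC has
    // headroom during the concurrent-split scans.
    MiniAccumuloConfig config = new MiniAccumuloConfig(new File("/tmp/mac"), "password");
    config.setMemory(ServerType.TABLET_SERVER, 512, MemoryUnit.MEGABYTE);
  }
}
```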