[
https://issues.apache.org/jira/browse/HBASE-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345794#comment-15345794
]
Yu Li commented on HBASE-16032:
-------------------------------
Observed such an exception caused by an HDFS issue again on an online system, FYI:
{noformat}
2016-06-23 05:37:49,006 WARN  [B.defaultRpcServer.handler=50,queue=11,port=16020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.io.IOException: Got error for OP_READ_BLOCK, self=/11.251.154.150:48658, remote=/11.251.154.150:50010, for file /hbase/data/default/xxx/f2906ffec8da5f7766c9d70583fa2a49/ack/81335fbb90424c279a01a1239da6cfdc, for pool BP-246621818-11.251.158.244-1465218756497 block 1311577203_237866590
    at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:817)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:695)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
    at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1204)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1150)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1482)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1403)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1614)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1493)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:449)
    at org.apache.hadoop.hbase.util.CompoundBloomFilter.contains(CompoundBloomFilter.java:100)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesGeneralBloomFilter(StoreFile.java:1319)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesBloomFilter(StoreFile.java:1184)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:413)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:400)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:319)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:193)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2018)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5448)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2560)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2546)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2528)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6643)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6622)
{noformat}
> Possible memory leak in StoreScanner
> ------------------------------------
>
> Key: HBASE-16032
> URL: https://issues.apache.org/jira/browse/HBASE-16032
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.1, 1.1.5, 0.98.20
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 2.0.0, 1.2.2, 1.1.6, 1.3.1, 0.98.21
>
> Attachments: HBASE-16032.patch, HBASE-16032_v2.patch,
> HBASE-16032_v3.patch, HBASE-16032_v4.patch
>
>
> We observed frequent full GCs of the RegionServer in our production environment, and after
> analyzing the heap dump, we found large memory occupancy by
> HStore#changedReaderObservers: the collection surprisingly contained roughly 75 million (7500w)
> objects...
> After some debugging, I located a possible memory leak in the StoreScanner
> constructor:
> {code}
>   public StoreScanner(Store store, ScanInfo scanInfo, Scan scan,
>       final NavigableSet<byte[]> columns, long readPt) throws IOException {
>     this(store, scan, scanInfo, columns, readPt, scan.getCacheBlocks());
>     if (columns != null && scan.isRaw()) {
>       throw new DoNotRetryIOException("Cannot specify any column for a raw scan");
>     }
>     matcher = new ScanQueryMatcher(scan, scanInfo, columns,
>         ScanType.USER_SCAN, Long.MAX_VALUE, HConstants.LATEST_TIMESTAMP,
>         oldestUnexpiredTS, now, store.getCoprocessorHost());
>     this.store.addChangedReaderObserver(this);
>     // Pass columns to try to filter out unnecessary StoreFiles.
>     List<KeyValueScanner> scanners = getScannersNoCompaction();
>     ...
>     seekScanners(scanners, matcher.getStartKey(), explicitColumnQuery
>         && lazySeekEnabledGlobally, parallelSeekEnabled);
>     ...
>     resetKVHeap(scanners, store.getComparator());
>   }
> {code}
> If any exception is thrown after
> {{this.store.addChangedReaderObserver(this)}}, the caller ends up with a null scanner,
> and there is no chance to remove the half-constructed scanner from changedReaderObservers,
> as in {{HRegion#get}}:
> {code}
> RegionScanner scanner = null;
> try {
>   scanner = getScanner(scan);
>   scanner.next(results);
> } finally {
>   if (scanner != null)
>     scanner.close();
> }
> {code}
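>
> For illustration only, here is a minimal, self-contained sketch of the guard pattern that
> would avoid this leak: register the observer first (same ordering as the real constructor),
> and if anything afterwards throws, deregister before re-throwing. SimpleStore,
> SimpleStoreScanner and observerCount() below are simplified stand-ins made up for the
> sketch, not the actual HStore/StoreScanner API, and not necessarily what the attached
> patches do:
> {code}
> import java.io.IOException;
> import java.util.HashSet;
> import java.util.Set;
>
> interface ChangedReadersObserverSketch {
>   void updateReaders() throws IOException;
> }
>
> class SimpleStore {
>   private final Set<ChangedReadersObserverSketch> changedReaderObservers = new HashSet<>();
>
>   void addChangedReaderObserver(ChangedReadersObserverSketch o) {
>     changedReaderObservers.add(o);
>   }
>
>   void deleteChangedReaderObserver(ChangedReadersObserverSketch o) {
>     changedReaderObservers.remove(o);
>   }
>
>   int observerCount() {
>     return changedReaderObservers.size();
>   }
> }
>
> class SimpleStoreScanner implements ChangedReadersObserverSketch {
>   private final SimpleStore store;
>
>   SimpleStoreScanner(SimpleStore store, boolean failWhileOpeningScanners) throws IOException {
>     this.store = store;
>     // Register first, as the real constructor does...
>     store.addChangedReaderObserver(this);
>     try {
>       // ...then do the work that can fail (opening store file scanners, seeking, etc.).
>       if (failWhileOpeningScanners) {
>         throw new IOException("simulated failure while opening scanners");
>       }
>     } catch (IOException | RuntimeException e) {
>       // Undo the registration so a half-constructed scanner cannot stay in the observer set.
>       store.deleteChangedReaderObserver(this);
>       throw e;
>     }
>   }
>
>   @Override
>   public void updateReaders() {
>     // no-op in this sketch
>   }
> }
> {code}
> With this guard, constructing with failWhileOpeningScanners=true still throws, but
> observerCount() goes back to zero instead of growing on every failed get/scan.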
> What's more, any exception thrown anywhere in the {{HRegion#getScanner}} path will
> leave scanner==null and leak memory in the same way, so we also need to handle this part.
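>
> To make that last part concrete, here is a hedged sketch of the cleanup pattern the
> {{getScanner}} path would need: if building one of the per-store scanners fails, everything
> built so far has to be closed (which removes its observer) before the exception propagates.
> ScannerCleanup and ScannerFactory are illustrative types, not the actual HBase API:
> {code}
> import java.io.Closeable;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> final class ScannerCleanup {
>   interface ScannerFactory {
>     Closeable open() throws IOException;
>   }
>
>   // Builds one scanner per factory; if any open() fails, every scanner built so far
>   // is closed before the exception is propagated, so nothing is left registered.
>   static List<Closeable> buildAll(List<ScannerFactory> factories) throws IOException {
>     List<Closeable> built = new ArrayList<>();
>     try {
>       for (ScannerFactory f : factories) {
>         built.add(f.open());
>       }
>       return built;
>     } catch (IOException | RuntimeException e) {
>       for (Closeable c : built) {
>         try {
>           c.close();
>         } catch (IOException suppressed) {
>           e.addSuppressed(suppressed);
>         }
>       }
>       throw e;
>     }
>   }
> }
> {code}
> The same idea applies inside the StoreScanner constructor itself for the per-StoreFile
> scanners it opens via getScannersNoCompaction().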
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)