[ https://issues.apache.org/jira/browse/HBASE-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yu Li updated HBASE-16032:
--------------------------
    Description:
We observed frequent full GCs on RegionServers (RS) in our production environment. After analyzing the heap dump, we found that HStore#changedReaderObservers occupied a large amount of memory: the map surprisingly contained roughly 75 million (7500w) objects.

After some debugging, I located a possible memory leak in the StoreScanner constructor:
{code}
  public StoreScanner(Store store, ScanInfo scanInfo, Scan scan, final NavigableSet<byte[]> columns,
      long readPt)
  throws IOException {
    this(store, scan, scanInfo, columns, readPt, scan.getCacheBlocks());
    if (columns != null && scan.isRaw()) {
      throw new DoNotRetryIOException("Cannot specify any column for a raw scan");
    }
    matcher = new ScanQueryMatcher(scan, scanInfo, columns,
        ScanType.USER_SCAN, Long.MAX_VALUE, HConstants.LATEST_TIMESTAMP,
        oldestUnexpiredTS, now, store.getCoprocessorHost());
    // The scanner registers itself here; any exception thrown below leaves it registered.
    this.store.addChangedReaderObserver(this);
    // Pass columns to try to filter out unnecessary StoreFiles.
    List<KeyValueScanner> scanners = getScannersNoCompaction();
    ...
    seekScanners(scanners, matcher.getStartKey(), explicitColumnQuery
        && lazySeekEnabledGlobally, parallelSeekEnabled);
    ...
    resetKVHeap(scanners, store.getComparator());
  }
{code}
If any exception is thrown after {{this.store.addChangedReaderObserver(this)}}, the constructor never returns a scanner, so the caller's reference stays null and there is no chance to remove the scanner from changedReaderObservers, as in {{HRegion#get}}:
{code}
    RegionScanner scanner = null;
    try {
      scanner = getScanner(scan);
      scanner.next(results);
    } finally {
      if (scanner != null)
        scanner.close();
    }
{code}
What's more, any exception thrown in the {{HRegion#getScanner}} path leaves scanner == null and therefore leaks memory in the same way, so we also need to handle this part.

  was:
We observed frequent full GCs on RegionServers (RS) in our production environment. After analyzing the heap dump, we found that HStore#changedReaderObservers occupied a large amount of memory: the map surprisingly contained roughly 75 million (7500w) objects.
After some debugging, I located a possible memory leak in the StoreScanner constructor:
{code}
  public StoreScanner(Store store, ScanInfo scanInfo, Scan scan, final NavigableSet<byte[]> columns,
      long readPt)
  throws IOException {
    this(store, scan, scanInfo, columns, readPt, scan.getCacheBlocks());
    if (columns != null && scan.isRaw()) {
      throw new DoNotRetryIOException("Cannot specify any column for a raw scan");
    }
    matcher = new ScanQueryMatcher(scan, scanInfo, columns,
        ScanType.USER_SCAN, Long.MAX_VALUE, HConstants.LATEST_TIMESTAMP,
        oldestUnexpiredTS, now, store.getCoprocessorHost());
    this.store.addChangedReaderObserver(this);
    // Pass columns to try to filter out unnecessary StoreFiles.
    List<KeyValueScanner> scanners = getScannersNoCompaction();
    ...
    seekScanners(scanners, matcher.getStartKey(), explicitColumnQuery
        && lazySeekEnabledGlobally, parallelSeekEnabled);
    ...
    resetKVHeap(scanners, store.getComparator());
  }
{code}
If any exception is thrown after {{this.store.addChangedReaderObserver(this)}}, the constructor never returns a scanner, so the caller's reference stays null and there is no chance to remove the scanner from changedReaderObservers, as in HRegion#get:
{code}
    RegionScanner scanner = null;
    try {
      scanner = getScanner(scan);
      scanner.next(results);
    } finally {
      if (scanner != null)
        scanner.close();
    }
{code}


> Possible memory leak in StoreScanner
> ------------------------------------
>
>                 Key: HBASE-16032
>                 URL: https://issues.apache.org/jira/browse/HBASE-16032
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.1, 1.1.5, 0.98.20
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0, 1.3.0, 1.2.2, 1.1.6, 0.98.21
>
>         Attachments: HBASE-16032.patch, HBASE-16032_v2.patch
>
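
The leak pattern described in this issue, and one possible way to guard against it, can be sketched in isolation. The following is only an illustrative, self-contained model and not necessarily what the attached patches do: ObserverRegistry, LeakyScanner, GuardedScanner and ScannerLeakDemo are hypothetical names standing in for HStore's observer bookkeeping and for StoreScanner, and the failSeek flag simulates an exception thrown after registration.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the observer collection kept by the store. */
class ObserverRegistry {
    private final Set<Object> observers = ConcurrentHashMap.newKeySet();

    void add(Object observer) { observers.add(observer); }
    void remove(Object observer) { observers.remove(observer); }
    int size() { return observers.size(); }
}

/** Registers itself first, then may fail: the caller never gets a reference to close. */
class LeakyScanner {
    private final ObserverRegistry registry;

    LeakyScanner(ObserverRegistry registry, boolean failSeek) throws IOException {
        this.registry = registry;
        registry.add(this); // registered before construction has finished
        if (failSeek) {
            throw new IOException("simulated failure after registration");
        }
    }

    void close() { registry.remove(this); }
}

/** Same steps, but deregisters itself before propagating a constructor failure. */
class GuardedScanner {
    private final ObserverRegistry registry;

    GuardedScanner(ObserverRegistry registry, boolean failSeek) throws IOException {
        this.registry = registry;
        registry.add(this);
        try {
            if (failSeek) {
                throw new IOException("simulated failure after registration");
            }
        } catch (IOException e) {
            registry.remove(this); // undo the registration, then rethrow
            throw e;
        }
    }

    void close() { registry.remove(this); }
}

class ScannerLeakDemo {
    /** Mirrors the try/finally shape of the caller quoted in the issue. */
    static int observersAfterFailedOpen(boolean guarded) {
        ObserverRegistry registry = new ObserverRegistry();
        LeakyScanner leaky = null;
        GuardedScanner safe = null;
        try {
            if (guarded) {
                safe = new GuardedScanner(registry, true);
            } else {
                leaky = new LeakyScanner(registry, true);
            }
        } catch (IOException e) {
            // The constructor threw, so both references are still null here.
        } finally {
            if (leaky != null) leaky.close();
            if (safe != null) safe.close();
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println("leaky observers left:   " + observersAfterFailedOpen(false));
        System.out.println("guarded observers left: " + observersAfterFailedOpen(true));
    }
}
```

In this model the caller's finally block cannot help, because the reference is still null when the constructor throws. The cleanup therefore has to happen inside the failing constructor, or at whichever layer still holds a reference to the partially built scanner.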