[
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601010#comment-14601010
]
ramkrishna.s.vasudevan edited comment on HBASE-13970 at 6/25/15 11:11 AM:
--------------------------------------------------------------------------
The reason here is that there are 2 compactions getting triggered from the
CompactionSplitThread on a region that is newly split. One may be due to the
split that happened?
So when both the compaction goes parallely then the
PressureAwareCompactionController is started by the Compactor thread
{code}
@Override
public void start(String compactionName) {
activeCompactions.put(compactionName, new ActiveCompaction());
}
{code}
Now by the time the second compaction is in progress the first compaction
completes and does a finish
{code}
@Override
public void finish(String compactionName) {
ActiveCompaction compaction = activeCompactions.remove(compactionName);
long elapsedTime = Math.max(1, EnvironmentEdgeManager.currentTime() -
compaction.startTime);
.....
{code}
The compactionName is going to be common here because
{code}
String compactionName =
store.getRegionInfo().getRegionNameAsString() + "#" +
store.getFamily().getNameAsString();
{code}
When the second compaction comes for completion the compaction has already been
removed and hence we get an NPE.
Logs
One compaction started with 4 files
{code}
2015-06-25 22:07:49,135 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.HRegion: Starting compaction on info in region
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:49,135 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.HStore: Starting compaction of 4 file(s) in info of
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
into
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
totalSize=285.6 M
2015-06-25 22:07:49,165 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
hfile.CacheConfig:
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16,
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
2015-06-25 22:07:49,954 INFO
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled
WAL
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250268365
with entries=90, filesize=124.14 MB; new WAL
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250269933
20
{code}
Another compaction has been started with 3 files
{code}
2015-06-25 22:07:53,405 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
regionserver.HRegion: Starting compaction on info in region
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:53,406 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
regionserver.HStore: Starting compaction of 3 file(s) in info of
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
into
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
totalSize=343.4 M
2015-06-25 22:07:53,411 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
hfile.CacheConfig:
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16,
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
2015-06-25 22:07:54,211 INFO [MemStoreFlusher.1] regionserver.HRegion:
Flushing 1/1 column families, memstore=128.23 MB
2015-06-25 22:07:54,639 INFO
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled
WAL
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250273034
with entries=90, filesize=
{code}
{code}
2015-06-25 22:08:09,446 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
compactions.PressureAwareCompactionThroughputController:
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.#info
average throughput is 19.80 MB/sec, slept 30 time(s) and total slept time is
27694 ms. 0 active compactions remaining, total limit is 10.00 MB/sec
2015-06-25 22:08:09,520 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.HStore: Completed compaction of 4 (all) file(s) in info of
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
into c044bd80b8684d97bb8ed2552687d22a(size=228.4 M), total size for store is
686.2 M. This selection was in queue for 0sec, and took 20sec to execute.
2015-06-25 22:08:09,523 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.CompactSplitThread: Completed compaction: Request =
regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.,
storeName=info, fileCount=4, fileSize=285.6 M, priority=6,
time=7539031296069589; duration=20sec
2015-06-25 22:08:09,858 ERROR
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
regionserver.CompactSplitThread: Compaction failed Request =
regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.,
storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M),
priority=3, time=7539035566509573
java.lang.NullPointerException
at
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
at
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
at
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
at
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
at
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
at
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-25 22:12:26,339 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache:
totalSize=1.83 MB, freeSize=1.74 GB, max=1.74 GB, blockCount=0, accesses=0,
hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0,
cachingHitsRatio=0,evictions=2
{code}
Both gets completed around the same time and hence you can see that the
compaction with 4 files was successful but the one with 3 got failed with NPE.
Though the compaction was actually done but it got stopped due to NPE. This is
a race condition - anyway the failed compacted files may be get compacated in
the next run of compaction when being selected.
was (Author: ram_krish):
The reason here is that there are 2 compactions getting triggered from the
CompactionSplitThread on a region that is newly split. One may be due to the
split that happened?
So when both the compaction goes parallely then the
PressureAwareCompactionController is started by the Compactor thread
{code}
@Override
public void start(String compactionName) {
activeCompactions.put(compactionName, new ActiveCompaction());
}
{code}
Now by the time the second compaction is in progress the first compaction
completes and does a finish
{code}
@Override
public void finish(String compactionName) {
ActiveCompaction compaction = activeCompactions.remove(compactionName);
long elapsedTime = Math.max(1, EnvironmentEdgeManager.currentTime() -
compaction.startTime);
.....
{code}
The compactionName is going to be common here because
{code}
String compactionName =
store.getRegionInfo().getRegionNameAsString() + "#" +
store.getFamily().getNameAsString();
{code}
When the second compaction comes for completion the compaction has already been
removed and hence we get an NPE.
Logs
One compaction started with 4 files
{code}
2015-06-25 22:07:49,135 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.HRegion: Starting compaction on info in region
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:49,135 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.HStore: Starting compaction of 4 file(s) in info of
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
into
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
totalSize=285.6 M
2015-06-25 22:07:49,165 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
hfile.CacheConfig:
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16,
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
2015-06-25 22:07:49,954 INFO
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled
WAL
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250268365
with entries=90, filesize=124.14 MB; new WAL
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250269933
20
{code}
Another compaction has been started with 3 files
{code}
2015-06-25 22:07:53,405 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
regionserver.HRegion: Starting compaction on info in region
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:53,406 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
regionserver.HStore: Starting compaction of 3 file(s) in info of
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
into
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
totalSize=343.4 M
2015-06-25 22:07:53,411 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
hfile.CacheConfig:
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16,
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
2015-06-25 22:07:54,211 INFO [MemStoreFlusher.1] regionserver.HRegion:
Flushing 1/1 column families, memstore=128.23 MB
2015-06-25 22:07:54,639 INFO
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled
WAL
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250273034
with entries=90, filesize=
{code}
{code}
2015-06-25 22:08:09,446 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
compactions.PressureAwareCompactionThroughputController:
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.#info
average throughput is 19.80 MB/sec, slept 30 time(s) and total slept time is
27694 ms. 0 active compactions remaining, total limit is 10.00 MB/sec
2015-06-25 22:08:09,520 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.HStore: Completed compaction of 4 (all) file(s) in info of
TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
into c044bd80b8684d97bb8ed2552687d22a(size=228.4 M), total size for store is
686.2 M. This selection was in queue for 0sec, and took 20sec to execute.
2015-06-25 22:08:09,523 INFO
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126]
regionserver.CompactSplitThread: Completed compaction: Request =
regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.,
storeName=info, fileCount=4, fileSize=285.6 M, priority=6,
time=7539031296069589; duration=20sec
2015-06-25 22:08:09,858 ERROR
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998]
regionserver.CompactSplitThread: Compaction failed Request =
regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.,
storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M),
priority=3, time=7539035566509573
java.lang.NullPointerException
at
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
at
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
at
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
at
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
at
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
at
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-25 22:12:26,339 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache:
totalSize=1.83 MB, freeSize=1.74 GB, max=1.74 GB, blockCount=0, accesses=0,
hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0,
cachingHitsRatio=0,evictions=2
{code}
Both gets completed around the same time and hence you can see that the
compaction with 4 files was successful but the one with 3 got failed with NPE.
Though the compaction was actually done but it got stopped due to NPE. This is
a race condition - so the failed compacted files may be get compacated in the
next run of compaction when being selected.
> NPE during compaction in trunk
> ------------------------------
>
> Key: HBASE-13970
> URL: https://issues.apache.org/jira/browse/HBASE-13970
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
>
> Updated the trunk.. Loaded the table with PE tool. Trigger a flush to ensure
> all data is flushed out to disk. When the first compaction is triggered we
> get an NPE and this is very easy to reproduce
> {code}
> 015-06-25 21:33:46,041 INFO [main-EventThread]
> procedure.ZKProcedureMemberRpcs: Received procedure start children changed
> event: /hbase/flush-table-proc/acquired
> 2015-06-25 21:33:46,051 INFO
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1]
> regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
> 2015-06-25 21:33:46,159 ERROR
> [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945]
> regionserver.CompactSplitThread: Compaction failed Request =
> regionName=TestTable,00000000000000000000283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
> storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M),
> priority=3, time=7536968291719985
> java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
> at
> org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
> at
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2015-06-25 21:33:46,745 INFO
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1]
> regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M,
> hasBloomFilter=true, into tmp file
> hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
> 2015-06-25 21:33:46,772 INFO
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1]
> regionserver.HStore: Added
> hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
> entries=68116, sequenceid=1534, filesize=68.7 M
> 2015-06-25 21:33:46,773 INFO
> [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1]
> regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344,
> currentsize=0 B/0 for region
> TestTable,00000000000000000000283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
> in 723ms, sequenceid=1534, compaction requested=true
> 2015-06-25 21:33:46,780 INFO [main-EventThread]
> procedure.ZKProcedureMemberRpcs: Received created
> event:/hbase/flush-table-proc/reached/TestTable
> 2015-06-25 21:33:46,790 INFO [main-EventThread]
> procedure.ZKProcedureMemberRpcs: Received created
> event:/hbase/flush-table-proc/abort/TestTable
> 2015-06-25 21:33:46,791 INFO [main-EventThread]
> procedure.ZKProcedureMemberRpcs: Received procedure abort children changed
> event: /hbase/flush-table-proc/abort
> 2015-06-25 21:33:46,803 INFO [main-EventThread]
> procedure.ZKProcedureMemberRpcs: Received procedure start children changed
> event: /hbase/flush-table-proc/acquired
> 2015-06-25 21:33:46,818 INFO [main-EventThread]
> procedure.ZKProcedureMemberRpcs: Received procedure abort children changed
> event: /hbase/flush-table-proc/abort
> {code}
> Will check this on what is the reason behind it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)