[
https://issues.apache.org/jira/browse/HBASE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027398#comment-14027398
]
Esteban Gutierrez commented on HBASE-11325:
-------------------------------------------
This is how the RS aborted due to this corrupt entry in the memstore:
{code}
14/06/05 18:41:44 FATAL regionserver.HRegionServer: ABORTING region server
172.16.0.101,60020,1402018185865: Unrecoverable exception while closing region
t0,,1402015274138.a9b83f7801ce96574aeeb2be048690b8., still finishing close
org.apache.hadoop.hbase.DroppedSnapshotException: region:
t0,,1402015274138.a9b83f7801ce96574aeeb2be048690b8.
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1606)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1480)
at
org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1009)
at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:957)
at
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:119)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into
a column actually smaller than the previous column:
at
org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkColumn(ScanWildcardColumnTracker.java:104)
at
org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:357)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:365)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
at
org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:812)
at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:746)
at
org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2348)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1581)
{code}
If the malformed RPC Put didn't crash the RS, it was sometimes possible to end
up with a corrupt HFile:
{code}
14/06/05 19:24:06 ERROR compactions.CompactionRequest: Compaction failed
regionName=t0,,1402020343626.25a1ee35a486a512b5b3c18e1c56ba39., storeName=f,
fileCount=10, fileSize=6.8k (875.0, 678.0, 678.0, 678.0, 678.0, 712.0, 678.0,
678.0, 678.0, 678.0), priority=-7, time=1402021446164920000
java.lang.ArrayIndexOutOfBoundsException: 274
at
org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:251)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:365)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
at
org.apache.hadoop.hbase.regionserver.Compactor.compact(Compactor.java:184)
at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:1081)
at
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1336)
at
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest.run(CompactionRequest.java:303)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
14/06/05 19:24:06 DEBUG master.AssignmentManager: The znode
of region t0,,1402020343626.25a1ee35a486a512b5b3c18e1c56ba39. has been deleted.
{code}
Past a certain point, the file could no longer be inspected:
{code}
K: 2\x01fc\x00\x00\x01F\x86\xE1\xC5\xC9/two\x00:/1402422281673/4/vlen=3/ts=0 V:
two
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 264
at org.apache.hadoop.hbase.util.Bytes.toStringBinary(Bytes.java:387)
at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:775)
at org.apache.hadoop.hbase.KeyValue.toString(KeyValue.java:731)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at
org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:269)
at
org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:229)
at
org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:189)
at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:750)
{code}
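For illustration only, here is a minimal sketch of the kind of length check that would flag an entry like the one above before it is stored. It does not use any HBase classes. It assumes the usual KeyValue layout (4-byte key length, 4-byte value length, then 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, value). The class KeyValueSanityCheck and its methods are hypothetical names, not HBase API, and this is not necessarily how the fix should look.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HBase.
final class KeyValueSanityCheck {
    // Fixed-size parts of the key: row length (2) + family length (1)
    // + timestamp (8) + type (1).
    static final int KEY_FIXED_OVERHEAD = 2 + 1 + 8 + 1;

    // Serializes one entry using the assumed KeyValue layout.
    static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                         long timestamp, byte type, byte[] value) {
        int keyLength = KEY_FIXED_OVERHEAD + row.length + family.length + qualifier.length;
        ByteBuffer buf = ByteBuffer.allocate(8 + keyLength + value.length);
        buf.putInt(keyLength);
        buf.putInt(value.length);
        buf.putShort((short) row.length);
        buf.put(row);
        buf.put((byte) family.length);
        buf.put(family);
        buf.put(qualifier);
        buf.putLong(timestamp);
        buf.put(type);
        buf.put(value);
        return buf.array();
    }

    // Returns true only if every embedded length field is consistent with
    // the actual size of the buffer.
    static boolean isWellFormed(byte[] bytes) {
        if (bytes == null || bytes.length < 8 + KEY_FIXED_OVERHEAD) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long keyLength = buf.getInt(0);
        long valueLength = buf.getInt(4);
        if (keyLength < KEY_FIXED_OVERHEAD || valueLength < 0) {
            return false;
        }
        if (8 + keyLength + valueLength != bytes.length) {
            return false;
        }
        long rowLength = buf.getShort(8);
        if (rowLength < 0 || KEY_FIXED_OVERHEAD + rowLength > keyLength) {
            return false;
        }
        // The family length byte sits right after the row bytes.
        long familyLength = bytes[(int) (8 + 2 + rowLength)] & 0xFF;
        return KEY_FIXED_OVERHEAD + rowLength + familyLength <= keyLength;
    }

    public static void main(String[] args) {
        byte[] good = encode(
            "2".getBytes(StandardCharsets.UTF_8),
            "f".getBytes(StandardCharsets.UTF_8),
            "two".getBytes(StandardCharsets.UTF_8),
            1402422281673L, (byte) 4,
            "two".getBytes(StandardCharsets.UTF_8));
        System.out.println("valid entry well-formed: " + isWellFormed(good));

        // Simulate a malformed request: overwrite the row length with a value
        // far larger than the buffer.
        byte[] bad = good.clone();
        ByteBuffer.wrap(bad).putShort(8, (short) 264);
        System.out.println("corrupt entry well-formed: " + isWellFormed(bad));
    }
}
```

A reader that trusts the row length field without such a check would index past the end of the array, which is consistent with the ArrayIndexOutOfBoundsException seen in the traces above.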
> Malformed RPC calls can corrupt stores
> --------------------------------------
>
> Key: HBASE-11325
> URL: https://issues.apache.org/jira/browse/HBASE-11325
> Project: HBase
> Issue Type: Bug
> Components: Client, regionserver
> Affects Versions: 0.94.20
> Reporter: Esteban Gutierrez
>
> We noticed in a cluster a Region Server that aborted with a
> DroppedSnapshotException due to an IOException in ScanWildcardColumnTracker when
> the RS tried to flush the memstore. After further research it was found that
> a client was sending corrupt RPC requests to the RS and those corrupt
> requests ended up in the stores, corrupting the memstore itself and,
> in some cases, HFiles. More details to follow.
--
This message was sent by Atlassian JIRA
(v6.2#6252)