[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706821#comment-14706821 ]
Michael Rose commented on HBASE-13329:
--------------------------------------
In 1.x.x at least, this same change needs to be applied in both KeyValue.java
(which 13329-v1.patch covers) and CellComparator#getMinimumMidpointArray.
I ran into this issue with HBase 1.0.0 (CDH 5.4.0). It's unclear what
originally triggered it (after weeks of stable operation), but it caused
region servers to abort. At that point, no RS was able to open the region
until I applied the patch from this issue and also made the same change in
CellComparator.
Even with the original patch applied:
slave3.xxx.xxx.xxx,60020,1440131603772: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: deduplication,P\xDFt\x10\x053e73ceff5a2717d2ba76887ea21a2a8e353d1372\xFE,1438362391124.2bb6a602be6b1bfcea0508af4ba42235.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2243)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1972)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1935)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1833)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:452)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:413)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:70)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:229)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
    at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:494)
    at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
Was this an accidental omission? If so, should I open a new issue for this?
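For reference, the failure mode the patch addresses can be reproduced with a minimal, self-contained sketch. The class and method names below are illustrative, not HBase's actual code: a short loop index that walks two equal byte arrays wraps past Short.MAX_VALUE to a large negative value, producing exactly the kind of negative index seen in the traces. Widening the index to int fixes it.

```java
// Illustrative sketch (hypothetical names, not HBase source): a 'short'
// loop index overflows when two byte arrays share a common prefix longer
// than Short.MAX_VALUE (32767) bytes.
public class ShortIndexOverflow {

    // Buggy variant: 'short' index wraps to -32768 after 32767,
    // so the next array access throws with a negative index.
    static int firstDiffBuggy(byte[] left, byte[] right) {
        short i = 0;
        while (i < left.length && i < right.length && left[i] == right[i]) {
            i++; // implicit narrowing: wraps from 32767 to -32768
        }
        return i;
    }

    // Fixed variant: an 'int' index cannot overflow for array-sized inputs.
    static int firstDiffFixed(byte[] left, byte[] right) {
        int i = 0;
        while (i < left.length && i < right.length && left[i] == right[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Two equal arrays larger than Short.MAX_VALUE bytes.
        byte[] a = new byte[40000];
        byte[] b = new byte[40000];
        System.out.println(firstDiffFixed(a, b)); // prints 40000
        try {
            firstDiffBuggy(a, b);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Negative index after the short wrapped, as in the reported trace.
            System.out.println("buggy index wrapped: " + e.getMessage());
        }
    }
}
```

Whether the symptom is an ArrayIndexOutOfBoundsException (the original report) or a NegativeArraySizeException (my trace above) just depends on where the wrapped value is first used as an index or a length.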
> ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
> ------------------------------------------------------------------------
>
> Key: HBASE-13329
> URL: https://issues.apache.org/jira/browse/HBASE-13329
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.0.1
> Environment: linux-debian-jessie
> ec2 - t2.micro instances
> Reporter: Ruben Aguiar
> Assignee: Lars Hofhansl
> Priority: Critical
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2
>
> Attachments: 13329-asserts.patch, 13329-v1.patch, 13329.txt,
> HBASE-13329.test.00.branch-1.1.patch
>
>
> While trying to benchmark my OpenTSDB cluster, I created a script that
> always sends the same value (in this case 1) to HBase. After a few minutes,
> the whole region server crashes and the region itself becomes impossible to
> open again (it cannot be assigned or unassigned). From my investigation of
> the logs, when a memstore flush is triggered on a large region (128 MB), the
> flush fails with an exception, killing the region server. On restart,
> replaying the edits hits the same error, leaving the region unavailable. I
> tried to manually unassign, assign, and close_region; none of that worked,
> because the code that reads/replays the edits crashes as well.
> From my investigation this looks like an overflow issue. The logs show that
> getMinimumMidpointArray tried to access index -32743 of an array, extremely
> close to Java's minimum short value. Looking at the source code, a short is
> used as the index and incremented as long as the two byte arrays match, so
> it overflows on large arrays of equal data. Changing it to an int should
> solve the problem.
> Below are the logs from when the region server went down. Any help is
> appreciated; if you need any other information, please let me know:
> 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140
> 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
> 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB
> 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743
>     at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478)
>     at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1953)
>     ... 7 more
> 2015-03-24 18:04:36,156 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)