[
https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380391#comment-14380391
]
Ruben Aguiar commented on HBASE-13329:
--------------------------------------
Another follow-up. After a complete restart of the cluster, the region fails
again while opening; the same java.lang.ArrayIndexOutOfBoundsException occurs.
The HMaster reports on its web interface that the region has failed to open:
tsdb,,1427300108453.317da7fabf9ea9b15de80377bb792cd8. state=FAILED_OPEN, ts=Wed
Mar 25 18:03:44 UTC 2015 (369s ago), server=10.2.0.73,16020,1427306588788
Additionally, each time this region fails to open and a restart is issued, 3
new files are generated in
/hbase/data/default/tsdb/317da7fabf9ea9b15de80377bb792cd8/.tmp, probably
because the failed open does not clean up these temporary files.
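To illustrate the suspected root cause (see the issue description below): if the common-prefix scan in getMinimumMidpointArray keeps its position in a short, the counter wraps to a negative value once two cells share more than 32767 identical bytes, and the next array access throws. The following is a hypothetical, minimal reconstruction of that pattern, not the actual CellComparator code:

```java
public class ShortIndexOverflow {
    // Hypothetical sketch of the bug: walking two byte arrays with a
    // short index. Once the common prefix exceeds Short.MAX_VALUE
    // (32767), diffIdx++ wraps to -32768, and the next array access
    // throws ArrayIndexOutOfBoundsException with a negative index.
    static int commonPrefixWithShortIndex(byte[] left, byte[] right) {
        short diffIdx = 0;
        while (diffIdx < left.length && diffIdx < right.length
                && left[diffIdx] == right[diffIdx]) {
            diffIdx++; // silently overflows past 32767
        }
        return diffIdx;
    }

    // The proposed fix: an int index cannot overflow for any array
    // length Java allows, so the loop terminates normally.
    static int commonPrefixWithIntIndex(byte[] left, byte[] right) {
        int diffIdx = 0;
        while (diffIdx < left.length && diffIdx < right.length
                && left[diffIdx] == right[diffIdx]) {
            diffIdx++;
        }
        return diffIdx;
    }
}
```

With two identical 40000-byte arrays, the short-indexed version throws ArrayIndexOutOfBoundsException with a large negative index near Short.MIN_VALUE (the logs below show -32743), while the int-indexed version returns the full prefix length.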
> Memstore flush fails if data has always the same value, breaking the region
> ---------------------------------------------------------------------------
>
> Key: HBASE-13329
> URL: https://issues.apache.org/jira/browse/HBASE-13329
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.0.1
> Environment: linux-debian-jessie
> ec2 - t2.micro instances
> Reporter: Ruben Aguiar
> Priority: Critical
> Fix For: 2.0.0, 1.1.0
>
>
> While benchmarking my OpenTSDB cluster, I created a script that always sends
> HBase the same value (in this case 1). After a few minutes, the whole region
> server crashes and the region itself becomes impossible to open again (it can
> neither be assigned nor unassigned). After some investigation, what I saw in
> the logs is that when a memstore flush is triggered on a large region
> (128 MB), the process errors out, killing the regionserver. On restart,
> replaying the edits raises the same error, leaving the region unavailable. I
> tried to manually unassign, assign, and close_region it; none of that worked
> because the code that reads/replays the edits crashes.
> From my investigation this seems to be an overflow issue. The logs show that
> the function getMinimumMidpointArray tried to access index -32743 of an
> array, extremely close to the minimum short value in Java. Looking at the
> source code, a short index is used and incremented for as long as the two
> byte arrays match, so it can overflow and wrap negative on large arrays of
> identical data. Changing it to an int should solve the problem.
> Here follow the hadoop logs from when the regionserver went down. Any help
> is appreciated; if you need any other information, please let me know:
> 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller]
> wal.FSHLog: Rolled WAL
> /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516
> with entries=143, filesize=134.70 MB; new WAL
> /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140
> 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller]
> wal.FSHLog: Archiving
> hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
> to
> hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
> 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion:
> Started memstore flush for
> tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region
> memstore size 128.04 MB
> 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer:
> ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required.
> Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
> tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743
> at
> org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478)
> at
> org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
> at
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
> at
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
> at
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1953)
> ... 7 more
> 2015-03-24 18:04:36,156 FATAL [MemStoreFlusher.0] regionserver.HRegionServer:
> RegionServer abort: loaded coprocessors are:
> [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)