[jira] [Commented] (HBASE-13329) Memstore flush fails if data has always the same value, breaking the region

Hadoop QA (JIRA) Tue, 09 Jun 2015 12:23:23 -0700

    [ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579434#comment-14579434 ]


Hadoop QA commented on HBASE-13329:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12738609/13329-v1.patch
  against master branch at commit 6cc42c8cd16d01cded9936bf53bf35e6e2ff5b66.
  ATTACHMENT ID: 12738609

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14345//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14345//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14345//artifact/patchprocess/checkstyle-aggregate.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14345//console

This message is automatically generated.

> Memstore flush fails if data has always the same value, breaking the region
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-13329
>                 URL: https://issues.apache.org/jira/browse/HBASE-13329
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.0.1
>         Environment: linux-debian-jessie
> ec2 - t2.micro instances
>            Reporter: Ruben Aguiar
>            Assignee: Ruben Aguiar
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
>         Attachments: 13329-v1.patch
>
>
> While benchmarking my OpenTSDB cluster, I wrote a script that always sends 
> the same value (in this case 1) to HBase. After a few minutes, the whole 
> region server crashes and the region itself can no longer be opened (it 
> cannot be assigned or unassigned). The logs show that when a memstore flush 
> is triggered on a large region (128 MB), the flush fails with an error and 
> kills the regionserver. On restart, replaying the edits raises the same 
> error, leaving the region unavailable. I tried to manually unassign, assign, 
> and close_region. None of these worked, because the code that reads and 
> replays the edits crashes.
> From my investigation this appears to be an overflow issue. The logs show 
> that the function getMinimumMidpointArray tried to access index -32743 of an 
> array, very close to the minimum short value in Java (-32768). In the source 
> code, the index is declared as a short and is incremented as long as the two 
> arrays are equal, so it probably overflows on large arrays with identical 
> data. Changing it to int should solve the problem.
> Here are the logs from when the regionserver went down. Any help is 
> appreciated. If you need any other information, please let me know:
> 2015-03-24 18:00:56,187 INFO  [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140
> 2015-03-24 18:00:56,188 INFO  [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
> 2015-03-24 18:04:35,722 INFO  [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB
> 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770)
>       at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702)
>       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
>       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
>       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
>       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743
>       at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478)
>       at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
>       at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
>       at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
>       at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
>       at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>       at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
>       at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
>       at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
>       at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
>       at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1953)
>       ... 7 more
> 2015-03-24 18:04:36,156 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
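
The overflow described in the report can be reproduced in isolation. The sketch below is not the HBase source: the class and method names are hypothetical, and it assumes only what the report states, namely a short index that is incremented while two byte arrays match. Once the index passes Short.MAX_VALUE (32767) it wraps to -32768, so the next array access throws ArrayIndexOutOfBoundsException. The reported index of -32743 is 25 above that minimum, which would be consistent with a wrapped short added to a small positive array offset, though that is a guess.

```java
// Hypothetical, self-contained illustration; not taken from CellComparator.
class ShortIndexOverflow {

    // Stand-in for the loop described in the report: the index is a short.
    static int commonPrefixShort(byte[] left, byte[] right) {
        int minLength = Math.min(left.length, right.length);
        short diffIdx = 0;
        // When diffIdx is 32767 and the bytes still match, the increment wraps
        // it to -32768. That is still less than minLength, so the next
        // iteration indexes a negative slot and throws.
        while (diffIdx < minLength && left[diffIdx] == right[diffIdx]) {
            diffIdx++;
        }
        return diffIdx;
    }

    // Same loop with an int index, the change the report proposes.
    static int commonPrefixInt(byte[] left, byte[] right) {
        int minLength = Math.min(left.length, right.length);
        int diffIdx = 0;
        while (diffIdx < minLength && left[diffIdx] == right[diffIdx]) {
            diffIdx++;
        }
        return diffIdx;
    }

    // Returns true if the short-indexed loop runs off the array.
    static boolean overflowsWithShortIndex(byte[] left, byte[] right) {
        try {
            commonPrefixShort(left, right);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Two identical arrays longer than Short.MAX_VALUE, mimicking
        // "always the same value" data.
        byte[] a = new byte[40000];
        byte[] b = new byte[40000];
        System.out.println("int index, common prefix: " + commonPrefixInt(a, b));
        System.out.println("short index overflows: " + overflowsWithShortIndex(a, b));
    }
}
```

With identical inputs of 32767 bytes or fewer, both variants return the same length, which may be why the problem only appears on large runs of equal data.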



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
