[
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150992#comment-13150992
]
stack commented on HBASE-4797:
------------------------------
Oh... I suppose it's a bit worse than I thought. I'm looking at a region that
has nearly 6k recovered.edits files to replay. The RegionServer is doing this
per file:
{code}
2011-11-16 03:06:02,403 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Applied 0, skipped 33, firstSequenceidInLog=296860, maxSequenceidInLog=351600,
path=hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000296860
2011-11-16 03:06:02,405 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Replaying edits from
hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000296914;
minSequenceid=351600;
path=hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000296914
2011-11-16 03:06:05,097 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x133a5bab186271f Attempting to transition node
69ab6eb0e2feff1fda52d36d8fa75798 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-11-16 03:06:05,278 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x133a5bab186271f Successfully transitioned node
69ab6eb0e2feff1fda52d36d8fa75798 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-11-16 03:06:05,278 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Applied 0, skipped 33, firstSequenceidInLog=296914, maxSequenceidInLog=351600,
path=hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000296914
2011-11-16 03:06:05,279 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Replaying edits from
hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000296970;
minSequenceid=351600;
path=hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000296970
2011-11-16 03:06:05,952 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x133a5bab186271f Attempting to transition node
69ab6eb0e2feff1fda52d36d8fa75798 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-11-16 03:06:06,093 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x133a5bab186271f Successfully transitioned node
69ab6eb0e2feff1fda52d36d8fa75798 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-11-16 03:06:06,093 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Applied 0, skipped 44, firstSequenceidInLog=296970, maxSequenceidInLog=351600,
path=hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000296970
2011-11-16 03:06:06,094 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Replaying edits from
hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000297041;
minSequenceid=351600;
path=hdfs://sv4r11s38:7000/hbase/TestTable/69ab6eb0e2feff1fda52d36d8fa75798/recovered.edits/0000000000000297041
2011-11-16 03:06:06,795 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x133a5bab186271f Attempting to transition node
69ab6eb0e2feff1fda52d36d8fa75798 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-11-16 03:06:06,810 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x133a5bab186271f Successfully transitioned node
69ab6eb0e2feff1fda52d36d8fa75798 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
{code}
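Every file in the log above ends with "Applied 0", so each open-and-scan was wasted work. Below is a minimal sketch of the skip-by-name idea from the issue title. It is not HBase code. The name format (first and last sequence id joined by a dash, each zero-padded to 19 digits like the current names), the class RecoveredEditsNames, and all of its methods are hypothetical. It also assumes an edit is stale when its sequence id is less than or equal to the region's minSequenceid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; none of these names exist in HBase.
final class RecoveredEditsNames {
    private RecoveredEditsNames() {}

    // Assumed name format: <firstSeqId>-<lastSeqId>, each zero-padded to 19 digits.
    static String formatName(long firstSeqId, long lastSeqId) {
        return String.format("%019d-%019d", firstSeqId, lastSeqId);
    }

    // Returns the last sequence id encoded in the name, or -1 when the name
    // is in the old single-id format or cannot be parsed.
    static long lastSeqId(String fileName) {
        int dash = fileName.indexOf('-');
        if (dash < 0) {
            return -1L;
        }
        try {
            return Long.parseLong(fileName.substring(dash + 1));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    // A file can be skipped without opening it only when its name proves that
    // every edit in it is at or below the region's minimum sequence id.
    static boolean canSkip(String fileName, long minSeqId) {
        long last = lastSeqId(fileName);
        return last >= 0 && last <= minSeqId;
    }

    // Keeps files that may hold newer edits, plus any file whose name gives
    // no last sequence id (those still have to be opened and scanned).
    static List<String> filesToReplay(List<String> fileNames, long minSeqId) {
        List<String> toReplay = new ArrayList<>();
        for (String name : fileNames) {
            if (!canSkip(name, minSeqId)) {
                toReplay.add(name);
            }
        }
        return toReplay;
    }
}
```

With minSequenceid=351600 as in the log, a file named by formatName(296860L, 296913L) would be dropped from the replay list without being opened (the last id 296913 is made up for illustration; the log does not show it). Old-style names are always kept, so the sketch never skips a file it cannot reason about.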
> [availability] Give recovered.edits files better names, ones that include
> first and last sequence id so we can skip files with edits we know older than
> current region has
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-4797
> URL: https://issues.apache.org/jira/browse/HBASE-4797
> Project: HBase
> Issue Type: Bug
> Components: performance
> Reporter: stack
>
> Testing 0.92, I crashed all servers out. Another bug makes it so WALs are
> not getting cleaned, so I had 7000 regions to replay. The distributed split
> code did a nice job and the cluster came back, but interestingly some hot
> regions ended up with loads of recovered.edits files -- tens if not
> hundreds -- to replay against the region (can we bulk load recovered.edits
> instead of replaying them?). Each recovered.edits file takes about a
> second to process (though there seem to be only about 30 edits per file).
> The region is unavailable during this time.