[
https://issues.apache.org/jira/browse/HBASE-26209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438451#comment-17438451
]
shihuafeng edited comment on HBASE-26209 at 11/4/21, 2:19 AM:
--------------------------------------------------------------
*2. why did region assignment happen in split log*
*i* think that it trigger history task on
hdfs://nameservice1/hbase/WALs/*-splitting
021-11-02 11:39:14,081 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1635822375153
*2021-11-02 12:00:45,919 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,{color:#ff0000}1627032643439{color}*
2021-11-02 12:00:46,049 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,1631773418557
2021-11-02 12:00:46,178 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,1628213135404
2021-11-02 12:00:46,306 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy11.esgync.local,60020,1631328697683
2021-11-02 12:00:46,435 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy21.esgync.local,60020,1623984229877
2021-11-02 12:00:46,435 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1627032643439
2021-11-02 12:00:46,570 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy21.esgync.local,60020,1627351005533
2021-11-02 12:00:46,572 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1631773418557
2021-11-02 12:00:46,701 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,1632296713023
2021-11-02 12:00:46,702 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1628213135404
2021-11-02 12:00:46,847 INFO
*region assign log*
{color:#ff0000}2021-11-02 12:00:45,550{color} INFO
org.apache.hadoop.hbase.master.RegionStates: Transition
\{617e7194714d0952104a3626935495db state=OPEN, ts=1635825645550,
server=gy14.esgync.local,60020,1635824505300} to
\{617e7194714d0952104a3626935495db state=OFFLINE, ts=1635825645550,
server=gy14.esgync.local,60020,1635824505300}
2021-11-02 12:01:05,917 INFO org.apache.hadoop.hbase.master.RegionStates:
Transition \{617e7194714d0952104a3626935495db state=OFFLINE, ts=1635825665835,
server=gy14.esgync.local,60020,1635824505300} to
\{617e7194714d0952104a3626935495db state=PENDING_OPEN, ts=1635825665917,
server=gy14.esgync.local,60020,1635825620829}
2021-11-02 12:01:07,110 INFO org.apache.hadoop.hbase.master.RegionStates:
Transition \{617e7194714d0952104a3626935495db state=PENDING_OPEN,
ts=1635825665917, server=gy14.esgync.local,60020,1635825620829} to
\{617e7194714d0952104a3626935495db state=OPENING, ts=1635825667110,
server=gy14.esgync.local,60020,1635825620829}
2021-11-02 12:01:32,226 INFO org.apache.hadoop.hbase.master.RegionStates:
Transition \{617e7194714d0952104a3626935495db state=OPENING, ts=1635825667110,
server=gy14.esgync.local,60020,1635825620829} to
\{617e7194714d0952104a3626935495db state=OPEN, ts=1635825692226,
server=gy14.esgync.local,60020,1635825620829}
2021-11-02 12:01:32,238 INFO org.apache.hadoop.hbase.master.RegionStates:
Offlined 617e7194714d0952104a3626935495db from
gy14.esgync.local,60020,1635824505300
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy25.esgync.local,60020,1626313309249
2021-11-02 12:00:46,847 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy11.esgync.local,60020,1631328697683
was (Author: shihuafeng):
*2. why did region assignment happen in split log*
*i* think that it trigger history task on /hbase/MasterProcWALs
021-11-02 11:39:14,081 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1635822375153
*2021-11-02 12:00:45,919 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,{color:#ff0000}1627032643439{color}*
2021-11-02 12:00:46,049 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,1631773418557
2021-11-02 12:00:46,178 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,1628213135404
2021-11-02 12:00:46,306 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy11.esgync.local,60020,1631328697683
2021-11-02 12:00:46,435 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy21.esgync.local,60020,1623984229877
2021-11-02 12:00:46,435 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1627032643439
2021-11-02 12:00:46,570 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy21.esgync.local,60020,1627351005533
2021-11-02 12:00:46,572 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1631773418557
2021-11-02 12:00:46,701 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy14.esgync.local,60020,1632296713023
2021-11-02 12:00:46,702 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy14.esgync.local,60020,1628213135404
2021-11-02 12:00:46,847 INFO
*region assign log*
{color:#FF0000}2021-11-02 12:00:45,550{color} INFO
org.apache.hadoop.hbase.master.RegionStates: Transition
\{617e7194714d0952104a3626935495db state=OPEN, ts=1635825645550,
server=gy14.esgync.local,60020,1635824505300} to
\{617e7194714d0952104a3626935495db state=OFFLINE, ts=1635825645550,
server=gy14.esgync.local,60020,1635824505300}
2021-11-02 12:01:05,917 INFO org.apache.hadoop.hbase.master.RegionStates:
Transition \{617e7194714d0952104a3626935495db state=OFFLINE, ts=1635825665835,
server=gy14.esgync.local,60020,1635824505300} to
\{617e7194714d0952104a3626935495db state=PENDING_OPEN, ts=1635825665917,
server=gy14.esgync.local,60020,1635825620829}
2021-11-02 12:01:07,110 INFO org.apache.hadoop.hbase.master.RegionStates:
Transition \{617e7194714d0952104a3626935495db state=PENDING_OPEN,
ts=1635825665917, server=gy14.esgync.local,60020,1635825620829} to
\{617e7194714d0952104a3626935495db state=OPENING, ts=1635825667110,
server=gy14.esgync.local,60020,1635825620829}
2021-11-02 12:01:32,226 INFO org.apache.hadoop.hbase.master.RegionStates:
Transition \{617e7194714d0952104a3626935495db state=OPENING, ts=1635825667110,
server=gy14.esgync.local,60020,1635825620829} to
\{617e7194714d0952104a3626935495db state=OPEN, ts=1635825692226,
server=gy14.esgync.local,60020,1635825620829}
2021-11-02 12:01:32,238 INFO org.apache.hadoop.hbase.master.RegionStates:
Offlined 617e7194714d0952104a3626935495db from
gy14.esgync.local,60020,1635824505300
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start processing
crashed gy25.esgync.local,60020,1626313309249
2021-11-02 12:00:46,847 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Finished
processing of crashed gy11.esgync.local,60020,1631328697683
> edit file loss result in data loss when restart hbase cluster
> ---------------------------------------------------------------
>
> Key: HBASE-26209
> URL: https://issues.apache.org/jira/browse/HBASE-26209
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Environment: Linux version 3.10.0-693.el7.x86_64
> ([email protected]) (gcc version 4.8.5 20150623 (Red
> Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017
> Reporter: shihuafeng
> Priority: Blocker
> Attachments: Repaly_edit.log, rename_edit.log
>
>
> {{ when i restart hbase cluster,i find edit file loss when wal
> repaly .}}
> the number of edit files (00000000seqid.tmp to 00000000seqid ) is
> 31 when split wal to edit. But when i read edits to repaly ,i foud the sum
> is 30.
> i see rename file is sucessful,but i can not find the edit file .
> {panel:title=/var/log/message i find system exception}
> ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66,
> found length 32 (20130517/exfield-389)
> Aug 16 03:30:41 esgsh6 kernel: ACPI Error: Method parse/execution failed
> [\_SB_.PMI0._PMM] (Node ffff8810e9eab258), AE_AML_BUFFER_LIMIT
> (20130517/psparse-536)
> {panel}
> # *rename (00000000seqid.tmp to 00000000seqid is 31*
>
> i can not find the follwing file , i confirm the edit file
> (0*000000000001825010*)is not empty.
>
> {panel:title=log}
> hbase-cmf-hbase-REGIONSERVER-gy11.esgync.local.log.out:2021-08-16
> 17:56:28,956 INFO org.apache.hadoop.hbase.wal.WALSplitter: Rename
> hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0000000000001810650.temp
> to
> hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0*000000000001825010*
> {panel}
> **
> you can see attachment{color:#999999} {color}
> *{color:#3366ff}r{color}*{color:#3366ff}*ename_edit.log*{color}
> *2. at replay phase , Reading the edits is 30*
>
> {panel:title=log}
> hbase-cmf-hbase-REGIONSERVER-gy11.esgync.local.log.out:2021-08-16
> 17:56:14,938 INFO org.apache.hadoop.hbase.regionserver.HRegion: after
> replayRecoveredEdits Maximum sequenceid 1914955 and minimum sequenceid for
> the region is 1916711, replay the file,
> path=hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0000000000001914955,seqid=1916711,*size=30*
> {panel}
> **
>
>
> {code:java}
> org.apache.hadoop.hbase.regionserver.HRegion
> NavigableSet<Path> files = WALSplitter.getSplitEditFilesSorted(fs,
> regiondir);
> if (LOG.isDebugEnabled()) {
> LOG.debug("Found " + (files == null ? 0 : files.size())
> + " recovered edits file(s) under " + regiondir);
> }
> if (files == null || files.isEmpty()) return seqid;
> long start=System.currentTimeMillis();
> for (Path edits: files) {
> if (edits == null || !fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
> }
> if (isZeroLengthThenDelete(fs, edits)) continue;
> long maxSeqId;
> String fileName = edits.getName();
> maxSeqId = Math.abs(Long.parseLong(fileName));
> if (maxSeqId <= minSeqIdForTheRegion) {
> if (LOG.isDebugEnabled()) {
> String msg = "Maximum sequenceid for this wal is " + maxSeqId
> + " and minimum sequenceid for the region is " + minSeqIdForTheRegion
> + ", skipped the whole file, path=" + edits;
> LOG.info(msg);
> }
> continue;
> }
> try {
> seqid = Math.max(seqid, replayRecoveredEdits(edits, maxSeqIdInStores,
> reporter));
> // replay the edits. Replay can return -1 if everything is skipped, only
> update
> // if seqId is greater
> String msg = "after replayRecoveredEdits Maximum sequenceid " + maxSeqId
> + " and minimum sequenceid for the region is " + minSeqIdForTheRegion
> + ", replay the file, path=" + edits
> +",seqid="+seqid+",size="+files.size();
> LOG.info(msg);{code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)