[ https://issues.apache.org/jira/browse/KUDU-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438546#comment-15438546 ]

Dinesh Bhat commented on KUDU-1534:
-----------------------------------

[~adar] I completed the tests in the following combinations:

 - Performed the above 4 steps with the master and tservers running 0.9.1, then 
moved the master to trunk (top-of-tree with the changes). Result: logs indicated 
the TSes re-registered with the master and heartbeats (HBs) returned to normal. 
kudu-ksck reported OK.
 - Performed the above 4 steps with the master and tservers running 0.9.1, then 
moved 2 of the tservers to trunk. Result: logs indicated both tservers 
re-registered with the master and HBs returned to normal. kudu-ksck reported OK.
 - Performed the above tests in reverse order as well, i.e. ran the master and 
tservers on trunk first, then selectively moved them to 0.9.1 while keeping the 
others on trunk. kudu-ksck yielded the same results, and HBs returned to normal 
as well.

A few logs from the first test:
{noformat}
I0825 23:03:11.381207 24316 master_main.cc:58] Initializing master server...
I0825 23:03:11.381466 24316 hybrid_clock.cc:177] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 242293
I0825 23:03:11.384858 24316 fs_manager.cc:243] Opened local filesystem: /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0
uuid: "3b313b752ce844209ce84009c91c94df"
format_stamp: "Formatted at 2016-08-25 21:52:23 on ve0518.halxg.cloudera.com"
I0825 23:03:11.386963 24316 master_main.cc:61] Starting Master server...
I0825 23:03:11.392076 24316 rpc_server.cc:164] RPC server started. Bound to: 127.0.0.1:57007
I0825 23:03:11.392274 24316 webserver.cc:122] Starting webserver on localhost:39085
I0825 23:03:11.392288 24316 webserver.cc:133] Document root disabled
I0825 23:03:11.392870 24316 net_util.cc:195] Address 127.0.0.1:39085 for localhost:39085 duplicates an earlier resolved entry.
I0825 23:03:11.393106 24316 webserver.cc:217] Webserver started. Bound to: http://127.0.0.1:39085/
I0825 23:03:11.393304 24316 server_base.cc:238] Dumped server information to /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/info.pb
I0825 23:03:11.394233 24370 tablet_bootstrap.cc:400] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Bootstrap starting.
I0825 23:03:11.395364 24370 tablet_bootstrap.cc:563] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Time spent opening tablet: real 0.001s      user 0.001s     sys 0.000s
I0825 23:03:11.395448 24370 tablet_bootstrap.cc:619] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Will attempt to recover log segment /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000/wal-000000001 to /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery/wal-000000001
I0825 23:03:11.395460 24370 tablet_bootstrap.cc:627] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Moving log directory /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000 to recovery directory /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery in preparation for log replay
W0825 23:03:11.395608 24370 log_util.cc:311] Could not read footer for segment: /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery/wal-000000001: Not found: Footer not found. Footer magic doesn't match
I0825 23:03:11.395620 24370 log_reader.cc:152] Log segment /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
I0825 23:03:11.399994 24370 log_util.cc:569] Scanning /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery/wal-000000001 for valid entry headers following offset 186922...
I0825 23:03:11.456295 24370 log_util.cc:606] Found no log entry headers
I0825 23:03:11.456331 24370 log_util.cc:215] Ignoring log segment corruption in /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery/wal-000000001 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via tablet copy. Error detail: Corruption: CRC mismatch in log entry header: Log file corruption detected. Failed trying to read batch #0 at offset 186910 for log segment /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery/wal-000000001: Prior entries: [off=186293 REPLICATE (0.327)] [off=186330 COMMIT (0.327)] [off=186873 REPLICATE (0.328)] [off=186910 COMMIT (0.328)]
I0825 23:03:11.456344 24370 log_util.cc:359] Successfully rebuilt footer for segment: /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery/wal-000000001 (valid entries through byte offset 186910)
I0825 23:03:11.456482 24370 tablet.cc:853] T 00000000000000000000000000000000 Rewinding schema during bootstrap to Schema [
        10:entry_type[int8 NOT NULL],
        11:entry_id[string NOT NULL],
        12:metadata[string NOT NULL]
]
I0825 23:03:11.456837 24370 log.cc:349] Log is configured to *not* fsync() on all Append() calls
I0825 23:03:11.484361 24370 tablet_bootstrap.cc:400] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Bootstrap replayed 1/1 log segments. Stats: ops{read=328 overwritten=0 applied=328 ignored=323} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0} orphaned_commits=0. Pending: 0 replicates
I0825 23:03:11.484382 24370 tablet_bootstrap.cc:1040] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: ReplayState: Previous OpId: 0.328, Committed OpId: 0.328, Pending Replicates: 0, Pending Commits: 0
I0825 23:03:11.484642 24370 tablet_bootstrap.cc:656] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Preparing to delete log recovery files and directory /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery
I0825 23:03:11.484652 24370 tablet_bootstrap.cc:659] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Renaming log recovery dir from /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery to /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery-1472191391484650
I0825 23:03:11.484675 24370 tablet_bootstrap.cc:669] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Deleting all files from renamed log recovery directory /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery-1472191391484650
I0825 23:03:11.484854 24370 tablet_bootstrap.cc:672] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Completed deletion of old log recovery files and directory /tmp/kudutest-4519/create-table-itest.CreateTableITest.TestSpreadReplicasEvenly.1472161943177725-27674/minicluster-data/master-0/wals/00000000000000000000000000000000.recovery-1472191391484650
I0825 23:03:11.484866 24370 tablet_bootstrap.cc:400] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df: Bootstrap complete.
I0825 23:03:11.494962 24370 raft_consensus.cc:275] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 5 FOLLOWER]: Replica starting. Triggering 0 pending transactions. Active config: opid_index: -1 OBSOLETE_local: true peers { permanent_uuid: "3b313b752ce844209ce84009c91c94df" member_type: VOTER }
I0825 23:03:11.495009 24370 raft_consensus.cc:503] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 5 FOLLOWER]: Becoming Follower/Learner. State: Replica: 3b313b752ce844209ce84009c91c94df, State: 1, Role: FOLLOWER
Watermarks: {Received: term: 0 index: 328 Committed: term: 0 index: 328}
I0825 23:03:11.495031 24370 consensus_queue.cc:162] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.328, Current term: 0, Majority size: -1, State: 1, Mode: NON_LEADER
I0825 23:03:11.495041 24370 raft_consensus.cc:320] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 5 FOLLOWER]: Only one voter in the Raft config. Triggering election immediately
I0825 23:03:11.495046 24370 raft_consensus.cc:376] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 5 FOLLOWER]: No leader contacted us within the election timeout. Triggering leader election
I0825 23:03:11.495054 24370 raft_consensus.cc:2022] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 5 FOLLOWER]: Advancing to term 6
I0825 23:03:11.495218 24370 raft_consensus.cc:1973] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 6 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 2.026s
I0825 23:03:11.495235 24370 raft_consensus.cc:392] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 6 FOLLOWER]: Starting election with config: opid_index: -1 OBSOLETE_local: true peers { permanent_uuid: "3b313b752ce844209ce84009c91c94df" member_type: VOTER }
I0825 23:03:11.495496 24370 leader_election.cc:248] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [CANDIDATE]: Term 6 election: Election decided. Result: candidate won.
I0825 23:03:11.495761 24374 raft_consensus.cc:1973] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 6 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 2.056s
I0825 23:03:11.495826 24374 raft_consensus.cc:1863] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 6 FOLLOWER]: Leader election won for term 6
I0825 23:03:11.496075 24374 raft_consensus.cc:468] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 6 LEADER]: Becoming Leader. State: Replica: 3b313b752ce844209ce84009c91c94df, State: 1, Role: LEADER
Watermarks: {Received: term: 0 index: 328 Committed: term: 0 index: 328}
I0825 23:03:11.496141 24374 consensus_queue.cc:145] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [LEADER]: Queue going to LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.328, Committed index: 0.328, Last appended: 0.328, Current term: 6, Majority size: 1, State: 1, Mode: LEADER, active raft config: opid_index: -1 OBSOLETE_local: true peers { permanent_uuid: "3b313b752ce844209ce84009c91c94df" member_type: VOTER }
I0825 23:03:11.496359 24370 sys_catalog.cc:255] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [sys.catalog]: SysCatalogTable state changed. Reason: Started TabletPeer. Latest consensus state: current_term: 6 leader_uuid: "3b313b752ce844209ce84009c91c94df" config { opid_index: -1 OBSOLETE_local: true peers { permanent_uuid: "3b313b752ce844209ce84009c91c94df" member_type: VOTER } }
I0825 23:03:11.496376 24370 sys_catalog.cc:258] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [sys.catalog]: This master's current role is: LEADER
I0825 23:03:11.496384 24376 sys_catalog.cc:255] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [sys.catalog]: SysCatalogTable state changed. Reason: New leader 3b313b752ce844209ce84009c91c94df. Latest consensus state: current_term: 6 leader_uuid: "3b313b752ce844209ce84009c91c94df" config { opid_index: -1 OBSOLETE_local: true peers { permanent_uuid: "3b313b752ce844209ce84009c91c94df" member_type: VOTER } }
I0825 23:03:11.496382 24375 sys_catalog.cc:255] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [sys.catalog]: SysCatalogTable state changed. Reason: RaftConsensus started. Latest consensus state: current_term: 6 leader_uuid: "3b313b752ce844209ce84009c91c94df" config { opid_index: -1 OBSOLETE_local: true peers { permanent_uuid: "3b313b752ce844209ce84009c91c94df" member_type: VOTER } }
I0825 23:03:11.496402 24376 sys_catalog.cc:258] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [sys.catalog]: This master's current role is: LEADER
I0825 23:03:11.496420 24375 sys_catalog.cc:258] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [sys.catalog]: This master's current role is: LEADER
I0825 23:03:11.496626 24377 raft_consensus_state.cc:536] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [term 6 LEADER]: Advanced the committed_index across terms. Last committed operation was: term: 0 index: 328 New committed index is: term: 6 index: 329
I0825 23:03:11.496662 24370 sys_catalog.cc:333] T 00000000000000000000000000000000 P 3b313b752ce844209ce84009c91c94df [sys.catalog]: configured and running, proceeding with master startup.
W0825 23:03:11.496773 24379 catalog_manager.cc:365] Catalog manager background task thread going to sleep: Service unavailable: Catalog manager is not initialized. State: 1
I0825 23:03:11.496814 24378 catalog_manager.cc:660] Loading table and tablet metadata into memory...
I0825 23:03:11.496812 24316 master_main.cc:64] Master server successfully started.
I0825 23:03:11.498020 24378 catalog_manager.cc:227] Loaded metadata for table test-table [id=162312d1479340109aec6ff77ce314f1]
I0825 23:03:11.498464 24378 catalog_manager.cc:277] Loaded metadata for tablet 17aec689be0047b987494787029f6e08 (table test-table [id=162312d1479340109aec6ff77ce314f1])
I0825 23:03:11.498555 24378 catalog_manager.cc:277] Loaded metadata for tablet 1f7a6bd64c604f7fada909bd575708cd (table test-table [id=162312d1479340109aec6ff77ce314f1])
I0825 23:03:11.498637 24378 catalog_manager.cc:277] Loaded metadata for tablet f32f6e0b1f354643b6b030599a359fa6 (table test-table [id=162312d1479340109aec6ff77ce314f1])
I0825 23:03:11.637743 24318 delta_compaction.cc:259] Starting major delta compaction for columns metadata[string NOT NULL]
I0825 23:03:11.637809 24318 delta_compaction.cc:263] Preparing to major compact delta file: 2186216669234377291
I0825 23:03:11.638020 24318 multi_column_writer.cc:85] Opened CFile writer for column metadata[string NOT NULL]
I0825 23:03:11.640836 24318 delta_compaction.cc:269] Finished major delta compaction of columns metadata[string NOT NULL]
I0825 23:03:11.643154 24318 maintenance_manager.cc:353] Time spent running MajorDeltaCompactionOp(00000000000000000000000000000000): real 0.006s        user 0.002s     sys 0.000s
I0825 23:03:11.775498 24098 heartbeater.cc:294] Connected to a master server at 127.0.0.1:57007
I0825 23:03:11.775861 24337 master_service.cc:108] Got heartbeat from unknown tserver (permanent_uuid: "92f34d06b19146db82cf42701d2363c5" instance_seqno: 1472190855818066) as {real_user=dinesh, eff_user=} at 127.108.26.1:51136; Asking this server to re-register.
I0825 23:03:11.776144 24098 heartbeater.cc:361] Registering TS with master...
I0825 23:03:11.776177 24098 heartbeater.cc:375] Master 127.0.0.1:57007 requested a full tablet report, sending...
I0825 23:03:11.776532 24337 ts_manager.cc:83] Registered new tserver permanent_uuid: "92f34d06b19146db82cf42701d2363c5" instance_seqno: 1472190855818066 with Master
I0825 23:03:11.785338 24337 master_service.cc:108] Got heartbeat from unknown tserver (permanent_uuid: "401cb2671e94452f87838515c7408bdc" instance_seqno: 1472190099967651) as {real_user=dinesh, eff_user=} at 127.108.26.2:36338; Asking this server to re-register.
I0825 23:03:11.786317 24337 ts_manager.cc:83] Registered new tserver permanent_uuid: "401cb2671e94452f87838515c7408bdc" instance_seqno: 1472190099967651 with Master
I0825 23:03:11.891396 23935 heartbeater.cc:294] Connected to a master server at 127.0.0.1:57007
I0825 23:03:11.891669 24337 master_service.cc:108] Got heartbeat from unknown tserver (permanent_uuid: "d9caa1baaef84124a4a530b35945fad3" instance_seqno: 1472190762838162) as {real_user=dinesh, eff_user=} at 127.108.26.0:46047; Asking this server to re-register.
I0825 23:03:11.891826 23935 heartbeater.cc:361] Registering TS with master...
I0825 23:03:11.891863 23935 heartbeater.cc:375] Master 127.0.0.1:57007 requested a full tablet report, sending...
I0825 23:03:11.892176 24337 ts_manager.cc:83] Registered new tserver permanent_uuid: "d9caa1baaef84124a4a530b35945fad3" instance_seqno: 1472190762838162 with Master

{noformat}
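The "logs indicated TSes registered to master again" check can also be done mechanically. The sketch below scans a master log for the two master-side messages in the paste above. Every tserver the master asked to re-register should later appear in a "Registered new tserver" line. This is a minimal sketch under two assumptions: the message wording matches the lines above, and each log record sits on a single line, as it does in the server's own log file but not in the email-wrapped paste. The helper name `reregistered_tservers` is hypothetical and not part of Kudu.

```python
import re

# Assumed patterns, copied from the master log lines in the paste above.
UNKNOWN_RE = re.compile(
    r'Got heartbeat from unknown tserver \(permanent_uuid: "([0-9a-f]+)"')
REGISTERED_RE = re.compile(
    r'Registered new tserver permanent_uuid: "([0-9a-f]+)"')


def reregistered_tservers(log_lines):
    """Return (asked, registered, missing) sets of tserver UUIDs.

    asked:      tservers the master asked to re-register
    registered: tservers the master logged as newly registered
    missing:    asked but never registered (empty on a healthy upgrade)
    """
    asked, registered = set(), set()
    for line in log_lines:  # works for a list of strings or an open file
        m = UNKNOWN_RE.search(line)
        if m:
            asked.add(m.group(1))
        m = REGISTERED_RE.search(line)
        if m:
            registered.add(m.group(1))
    return asked, registered, asked - registered
```

For a real run, pass an open master INFO log file. A non-empty `missing` set would point at a tserver that kept heartbeating without completing registration after the version change.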

> expose software version in ListMaster RPC response
> --------------------------------------------------
>
>                 Key: KUDU-1534
>                 URL: https://issues.apache.org/jira/browse/KUDU-1534
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Dan Burkert
>            Assignee: Dinesh Bhat
>            Priority: Minor
>              Labels: newbie
>         Attachments: cluster-downgrade.log, cluster-upgrade.log
>
>
> KUDU-1490 exposed the software version of tablet servers in the 
> GetTabletServers RPC response, but an equivalent doesn't exist for 
> ListMasters response.  This will become more important as multi-master setups 
> get more common.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
