Hi, all

I noticed a wrong timestamp in the status report from "./hadoop
dfsadmin -upgradeProgress details", although the time setting on the server
is correct. Does this matter?

<-------------------------------------------------------------------------------------------------------------------------
Distributed upgrade for version -6 is in progress. Status = 0%

        Last Block Level Stats updated at : Thu Jan 01 08:00:00 GMT+08:00 1970
        Last Block Level Stats : Total Blocks : 0
                                 Fully Upgragraded : 0.00%
                                 Minimally Upgraded : 0.00%
                                 Under Upgraded : 0.00% (includes Un-upgraded blocks)
                                 Un-upgraded : 0.00%
                                 Errors : 0
        Brief Datanode Status  : Avg completion of all Datanodes: 0.00% with 0 errors.

        Datanode Stats (total: 0): pct Completion(%) blocks upgraded (u) blocks remaining (r) errors (e)

                There are no known Datanodes
------------------------------------------------------------------------------------------------------------------------->
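
A side note on the timestamp in the report: it is exactly what the Unix epoch (0 milliseconds) looks like when rendered in GMT+08:00. That would be consistent with the block-level stats simply never having been updated, rather than with a wrong server clock, but this reading is an assumption and has not been checked against the Hadoop source. A minimal Python sketch of the arithmetic, where the helper name render and the fixed GMT+08:00 zone are illustrative choices:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the report is rendered in the server's zone, GMT+08:00.
GMT_PLUS_8 = timezone(timedelta(hours=8))


def render(epoch_ms):
    """Format epoch milliseconds in roughly the style used by the report."""
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz=GMT_PLUS_8)
    return dt.strftime("%a %b %d %H:%M:%S GMT+08:00 %Y")


# An unset (zero) timestamp reproduces the date shown in the report.
print(render(0))
```

If this reading is right, the odd date should go away once the datanodes report in and the stats are updated for the first time.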

Here is the tcpdump I captured with "tcpdump host 192.168.2.101 and 192.168.2.1"
on one of the datanodes, from the start of the cluster until the connection
was lost. 192.168.2.101 is the datanode and 192.168.2.1 is the namenode.

<-------------------------------------------------------------------------------------------------------------------------
03:21:01.082055 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: S
3124566345:3124566345(0) win 5840 <mss 1460,sackOK,timestamp 12085778
0,nop,wscale 7>
03:21:01.084143 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: S
635998938:635998938(0) ack 3124566346 win 5792 <mss 1460,sackOK,timestamp
211599828 12085778,nop,wscale 7>
03:21:01.082120 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 1 win
46 <nop,nop,timestamp 12085778 211599828>
03:21:01.090313 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P 1:22(21)
ack 1 win 46 <nop,nop,timestamp 211599830 12085778>
03:21:01.095758 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 22
win 46 <nop,nop,timestamp 12085781 211599830>
03:21:01.095876 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P 1:21(20)
ack 22 win 46 <nop,nop,timestamp 12085781 211599830>
03:21:01.095903 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 21
win 46 <nop,nop,timestamp 211599832 12085781>
03:21:01.096282 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
21:773(752) ack 22 win 46 <nop,nop,timestamp 12085782 211599832>
03:21:01.096304 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 773
win 57 <nop,nop,timestamp 211599832 12085782>
03:21:01.097154 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
22:766(744) ack 773 win 57 <nop,nop,timestamp 211599832 12085782>
03:21:01.097795 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
773:797(24) ack 766 win 58 <nop,nop,timestamp 12085782 211599832>
03:21:01.100199 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
766:918(152) ack 797 win 57 <nop,nop,timestamp 211599833 12085782>
03:21:01.106536 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
797:941(144) ack 918 win 69 <nop,nop,timestamp 12085784 211599833>
03:21:01.108781 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
918:1382(464) ack 941 win 69 <nop,nop,timestamp 211599835 12085784>
03:21:01.113305 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
941:957(16) ack 1382 win 81 <nop,nop,timestamp 12085786 211599835>
03:21:01.155108 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 957
win 69 <nop,nop,timestamp 211599846 12085786>
03:21:01.155199 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
957:1005(48) ack 1382 win 81 <nop,nop,timestamp 12085796 211599846>
03:21:01.155217 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1005
win 69 <nop,nop,timestamp 211599846 12085796>
03:21:01.155273 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1382:1430(48) ack 1005 win 69 <nop,nop,timestamp 211599847 12085796>
03:21:01.155453 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
1005:1069(64) ack 1430 win 81 <nop,nop,timestamp 12085796 211599847>
03:21:01.199106 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1069
win 69 <nop,nop,timestamp 211599857 12085796>
03:21:01.214178 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1430:1494(64) ack 1069 win 69 <nop,nop,timestamp 211599861 12085796>
03:21:01.214481 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
1069:1597(528) ack 1494 win 81 <nop,nop,timestamp 12085811 211599861>
03:21:01.214518 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: . ack 1597
win 81 <nop,nop,timestamp 211599861 12085811>
03:21:01.218638 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1494:1974(480) ack 1597 win 81 <nop,nop,timestamp 211599862 12085811>
03:21:01.222363 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
1597:2173(576) ack 1974 win 93 <nop,nop,timestamp 12085813 211599862>
03:21:01.224255 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
1974:2006(32) ack 2173 win 93 <nop,nop,timestamp 211599864 12085813>
03:21:01.224521 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
2173:2237(64) ack 2006 win 93 <nop,nop,timestamp 12085814 211599864>
03:21:01.227368 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2006:2054(48) ack 2237 win 93 <nop,nop,timestamp 211599865 12085814>
03:21:01.227689 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
2237:2493(256) ack 2054 win 93 <nop,nop,timestamp 12085814 211599865>
03:21:01.228913 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2054:2102(48) ack 2493 win 104 <nop,nop,timestamp 211599865 12085814>
03:21:01.268981 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2102
win 93 <nop,nop,timestamp 12085825 211599865>
03:21:01.344551 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2102:2246(144) ack 2493 win 104 <nop,nop,timestamp 211599894 12085825>
03:21:01.344689 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2246
win 104 <nop,nop,timestamp 12085844 211599894>
03:21:02.037296 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: S
638840154:638840154(0) win 5840 <mss 1460,sackOK,timestamp 211600067
0,nop,wscale 7>
03:21:02.037414 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: S
3130232567:3130232567(0) ack 638840155 win 5792 <mss 1460,sackOK,timestamp
12086017 211600067,nop,wscale 7>
03:21:02.037473 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 1 win 46 <nop,nop,timestamp 211600067 12086017>
03:21:02.049490 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
1:110(109) ack 1 win 46 <nop,nop,timestamp 211600070 12086017>
03:21:02.049626 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
ack 110 win 46 <nop,nop,timestamp 12086020 211600070>
03:21:02.357928 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2246:2278(32) ack 2493 win 104 <nop,nop,timestamp 211600147 12085844>
03:21:02.358048 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2278
win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.358089 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: P
2278:2374(96) ack 2493 win 104 <nop,nop,timestamp 211600147 12086097>
03:21:02.358178 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2374
win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.358316 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: P
2493:2525(32) ack 2374 win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.358356 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: F
2525:2525(0) ack 2374 win 104 <nop,nop,timestamp 12086097 211600147>
03:21:02.359169 IP TE-DN-001.local.TEST.ssh > 192.168.2.1.43129: F
2374:2374(0) ack 2526 win 104 <nop,nop,timestamp 211600147 12086097>
03:21:02.359254 IP 192.168.2.1.43129 > TE-DN-001.local.TEST.ssh: . ack 2375
win 104 <nop,nop,timestamp 12086097 211600147>
03:22:03.064540 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
110:214(104) ack 1 win 46 <nop,nop,timestamp 211615323 12086020>
03:22:03.064664 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
ack 214 win 46 <nop,nop,timestamp 12101272 211615323>
03:22:08.065775 arp who-has TE-DN-001.local.TEST tell 192.168.2.1
03:22:08.065791 arp reply TE-DN-001.local.TEST is-at 00:18:37:02:74:76 (oui
Unknown)
03:22:54.349567 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
1:20(19) ack 214 win 46 <nop,nop,timestamp 12114091 211615323>
03:22:54.349624 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 20 win 46 <nop,nop,timestamp 211628143 12114091>
03:22:54.349708 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
20:39(19) ack 214 win 46 <nop,nop,timestamp 12114091 211628143>
03:22:54.349718 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 39 win 46 <nop,nop,timestamp 211628143 12114091>
03:22:54.385237 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: P
214:242(28) ack 39 win 46 <nop,nop,timestamp 211628152 12114091>
03:22:54.385342 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: .
ack 242 win 46 <nop,nop,timestamp 12114100 211628152>
03:22:54.391417 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: P
39:146(107) ack 242 win 46 <nop,nop,timestamp 12114101 211628152>
03:22:54.433048 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 146 win 46 <nop,nop,timestamp 211628164 12114101>
03:22:55.525390 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: F
242:242(0) ack 146 win 46 <nop,nop,timestamp 211628437 12114101>
03:22:55.525719 IP 192.168.2.1.cslistener > TE-DN-001.local.TEST.28260: F
146:146(0) ack 243 win 46 <nop,nop,timestamp 12114385 211628437>
03:22:55.525746 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: .
ack 147 win 46 <nop,nop,timestamp 211628437 12114385>
------------------------------------------------------------------------------------------------------------------------->
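
To make the capture easier to read, here is a rough Python sketch that measures how long the datanode's connection to the namenode stayed open, from the first SYN to the first FIN. In the dump, "cslistener" is the service name tcpdump prints for TCP port 9000, the namenode IPC port. The function name connection_lifetime and the regular expression are illustrative; they assume the old-style tcpdump output shown above, with the flag as the first token after the destination.

```python
import re
from datetime import datetime

# Start of a tcpdump packet line: time, source, destination, first flag char.
PACKET = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) IP (\S+) > (\S+): (\S)")


def connection_lifetime(lines, service="cslistener"):
    """Seconds between the first SYN and the first FIN involving `service`.

    Returns None if either packet is missing from the capture.
    """
    syn = fin = None
    for line in lines:
        m = PACKET.match(line)
        if not m:
            continue  # wrapped continuation lines, ARP, and so on
        ts, src, dst, flag = m.groups()
        suffix = "." + service
        if not (src.endswith(suffix) or dst.endswith(suffix)):
            continue  # e.g. the ssh session in the same capture
        t = datetime.strptime(ts, "%H:%M:%S.%f")
        if flag == "S" and syn is None:
            syn = t
        elif flag == "F" and fin is None:
            fin = t
    if syn is None or fin is None:
        return None
    return (fin - syn).total_seconds()


# Two lines copied from the capture above.
sample = [
    "03:21:02.037296 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: S",
    "03:22:55.525390 IP TE-DN-001.local.TEST.28260 > 192.168.2.1.cslistener: F",
]
print(connection_lifetime(sample))
```

On the two sample lines this comes to a little under two minutes, and the FIN is sent by the datanode side (port 28260), not by the namenode.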



On 9/13/07, Raghu Angadi <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Datanode should be able to connect to Namenode for any progress on
> upgrade. Do you see any other errors reported in datanode log? You need
> to fix the connection problem first.
>
> Are you comfortable taking tcpdump for Namenode port on the client? I
> think client should be trying to reconnect.
>
> Note that it is safe to restart the cluster or just the datanodes before
> the upgrade completes.
>
> Raghu.
> Open Study wrote:
> > Also, I checked the log of the name node and found one exception, as
> > follows:
> >
> > 2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 6 on 9000: starting
> > 2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 7 on 9000: starting
> > 2007-09-13 02:17:25,324 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 8 on 9000: starting
> > 2007-09-13 02:17:25,325 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 9 on 9000: starting
> > 2007-09-13 02:17:25,400 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode:
> > Block CRC Upgrade is still running.
> >         Avg completion of all Datanodes: 0.00% with 0 errors.
> > 2007-09-13 02:17:25,406 WARN org.apache.hadoop.ipc.Server: IPC Server
> > handler 5 on 9000, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 14)
> > from 192.168.2.1:53211: output error
> > java.nio.channels.ClosedChannelException
> >         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
> >         at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
> >         at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
> >         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> >         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> >         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:585)
> > 2007-09-13 02:18:24,921 INFO org.apache.hadoop.dfs.BlockCrcUpgradeNamenode:
> > Block CRC Upgrade is still running.
> >         Avg completion of all Datanodes: 0.00% with 0 errors.
> >
> > It seems something was going wrong on the datanode side. However, the log
> > of one of the datanodes shows it was started, and it was still running, as
> > I can see from the process list, but it somehow lost its connection to the
> > name-node.
> >
> > ************************************************************/
> > 2007-09-12 22:23:35,319 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> > /************************************************************
> > STARTUP_MSG: Starting DataNode
> > STARTUP_MSG:   host = TE-DN-002/192.168.2.102
> > STARTUP_MSG:   args = []
> > ************************************************************/
> > 2007-09-12 22:23:35,533 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > Initializing JVM Metrics with processName=DataNode, sessionId=null
> > 2007-09-12 22:24:35,619 INFO org.apache.hadoop.ipc.RPC: Problem connecting
> > to server: /192.168.2.1:9000
> > 2007-09-12 22:25:34,878 INFO org.apache.hadoop.dfs.Storage: Recovering
> > storage directory /home/textd/data/fs/data from previous
> > upgrade.
> > 2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.DataNode:
> >    Distributed upgrade for DataNode version -6 to current LV -7 is
> > initialized.
> > 2007-09-12 22:25:49,586 INFO org.apache.hadoop.dfs.Storage: Upgrading
> > storage directory /home/textd/data/fs/data.
> >    old LV = -4; old CTime = 0.
> >    new LV = -7; new CTime = 1189616555276
> >
> > The hardware configuration was
> > Namenode: P4D, 3G RAM
> > 3 Datanodes: AMD 64 4000x2, 1G RAM
> > They worked with hadoop 0.13.1
> >
> > Any idea or suggestion?
> >
>
>
