What version of Hadoop are you running?

http://wiki.apache.org/lucene-hadoop/Help

Dhaya007 wrote:
> ..datanode-slave.log
> 2007-12-19 19:30:55,579 WARN org.apache.hadoop.dfs.DataNode: Invalid
> directory in dfs.data.dir: directory is not writable:
> /tmp/hadoop-hdpusr/dfs/data
> 2007-12-19 19:30:55,579 ERROR org.apache.hadoop.dfs.DataNode: All
> directories in dfs.data.dir are invalid.

Did you check that directory?

DataNode is complaining that it doesn't have any 'valid' directories to store data in.
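
If it helps, here's a quick check on the slave (path taken from the log above; the user name hdpusr is assumed from that path; run the chown as root):

    ls -ld /tmp/hadoop-hdpusr/dfs/data
    # if the directory exists but is owned by another user, reclaim it:
    chown -R hdpusr /tmp/hadoop-hdpusr/dfs/data
    chmod -R u+rwx /tmp/hadoop-hdpusr/dfs/data

Keep in mind that /tmp is usually cleared on reboot, so pointing hadoop.tmp.dir somewhere outside /tmp is safer anyway.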

> Tasktracker_slav.log
> 2008-01-02 15:10:34,419 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
> start task tracker because java.net.UnknownHostException: unknown host:
> localhost
>         at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:136)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:532)
>         at org.apache.hadoop.ipc.Client.call(Client.java:471)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>         at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:269)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:293)
>         at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:246)
>         at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:427)
>         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:717)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1880)


That probably means that the TaskTracker's hadoop-site.xml says that 'localhost' is the JobTracker, which isn't true...
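
A quick way to verify (assuming the stock conf/ layout):

    # on the slave
    grep -A 1 'mapred.job.tracker' conf/hadoop-site.xml
    # expected: <value>master:54311</value>, not localhost

Also, since the exception is an UnknownHostException for 'localhost' itself, it's worth confirming the slave's /etc/hosts has a '127.0.0.1 localhost' line.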

> namenode-master.log
> 2008-01-02 14:44:02,636 INFO org.apache.hadoop.dfs.Storage: Storage
> directory /tmp/hadoop-hdpusr/dfs/name does not exist.
> 2008-01-02 14:44:02,638 INFO org.apache.hadoop.ipc.Server: Stopping server
> on 54310
> 2008-01-02 14:44:02,653 ERROR org.apache.hadoop.dfs.NameNode:
> org.apache.hadoop.dfs.InconsistentFSStateException: Directory
> /tmp/hadoop-hdpusr/dfs/name is in an inconsistent state: storage directory
> does not exist or is not accessible.

That means that /tmp/hadoop-hdpusr/dfs/name doesn't exist or isn't accessible.
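
If this is a fresh cluster with no data you need to keep, the usual fix is to format the namenode, which creates that directory (run from the Hadoop install directory on the master; note this wipes any existing HDFS metadata):

    bin/hadoop namenode -format
    bin/start-dfs.sh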

-*-*-

Overall, this looks like an acute case of wrong-configuration-itis.

Have you got the same hadoop-site.xml on all your nodes?
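
Worth checking: the hadoop-site.xml you posted below sets hadoop.tmp.dir to /home/hdusr/hadoop-${user.name}, yet every path in these logs is under /tmp/hadoop-hdpusr, which looks like the built-in default. That suggests at least some of the daemons aren't reading the file you think they are. A simple way to push one copy everywhere (install path assumed):

    # from the master, for each host in conf/slaves
    scp conf/hadoop-site.xml slave:/path/to/hadoop/conf/
    # then restart so the daemons pick it up
    bin/stop-all.sh
    bin/start-all.sh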

More info here: http://lucene.apache.org/hadoop/docs/r0.15.1/cluster_setup.html

Arun


2008-01-02 15:10:34,420 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG: /************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at slave/172.16.0.58
************************************************************/


And all the ports are running. Sometimes it asks for a password and sometimes it won't while starting the DFS.

Master logs
2008-01-02 14:44:02,677 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG: /************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/172.16.0.25
************************************************************/

Datanode-master.log
2008-01-02 16:26:32,380 INFO org.apache.hadoop.ipc.RPC: Server at
localhost/127.0.0.1:54310 not available yet, Zzzzz...
2008-01-02 16:26:33,390 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
2008-01-02 16:26:34,400 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
2008-01-02 16:26:35,410 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
2008-01-02 16:26:36,420 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
***********************************************
Jobtracker_master.log
2008-01-02 16:25:41,040 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 10 time(s).
2008-01-02 16:25:42,050 INFO org.apache.hadoop.mapred.JobTracker: problem
cleaning system directory: /tmp/hadoop-hdpusr/mapred/system
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:520)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:152)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:542)
        at org.apache.hadoop.ipc.Client.call(Client.java:471)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:269)
        at org.apache.hadoop.dfs.DFSClient.createNamenode(DFSClient.java:147)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:161)
        at
org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:65)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:683)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:120)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2052)
2008-01-02 16:25:42,931 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 54311, call getFilesystemName() from 127.0.0.1:49283: error:
org.apache.hadoop.mapred.JobTracker$IllegalStateException: FileSystem object
not available yet
org.apache.hadoop.mapred.JobTracker$IllegalStateException: FileSystem object
not available yet
        at
org.apache.hadoop.mapred.JobTracker.getFilesystemName(JobTracker.java:1475)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2008-01-02 16:25:47,942 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 54311, call getFilesystemName() from 127.0.0.1:49293: error:
org.apache.hadoop.mapred.JobTracker$IllegalStateException: FileSystem object
not available yet
org.apache.hadoop.mapred.JobTracker$IllegalStateException: FileSystem object
not available yet
        at
org.apache.hadoop.mapred.JobTracker.getFilesystemName(JobTracker.java:1475)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2008-01-02 16:25:52,061 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
2008-01-02 16:25:52,951 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 54311, call getFilesystemName() from 127.0.0.1:49304: error:
org.apache.hadoop.mapred.JobTracker$IllegalStateException: FileSystem object
not available yet
org.apache.hadoop.mapred.JobTracker$IllegalStateException: FileSystem object
not available yet
        at
org.apache.hadoop.mapred.JobTracker.getFilesystemName(JobTracker.java:1475)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2008-01-02 16:25:53,070 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
2008-01-02 16:25:54,080 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
2008-01-02 16:25:55,090 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
2008-01-02 16:25:56,100 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 5 time(s).
2008-01-02 16:25:56,281 INFO org.apache.hadoop.mapred.JobTracker:
SHUTDOWN_MSG: /************************************************************
SHUTDOWN_MSG: Shutting down JobTracker at master/172.16.0.25
************************************************************/

Tasktracker_master.log
2008-01-02 16:26:14,080 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54311. Already tried 2 time(s).
2008-01-02 16:28:34,510 INFO org.apache.hadoop.mapred.TaskTracker:
STARTUP_MSG: /************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host = master/172.16.0.25
STARTUP_MSG:   args = []
************************************************************/
2008-01-02 16:28:34,739 INFO org.mortbay.util.Credential: Checking Resource
aliases
2008-01-02 16:28:34,827 INFO org.mortbay.http.HttpServer: Version
Jetty/5.1.4
2008-01-02 16:28:35,281 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-01-02 16:28:35,332 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]
2008-01-02 16:28:35,332 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]
2008-01-02 16:28:35,332 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2008-01-02 16:28:35,336 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50060
2008-01-02 16:28:35,336 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-01-02 16:28:35,383 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=TaskTracker, sessionId=
2008-01-02 16:28:35,402 INFO org.apache.hadoop.mapred.TaskTracker:
TaskTracker up at: /127.0.0.1:49599
2008-01-02 16:28:35,402 INFO org.apache.hadoop.mapred.TaskTracker: Starting
tracker tracker_master:/127.0.0.1:49599
2008-01-02 16:28:35,406 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 49599: starting
2008-01-02 16:28:35,406 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 49599: starting
2008-01-02 16:28:35,406 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 49599: starting
2008-01-02 16:28:35,490 INFO org.apache.hadoop.mapred.TaskTracker: Starting
thread: Map-events fetcher for all reduce tasks on
tracker_master:/127.0.0.1:49599
2008-01-02 16:28:35,500 INFO org.apache.hadoop.mapred.TaskTracker: Lost
connection to JobTracker [localhost/127.0.0.1:54311].  Retrying...
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.mapred.JobTracker$IllegalStateException: FileSystem object
not available yet
        at
org.apache.hadoop.mapred.JobTracker.getFilesystemName(JobTracker.java:1475)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at org.apache.hadoop.ipc.Client.call(Client.java:482)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at org.apache.hadoop.mapred.$Proxy0.getFilesystemName(Unknown Source)
        at 
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:773)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1179)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1880)
*******************************************

Please help me to resolve the same.


Khalil Honsali wrote:

Hi,

I think you need to post more information, for example an excerpt of the
failing datanode log. Also, please clarify the issue of connectivity:
- are you able to ssh passwordless (from master to slave, slave to master,
slave to slave, master to master)? You shouldn't be entering a password
every time...
- are you able to telnet (not necessary but preferred)?
- have you verified the ports as RUNNING using the netstat command?
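
For example, on Linux (ports taken from your hadoop-site.xml):

    netstat -tlnp | grep -E '54310|54311'
    # the namenode should be LISTENing on 54310, the jobtracker on 54311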

Besides, the tasktracker starts OK but not the datanode?

K. Honsali

On 02/01/2008, Dhaya007 <[EMAIL PROTECTED]> wrote:


I am new to Hadoop; if anything is wrong, please correct me ....
I have configured a single/multi-node cluster using the following link:

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29

I have followed the link, but I am not able to start Hadoop in a multi-node
environment.
The problems I am facing are as follows:
1. I have configured the master and slave nodes with passphraseless SSH
keys, but if I try to run start-dfs.sh it prompts for the password for the
master/slave machines. (I have copied the master's .ssh/id_rsa.pub key into
the slaves' authorized_keys file; see the command sketch after this list.)

2. After giving the password, the datanode, namenode, jobtracker and
tasktracker started successfully on the master, but the datanode is not
started on the slave.


3. Sometimes step 2 works and sometimes it says permission denied.

4. I have checked the log file on the slave for the datanode; it says
incompatible node. Then I formatted the slave and the master and started
the DFS with start-dfs.sh, but I am still getting the error.
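
For reference, the key setup I did was roughly this (OpenSSH, following the tutorial above):

    # on the master, as the hadoop user
    ssh-keygen -t rsa -P ""
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # for master -> master
    scp ~/.ssh/id_rsa.pub slave:master.pub
    # then on each slave, as the hadoop user
    mkdir -p ~/.ssh
    cat ~/master.pub >> ~/.ssh/authorized_keys
    chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys

Is anything missing here (e.g. permissions on ~/.ssh or the home directory)?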


The host entries in /etc/hosts on both master and slave:
master
slave

conf/masters:
master

conf/slaves:
master
slave
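
i.e., on both machines /etc/hosts looks something like this (IPs as they appear in the logs):

    127.0.0.1     localhost
    172.16.0.25   master
    172.16.0.58   slave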

The hadoop-site.xml for both master and slave:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
 <name>hadoop.tmp.dir</name>
 <value>/home/hdusr/hadoop-${user.name}</value>
 <description>A base for other temporary directories.</description>
</property>

<property>
 <name>fs.default.name</name>
 <value>hdfs://master:54310</value>
 <description>The name of the default file system.  A URI whose
 scheme and authority determine the FileSystem implementation.  The
 uri's scheme determines the config property (fs.SCHEME.impl) naming
 the FileSystem implementation class.  The uri's authority is used to
 determine the host, port, etc. for a filesystem.</description>
</property>

<property>
 <name>mapred.job.tracker</name>
 <value>master:54311</value>
 <description>The host and port that the MapReduce job tracker runs
 at.  If "local", then jobs are run in-process as a single map
 and reduce task.
 </description>
</property>

<property>
 <name>dfs.replication</name>
 <value>2</value>
 <description>Default block replication.
 The actual number of replications can be specified when the file is
created.
 The default is used if replication is not specified in create time.
 </description>
</property>

<property>
 <name>mapred.map.tasks</name>
 <value>20</value>
 <description>As a rule of thumb, use 10x the number of slaves (i.e.,
number of tasktrackers).
 </description>
</property>

<property>
 <name>mapred.reduce.tasks</name>
 <value>4</value>
 <description>As a rule of thumb, use 2x the number of slave processors
(i.e., number of tasktrackers).
 </description>
</property>
</configuration>

Please help me to resolve the same, or else provide any other tutorial for
multi-node cluster setup. I am eagerly waiting for the tutorials.


Thanks






