Re: FSDataInputStream.read(byte[]) only reads to a block boundary?

2009-06-28 Thread Raghu Angadi


This seems to be the case. I don't think there is any specific reason 
not to read across the block boundary...


Even if HDFS did read across blocks, it is still not a good idea to 
ignore the JavaDoc for read(). If you want all the bytes read, you 
should use a while loop or one of the readFully() variants. For example, if 
you later change your code by wrapping a BufferedInputStream around 
'in', you would still get partial reads even if HDFS read all the data.
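For reference, a minimal sketch of both safe patterns, reusing the 'fin' and
'buffer' from the code quoted below (offsets and sizes are only illustrative):

   // (a) loop until the buffer is full or EOF is reached
   int off = 0;
   while (off < buffer.length) {
     int n = fin.read(buffer, off, buffer.length - off);
     if (n < 0) break;        // end of file
     off += n;
   }

   // (b) or let readFully() do the looping; it throws EOFException
   //     if the stream ends before the buffer is filled
   fin.readFully(buffer);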


Raghu.

forbbs forbbs wrote:

The hadoop version is 0.19.0.
My file is larger than 64MB, and the block size is 64MB.

The output of the code below is '10'. May I read across the block
boundary? Or should I use 'while (left..){}' style code?

  public static void main(String[] args) throws IOException
  {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream fin = fs.open(new Path(args[0]));

    fin.seek(64*1024*1024 - 10);
    byte[] buffer = new byte[32*1024];
    int len = fin.read(buffer);
    //int len = fin.read(buffer, 0, 128);
    System.out.println(len);

    fin.close();
  }




Re: HDFS Random Access

2009-06-27 Thread Raghu Angadi


Yes, FSDataInputStream allows random access. There are two ways to read x 
bytes at a position p:

1) in.seek(p); in.read(buf, 0, x);
2) in.read(p, buf, 0, x);

These two have slightly different semantics. The second one is preferred 
and is easier for HDFS to optimize further.
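In code, the two variants look roughly like this (a minimal sketch; 'in' is
the FSDataInputStream, and p, x and buf are illustrative):

   byte[] buf = new byte[x];

   // 1) stateful read: moves the stream's current position,
   //    and may return fewer than x bytes
   in.seek(p);
   int n = in.read(buf, 0, x);

   // 2) positioned read: does not move the current position and is
   //    safe to call from multiple threads; may also return fewer bytes
   int m = in.read(p, buf, 0, x);

   // the positioned readFully() variant loops until x bytes are read
   in.readFully(p, buf, 0, x);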


Random access should be pretty good with HDFS and it is increasingly 
getting more users and thus more importance. HBase is one of the users.


Just yesterday I attached a benchmark and comparisons with random access 
on the native filesystem to https://issues.apache.org/jira/browse/HDFS-236 .


As of now, the overhead on average is about 2 ms on top of the 9-10 ms a 
native read takes. There are a few fairly simple fixes possible to reduce 
this gap.


I think getFileStatus() is the way to find the length, though there 
might have been a call added to FSDataInputStream recently. I am not sure.


Raghu.
tsuraan wrote:

All the documentation for HDFS says that it's for large streaming
jobs, but I couldn't find an explicit answer to this, so I'll try
asking here.  How is HDFS's random seek performance within an
FSDataInputStream?  I use lucene with a lot of indices (potentially
thousands), so I was thinking of putting them into HDFS and
reimplementing my search as a Hadoop map-reduce.  I've noticed that
lucene tends to do a bit of random seeking when searching though; I
don't believe that it guarantees that all seeks be to increasing file
positions either.

Would HDFS be a bad fit for an access pattern that involves seeks to
random positions within a stream?

Also, is getFileStatus the typical way of getting the length of a file
in HDFS, or is there some method on FSDataInputStream that I'm not
seeing?

Please cc: me on any reply; I'm not on the hadoop list.  Thanks!




Re: UnknownHostException

2009-06-23 Thread Raghu Angadi


This is at the RPC client level and there is a requirement for a fully qualified 
hostname. Maybe the '.' at the end of '10.2.24.21.' is causing the problem?


btw, in 0.21 even fs.default.name does not need to be a fully qualified 
name... anything that resolves to an IP address is fine (at least for 
common/FS and HDFS).


Raghu.

Matt Massie wrote:
fs.default.name in your hadoop-site.xml needs to be set to a 
fully-qualified domain name (instead of an IP address)


-Matt

On Jun 23, 2009, at 6:42 AM, bharath vissapragada wrote:


When I try to execute the command bin/start-dfs.sh, I get the following
error. I have checked the hadoop-site.xml file on all the nodes, and they
are fine.
Can someone help me out?

10.2.24.21: Exception in thread "main" java.net.UnknownHostException:
unknown host: 10.2.24.21.
10.2.24.21: at
org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
10.2.24.21: at
org.apache.hadoop.ipc.Client.getConnection(Client.java:779)
10.2.24.21: at org.apache.hadoop.ipc.Client.call(Client.java:704)
10.2.24.21: at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
10.2.24.21: at 
org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown

Source)
10.2.24.21: at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
10.2.24.21: at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
10.2.24.21: at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
10.2.24.21: at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)






Re: UnknownHostException

2009-06-23 Thread Raghu Angadi

Raghu Angadi wrote:


This is at the RPC client level and there is a requirement for a fully qualified 


I meant to say there is NO requirement ...


hostname. Maybe the '.' at the end of '10.2.24.21.' is causing the problem?

btw, in 0.21 even fs.default.name does not need to be a fully qualified


that fix is probably in 0.20 too.

Raghu.

name... anything that resolves to an IP address is fine (at least for 
common/FS and HDFS).


Raghu.

Matt Massie wrote:
fs.default.name in your hadoop-site.xml needs to be set to a 
fully-qualified domain name (instead of an IP address)


-Matt

On Jun 23, 2009, at 6:42 AM, bharath vissapragada wrote:

When I try to execute the command bin/start-dfs.sh, I get the following
error. I have checked the hadoop-site.xml file on all the nodes, and they
are fine.
Can someone help me out?

10.2.24.21: Exception in thread "main" java.net.UnknownHostException:
unknown host: 10.2.24.21.
10.2.24.21: at
org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
10.2.24.21: at
org.apache.hadoop.ipc.Client.getConnection(Client.java:779)
10.2.24.21: at org.apache.hadoop.ipc.Client.call(Client.java:704)
10.2.24.21: at 
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
10.2.24.21: at 
org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown

Source)
10.2.24.21: at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
10.2.24.21: at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
10.2.24.21: at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
10.2.24.21: at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)









Re: Too many open files error, which gets resolved after some time

2009-06-23 Thread Raghu Angadi

Stas Oskin wrote:

Hi.

Any idea if calling System.gc() periodically will help reduce the number
of pipes / epolls?


Since you have HADOOP-4346, you should not have excessive epoll/pipe fds 
open. First of all, do you still have the problem? If yes, how many 
Hadoop streams do you have open at a time?


System.gc() won't help if you have HADOOP-4346.

Raghu.


Thanks for your opinion!

2009/6/22 Stas Oskin stas.os...@gmail.com


Ok, it seems this issue is already patched in the Hadoop distro I'm using
(Cloudera).

Any idea if I should still call GC manually/periodically to clean out all
the stale pipes / epolls?

2009/6/22 Steve Loughran ste...@apache.org


Stas Oskin wrote:

 Hi.

So what would be the recommended approach to pre-0.20.x series?

To ensure each file is used by only one thread, and that it is then safe to close
the handle in that thread?

Regards.


Good question - I'm not sure. For anything you get with FileSystem.get(),
it's now dangerous to close, so try just setting the reference to null and
hoping that GC will do the finalize() when needed.







Re: Too many open files error, which gets resolved after some time

2009-06-23 Thread Raghu Angadi

To be more accurate, once you have HADOOP-4346,

fds for epoll and pipes = 3 * threads blocked on Hadoop I/O

Unless you have hundreds of threads at a time, you should not see 
hundreds of these. These fds stay around for up to 10 sec even after the 
threads exit.

I am a bit confused about your exact situation. Please check the number of 
threads if you are still facing the problem.


Raghu.

Raghu Angadi wrote:


Since you have HADOOP-4346, you should not have excessive epoll/pipe fds 
open. First of all, do you still have the problem? If yes, how many 
Hadoop streams do you have open at a time?


System.gc() won't help if you have HADOOP-4346.

Raghu.


Thanks for your opinion!

2009/6/22 Stas Oskin stas.os...@gmail.com


Ok, it seems this issue is already patched in the Hadoop distro I'm using
(Cloudera).

Any idea if I should still call GC manually/periodically to clean out all
the stale pipes / epolls?

2009/6/22 Steve Loughran ste...@apache.org


Stas Oskin wrote:

 Hi.

So what would be the recommended approach to pre-0.20.x series?

To ensure each file is used by only one thread, and that it is then safe to
close the handle in that thread?

Regards.

Good question - I'm not sure. For anything you get with FileSystem.get(),
it's now dangerous to close, so try just setting the reference to null and
hoping that GC will do the finalize() when needed.









Re: Too many open files error, which gets resolved after some time

2009-06-23 Thread Raghu Angadi


How many threads do you have? The number of active threads is very 
important. Normally,


#fds = (3 * #threads_blocked_on_io) + #streams

12 per stream is certainly way off.
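For example (numbers are only an illustration), by that formula a client with 
20 threads blocked on Hadoop I/O and 10 open streams would be expected to hold 
roughly (3 * 20) + 10 = 70 of these fds; ~12 fds per stream points to many 
more blocked threads (or unclosed streams) than that.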

Raghu.

Stas Oskin wrote:

Hi.

In my case it was actually ~ 12 fd's per stream, which included pipes and
epolls.

Could it be that HDFS opens 3 x 3 (input - output - epoll) fd's per each
thread, which makes it close to the number I mentioned? Or is it always 3 at
maximum per thread / stream?

Up to 10 sec looks like the correct number; it seems it gets freed around
this time indeed.

Regards.

2009/6/23 Raghu Angadi rang...@yahoo-inc.com


To be more accurate, once you have HADOOP-4346,

fds for epoll and pipes = 3 * threads blocked on Hadoop I/O

Unless you have hundreds of threads at a time, you should not see hundreds
of these. These fds stay around for up to 10 sec even after the
threads exit.

I am a bit confused about your exact situation. Please check the number of
threads if you are still facing the problem.

Raghu.


Raghu Angadi wrote:


Since you have HADOOP-4346, you should not have excessive epoll/pipe fds
open. First of all, do you still have the problem? If yes, how many Hadoop
streams do you have open at a time?

System.gc() won't help if you have HADOOP-4346.

Raghu.

 Thanks for your opinion!

2009/6/22 Stas Oskin stas.os...@gmail.com

 Ok, seems this issue is already patched in the Hadoop distro I'm using

(Cloudera).

Any idea if I still should call GC manually/periodically to clean out
all
the stale pipes / epolls?

2009/6/22 Steve Loughran ste...@apache.org

 Stas Oskin wrote:

 Hi.


So what would be the recommended approach to pre-0.20.x series?

To ensure each file is used by only one thread, and that it is then safe to
close the handle in that thread?

Regards.

Good question - I'm not sure. For anything you get with FileSystem.get(),
it's now dangerous to close, so try just setting the reference to null and
hoping that GC will do the finalize() when needed.








Re: Too many open files error, which gets resolved after some time

2009-06-22 Thread Raghu Angadi


Is this before 0.20.0? Assuming you have closed these streams, it is 
mostly https://issues.apache.org/jira/browse/HADOOP-4346


It is the JDK internal implementation that depends on GC to free up its 
cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.


Raghu.

Stas Oskin wrote:

Hi.

After tracing some more with the lsof utility, I managed to stop the
growth on the DataNode process, but I still have issues with my DFS client.

It seems that my DFS client opens hundreds of pipes and eventpolls. Here is
a small part of the lsof output:

java10508 root  387w  FIFO0,6   6142565 pipe
java10508 root  388r  FIFO0,6   6142565 pipe
java10508 root  389u     0,100  6142566
eventpoll
java10508 root  390u  FIFO0,6   6135311 pipe
java10508 root  391r  FIFO0,6   6135311 pipe
java10508 root  392u     0,100  6135312
eventpoll
java10508 root  393r  FIFO0,6   6148234 pipe
java10508 root  394w  FIFO0,6   6142570 pipe
java10508 root  395r  FIFO0,6   6135857 pipe
java10508 root  396r  FIFO0,6   6142570 pipe
java10508 root  397r     0,100  6142571
eventpoll
java10508 root  398u  FIFO0,6   6135319 pipe
java10508 root  399w  FIFO0,6   6135319 pipe

I'm using FSDataInputStream and FSDataOutputStream, so this might be related
to pipes?

So, my questions are:

1) What causes these pipes/epolls to appear?

2) More important, how can I prevent their accumulation and growth?

Thanks in advance!

2009/6/21 Stas Oskin stas.os...@gmail.com


Hi.

I have HDFS client and HDFS datanode running on same machine.

When I try to access a dozen files at once from the client, several
times in a row, I start to receive the following errors on the client and in the
HDFS browse function.

HDFS Client: Could not get block locations. Aborting...
HDFS browse: Too many open files

I can increase the maximum number of files that can be opened, as I have it
set to the default 1024, but I would like to first solve the problem, as a
larger value just means it would run out of files again later on.

So my questions are:

1) Does the HDFS datanode keep any files open, even after the HDFS
client has already closed them?

2) Is it possible to find out who keeps the files open - the datanode or the
client (so I could pinpoint the source of the problem)?

Thanks in advance!







Re: Too many open files error, which gets resolved after some time

2009-06-22 Thread Raghu Angadi


64k might help in the sense that you might hit GC before you hit the limit.

Otherwise, your only options are to use the patch attached to 
HADOOP-4346 or run System.gc() occasionally.


I think it should be committed to 0.18.4

Raghu.

Stas Oskin wrote:

Hi.

Yes, it happens with 0.18.3.

I'm now closing every FSData stream I receive from HDFS, so the number of
open fd's in the DataNode is reduced.

The problem is that my own DFS client still has a high number of fd's open,
mostly pipes and epolls.
They sometimes quickly drop to the level of ~400 - 500, and sometimes just
get stuck at ~1000.

I'm still trying to find out how well it behaves if I set the maximum fd
number to 65K.

Regards.



2009/6/22 Raghu Angadi rang...@yahoo-inc.com


Is this before 0.20.0? Assuming you have closed these streams, it is mostly
https://issues.apache.org/jira/browse/HADOOP-4346

It is the JDK internal implementation that depends on GC to free up its
cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.

Raghu.


Stas Oskin wrote:


Hi.

After tracing some more with the lsof utility, I managed to stop the
growth on the DataNode process, but I still have issues with my DFS client.

It seems that my DFS client opens hundreds of pipes and eventpolls. Here
is
a small part of the lsof output:

java10508 root  387w  FIFO0,6   6142565 pipe
java10508 root  388r  FIFO0,6   6142565 pipe
java10508 root  389u     0,100  6142566
eventpoll
java10508 root  390u  FIFO0,6   6135311 pipe
java10508 root  391r  FIFO0,6   6135311 pipe
java10508 root  392u     0,100  6135312
eventpoll
java10508 root  393r  FIFO0,6   6148234 pipe
java10508 root  394w  FIFO0,6   6142570 pipe
java10508 root  395r  FIFO0,6   6135857 pipe
java10508 root  396r  FIFO0,6   6142570 pipe
java10508 root  397r     0,100  6142571
eventpoll
java10508 root  398u  FIFO0,6   6135319 pipe
java10508 root  399w  FIFO0,6   6135319 pipe

I'm using FSDataInputStream and FSDataOutputStream, so this might be
related
to pipes?

So, my questions are:

1) What causes these pipes/epolls to appear?

2) More important, how can I prevent their accumulation and growth?

Thanks in advance!

2009/6/21 Stas Oskin stas.os...@gmail.com

 Hi.

I have HDFS client and HDFS datanode running on same machine.

When I try to access a dozen files at once from the client, several
times in a row, I start to receive the following errors on the client and in the
HDFS browse function.

HDFS Client: Could not get block locations. Aborting...
HDFS browse: Too many open files

I can increase the maximum number of files that can be opened, as I have it
set to the default 1024, but I would like to first solve the problem, as a
larger value just means it would run out of files again later on.

So my questions are:

1) Does the HDFS datanode keep any files open, even after the HDFS
client has already closed them?

2) Is it possible to find out who keeps the files open - the datanode or the
client (so I could pinpoint the source of the problem)?

Thanks in advance!








Re: Disk Usage Overhead of Hadoop Upgrade

2009-06-22 Thread Raghu Angadi


The initial overhead is fairly small (an extra hard link for each block file).

After that, the overhead grows as you delete files (and thus their blocks) 
that existed before the upgrade, since the physical block files are 
deleted only after you finalize.

So the overhead == (the blocks that got deleted after the upgrade).
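For example (numbers are only illustrative): if you delete 1 TB of 
pre-upgrade data before finalizing, that 1 TB of block files stays on disk, 
still referenced from the upgrade snapshot, until you finalize. Data written 
after the upgrade is not duplicated.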

Raghu.

Stu Hood wrote:

Hey gang,

We're preparing to upgrade our cluster from Hadoop 0.15.3 to 0.18.3.

How much disk usage overhead can we expect from the block conversion before we 
finalize the upgrade? In the worst case, will the upgrade cause our disk usage 
to double?

Thanks,

Stu Hood
Search Team Technical Lead
Email & Apps Division, Rackspace Hosting





Re: HDFS data transfer!

2009-06-11 Thread Raghu Angadi


Thanks Brian for the good advice.

Slightly off topic from the original post: there will be occasions where it 
is necessary or better to copy different portions of a file in parallel 
(distcp can benefit a lot). There is a proposal to let HDFS 'stitch' 
multiple files into one: something like

NameNode.stitchFiles(Path to, Path[] files)

This way a very large file can be copied more efficiently (with a 
map/reduce job, for example). Another use case is high-latency, high-bandwidth 
connections (like coast-to-coast). High latency can be somewhat worked 
around by using large buffers for TCP connections, but usually users don't 
have that control. It is just simpler to use multiple connections.


This will obviously be an HDFS-only interface (i.e. not a FileSystem 
method), at least initially.


Raghu.

Brian Bockelman wrote:

Hey Sugandha,

Transfer rates depend on the quality/quantity of your hardware and the 
quality of your client disk that is generating the data.  I usually say 
that you should expect near-hardware-bottleneck speeds for an otherwise 
idle cluster.


There should be no "make it fast" required (though you should review 
the logs for errors if it's going slow). I would expect a 5GB file to 
take around 3-5 minutes to write on our cluster, but it's a well-tuned 
and operational cluster.


As Todd (I think) mentioned before, we can't help any when you just say "I 
want to make it faster." You need to provide diagnostic information - 
logs, Ganglia plots, stack traces, something - that folks can look at.


Brian

On Jun 10, 2009, at 2:25 AM, Sugandha Naolekar wrote:

But if I want to make it fast, then what? I want to place the data in
HDFS and replicate it in a fraction of a second. Can that be possible, and how?

On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena kartik@gmail.com 
wrote:



I would suppose about 2-3 hours. It took me some 2 days to load a 160 Gb
file.
Secura

On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
sugandha@gmail.com wrote:


Hello!

If I try to transfer a 5GB VDI file from a remote host (not a part of the
hadoop cluster) into HDFS, and get it back, how much time is it supposed to
take?

No map-reduce involved. Simply writing files in and out of HDFS through
a simple piece of Java code (usage of the APIs).

--
Regards!
Sugandha







--
Regards!
Sugandha






Re: Multiple NIC Cards

2009-06-09 Thread Raghu Angadi


I still need to go through the whole thread, but we feel your pain.

First, please try setting fs.default.name to the namenode's internal IP on the 
datanodes. This should make the NN associate the internal IPs with the 
datanodes (assuming your routing is correct). The NameNode web UI should then 
list internal IPs for the datanodes. You might have to temporarily change 
NameNode code to listen on 0.0.0.0.


That said, the issues you are facing are pretty unfortunate. As Steve 
mentioned, Hadoop is all confused about hostname/IP, and there is 
unnecessary reliance on hostnames and reverse DNS lookups in many, many 
places.


At least fairly straightforward setups with multiple NICs should be 
handled well.


dfs.datanode.dns.interface should work like you expected (but I am not very 
surprised it didn't).


Another thing you could try is setting dfs.datanode.address to the 
internal IP address (this might already have been discussed in the thread). 
This should at least make all the bulk data transfers happen over the internal 
NICs. One way to make sure is to hover over the datanode entry on the NameNode 
web UI... it shows the IP address.
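For example, setting dfs.datanode.address to something like 192.168.1.102:50010 
(instead of the default 0.0.0.0:50010) in that node's hadoop-site.xml would bind 
the data transfer port to the internal interface only; the address here is just 
the internal IP from your setup, so adjust it per node.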


good luck.

It might be better to document your pains and findings in a Jira (with most 
of the details in one or more comments rather than in the description).


Raghu.

John Martyniak wrote:
So I changed all of the 0.0.0.0 on one machine to point to the 
192.168.1.102 address.


And still it picks up the hostname and ip address of the external network.

I am kind of at my wits end with this, as I am not seeing a solution 
yet, except to take the machines off of the external network and leave 
them on the internal network which isn't an option.


Has anybody had this problem before?  What was the solution?

-John

On Jun 9, 2009, at 10:17 AM, Steve Loughran wrote:

One thing to consider is that some of the various services of Hadoop 
are bound to 0.0.0.0, which means every IPv4 address. You really want 
to bring up everything, including the Jetty services, on the en0 network 
adapter by binding them to 192.168.1.102; this will cause anyone 
trying to talk to them over the other network to fail, which at least 
finds the problem sooner rather than later.






Re: Command-line jobConf options in 0.18.3

2009-06-04 Thread Raghu Angadi

Tom White wrote:

Actually, the space is needed, to be interpreted as a Hadoop option by
ToolRunner. Without the space it sets a Java system property, which
Hadoop will not automatically pick up.


I don't think the space is required. Something like 
-Dfs.default.name=host:port works. I don't see ToolRunner setting any 
Java system properties.



Ian, try putting the options after the classname and see if that
helps. Otherwise, it would be useful to see a snippet of the program
code.


Right. The options that go to the class should appear after the class name.

Note that it is not necessary to use ToolRunner (which I don't find very 
convenient in many cases). You can use GenericOptionsParser directly. 
An example: https://issues.apache.org/jira/browse/HADOOP-5961
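For what it's worth, a minimal sketch of the ToolRunner pattern (the class 
name and jar name below are hypothetical; prise.collopts is the property from 
Ian's command line):

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.conf.Configured;
   import org.apache.hadoop.util.Tool;
   import org.apache.hadoop.util.ToolRunner;

   public class ConfDumper extends Configured implements Tool {

     // run() gets only the tool-specific args; generic options like
     // -D key=value have already been folded into getConf()
     public int run(String[] args) throws Exception {
       Configuration conf = getConf();
       System.out.println("prise.collopts = " + conf.get("prise.collopts"));
       return 0;
     }

     public static void main(String[] args) throws Exception {
       System.exit(ToolRunner.run(new Configuration(), new ConfDumper(), args));
     }
   }

Invoked as 'bin/hadoop jar myjob.jar ConfDumper -D prise.collopts=collopts 
input output', the property should then show up in the configuration.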


Raghu.


Thanks,
Tom

On Thu, Jun 4, 2009 at 8:23 PM, Vasyl Keretsman vasi...@gmail.com wrote:

Perhaps there should not be a space between -D and your option?

-Dprise.collopts=

Vasyl



2009/6/4 Ian Soboroff ian.sobor...@nist.gov:

bin/hadoop jar -files collopts -D prise.collopts=collopts p3l-3.5.jar 
gov.nist.nlpir.prise.mapred.MapReduceIndexer input output

The 'prise.collopts' option doesn't appear in the JobConf.

Ian

Aaron Kimball aa...@cloudera.com writes:


Can you give an example of the exact arguments you're sending on the command
line?
- Aaron

On Wed, Jun 3, 2009 at 5:46 PM, Ian Soboroff ian.sobor...@nist.gov wrote:

If, after I call getConf to get the conf object, I manually add the key/value
pair, it's there when I need it. So it feels like ToolRunner isn't
parsing my args for some reason.

Ian

On Jun 3, 2009, at 8:45 PM, Ian Soboroff wrote:

Yes, and I get the JobConf via 'JobConf job = new JobConf(conf,
the.class)'.  The conf is the Configuration object that comes from
getConf.  Pretty much copied from the WordCount example (which this
program used to be a long while back...)

thanks,
Ian

On Jun 3, 2009, at 7:09 PM, Aaron Kimball wrote:

Are you running your program via ToolRunner.run()? How do you
instantiate the JobConf object?
- Aaron

On Wed, Jun 3, 2009 at 10:19 AM, Ian Soboroff 
ian.sobor...@nist.gov wrote:
I'm backporting some code I wrote for 0.19.1 to 0.18.3 (long
story), and I'm finding that when I run a job and try to pass
options with -D on the command line, that the option values aren't
showing up in my JobConf.  I logged all the key/value pairs in the
JobConf, and the option I passed through with -D isn't there.

This worked in 0.19.1... did something change with command-line
options from 18 to 19?

Thanks,
Ian




Re: Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread Raghu Angadi


Did you try 'telnet 198.55.35.229 54310' from this datanode? The log 
shows that it is not able to connect to master:54310. Being able to ssh from 
the datanode does not matter.


Raghu.

asif md wrote:

I can SSH both ways, i.e. from master to slave and from slave to master.

The datanode is getting initialized at the master, but the log at the slave
looks like this:

/
2009-06-04 15:20:06,066 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = 
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.3
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250;
compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
/
2009-06-04 15:20:08,826 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 0 time(s).
2009-06-04 15:20:09,829 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 1 time(s).
2009-06-04 15:20:10,831 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 2 time(s).
2009-06-04 15:20:11,832 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 3 time(s).
2009-06-04 15:20:12,834 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 4 time(s).
2009-06-04 15:20:13,837 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 5 time(s).
2009-06-04 15:20:14,840 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 6 time(s).
2009-06-04 15:20:15,841 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 7 time(s).
2009-06-04 15:20:16,844 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 8 time(s).
2009-06-04 15:20:17,847 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/198.55.35.229:54310. Already tried 9 time(s).
2009-06-04 15:20:17,873 ERROR org.apache.hadoop.dfs.DataNode:
java.io.IOException: Call to master/198.55.35.229:54310 failed on local
exception: java.net.NoRouteToHostException: No route to host
at org.apache.hadoop.ipc.Client.wrapException(Client.java:751)
at org.apache.hadoop.ipc.Client.call(Client.java:719)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:335)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:372)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:309)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:286)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:277)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:223)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:3071)
at
org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:3026)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:3034)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3156)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:301)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:178)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:820)
at org.apache.hadoop.ipc.Client.call(Client.java:705)
... 13 more

2009-06-04 15:20:17,874 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at ***
**88

Please suggest.

Asif.


On Thu, Jun 4, 2009 at 4:15 PM, asif md asif.d...@gmail.com wrote:


@Ravi

Thanks Ravi... I'm now using a defined tmp dir, so the second issue is
resolved.

But I have ssh keys that have passwords, and I am able to ssh to the slave
and to the master from the master.

Should I be able to do that from the slave as well?

@ALL

Any suggestions.

Thanx

Asif.


On Thu, Jun 4, 2009 at 3:17 PM, Ravi Phulari rphul...@yahoo-inc.comwrote:


From the logs it looks like your Hadoop cluster is facing two different
issues.

At Slave

   1. exception: java.net.NoRouteToHostException: No route to host in
   your logs

Diagnosis - One of your nodes cannot be reached correctly. Make sure you
can ssh to your master and 

Re: Renaming all nodes in Hadoop cluster

2009-06-03 Thread Raghu Angadi


Renaming datanodes should not affect HDFS. HDFS does not depend on 
hostnames or IPs for consistency of data. You can try renaming a few of 
the nodes first.

Of course, if you rename the NameNode, you need to update the config file to 
reflect that.


Stuart White wrote:

Is it possible to rename all nodes in a Hadoop cluster and not lose
the data stored on hdfs?  Of course I'll need to update the master
and slaves files, but I'm not familiar with how hdfs tracks where it
has written all the splits of the files.  Is it possible to retain the
data written to hdfs when renaming all nodes in the cluster, and if
so, what additional configuration changes, if any, are required?




Re: Question on HDFS write performance

2009-06-01 Thread Raghu Angadi


Can you post the patch for these measurements? I can guess where they 
are measured, but it is better to see the actual changes.


For example, the third datanode does only two things: receiving data and 
writing it to disk. So the average block writing time for you should be 
around the sum of these two (~6-7k ms), but it is much larger (CRC 
verification should not add much). Not sure why that is the case.
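For example, in the RandomWriter numbers below, the third datanode shows 
about 3171 ms receiving + 2646 ms writing, roughly 5.8 s, while the measured 
average block write time is about 11.9 s, so several seconds per block are 
unaccounted for.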


How many simultaneous maps are you running?

Looking at the first datanode's stats, where it spends a lot more 
time writing to disk than receiving, you are mostly hard-disk bound.


Raghu.

Martin Mituzas wrote:

I need to identify the bottleneck of my current cluster when running
io-bound benchmarks.
I run a test with 4 nodes,  1 node as job tracker and namenode, 3 nodes as
task tracker and data nodes. 
I run RandomWriter to generate 30G data with 15 mappers, and then run Sort

on the generated data with 15 reducers. Replication is 3.
I add log code into HDFS code and then analyze the generated log for
randomwriter period and sort period. 
The result is as follows. 
I measured the following values:

1) average block preparation time: the time DFSClient spent to generate all
packets for a block. 
2) average block writing time: from the time DFSClient gets an allocated

block from namenode in nextBlockOutputStream() to all Acks are received.
3) average network receiving time and average disk writing time for a block
for the first, second, third datanode in the pipeline.

RandomWriter
Total 528 blocks,  total size is 34063336366, full blocks(64M): 506
Average block preparation time by client is: 11456.21
Average writing time for one block(64M): 11931.49
Average time on No.0  target datanode:
  average network receiving time :112.44
  average disk writing time  :3035.04
Average time on No.1  target datanode:
  average network receiving time :3337.68
  average disk writing time  :2950.74
Average time on No.2  target datanode:
  average network receiving time :3171.18
  average disk writing time  :2646.38
 


sort
Total 494 blocks,  total size is 32318504139, full blocks(64M): 479
Average block preparation time by client is: 16237.59
Average writing time for one block(64M): 16642.67
Average time on No.0  target datanode:
  average network receiving time :164.28
  average disk writing time  :3331.50
Average time on No.1  target datanode:
  average network receiving time :2125.62
  average disk writing time  :3436.32
Average time on No.2  target datanode:
  average network receiving time :2856.56
  average disk writing time  :3426.04

And my question is why the network receiving time on the third node is
larger than on the other two nodes. Another question: how do I identify the
bottlenecks? Or you can tell me what other kinds of values should be
collected.


Thanks in advance.




Re: InputStream.open() efficiency

2009-05-26 Thread Raghu Angadi

Stas Oskin wrote:

Hi.

Thanks for the answer.

Would keeping handles open for up to 5 minutes cause any issues?


5 min should not cause any issues..


And same about writing?


Writing is not affected by the couple of issues I mentioned. Writing 
over a long time should work as well as writing over a shorter time.


Raghu.
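For reference, a rough sketch of the two patterns being compared (the path,
position, length and the cached stream are illustrative):

   // reopen per access: costs one extra open() RPC to the NameNode each time
   FSDataInputStream in = fs.open(path);
   in.seek(pos);
   in.readFully(buf, 0, len);
   in.close();

   // keep one handle and reuse it: no extra open() RPC per access
   cachedStream.seek(pos);
   cachedStream.readFully(buf, 0, len);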


Regards.

2009/5/26 Raghu Angadi rang...@yahoo-inc.com


'in.seek(); in.read()' is certainly better than,
'in = fs.open(); in.seek(); in.read()'

The difference is exactly one open() call. So you would save an RPC to
the NameNode.

There are a couple of issues that affect apps that keep the handles open
for a very long time (many hours to days), but those will be fixed soon.

Raghu.


Stas Oskin wrote:


Hi.

I'm looking to find out how InputStream.open() + skip() compares to
keeping a handle to the InputStream and just seeking to the position.

Has anyone compared these approaches, and can advice on their speed?

Regards.








Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Raghu Angadi
As a hack, you could tunnel NN traffic from the GridFTP clients through a 
different machine (by changing fs.default.name). Alternately, these 
clients could use a SOCKS proxy.


The amount of traffic to NN is not much and tunneling should not affect 
performance.
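As a rough illustration of the SOCKS variant (a sketch only: the proxy
host/port and paths are made up, and it assumes the standard socket-factory
settings hadoop.rpc.socket.factory.class.ClientProtocol and
hadoop.socks.server are available in your version):

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class ProxiedHdfsWrite {
     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       // Send only the NameNode (ClientProtocol) RPCs through a SOCKS proxy
       // on another host, so the NN no longer sees the GridFTP node itself
       // as the client when choosing where to place blocks.
       conf.set("hadoop.rpc.socket.factory.class.ClientProtocol",
                "org.apache.hadoop.net.SocksSocketFactory");
       conf.set("hadoop.socks.server", "gateway.example.com:1080");

       FileSystem fs = FileSystem.get(conf);
       fs.copyFromLocalFile(new Path("/tmp/incoming.dat"),
                            new Path("/user/gridftp/incoming.dat"));
     }
   }

Block transfers to the datanodes still go directly; only the relatively small
amount of NameNode traffic is tunneled.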


Raghu.

Brian Bockelman wrote:

Hey all,

Had a problem I wanted to ask advice on. The Caltech site I work with 
currently has a few GridFTP servers which are on the same physical 
machines as the Hadoop datanodes, and a few that aren't. The GridFTP 
server has a libhdfs backend which writes incoming network data into HDFS.


They've found that the GridFTP servers which are co-located with an HDFS 
datanode have poor performance because data is incoming at a much faster 
rate than the HDD can handle. The standalone GridFTP servers, however, 
push data out to multiple nodes at once, and can handle the incoming data 
just fine (200MB/s).


Is there any way to turn off the preference for the local node?  Can 
anyone think of a good workaround to trick HDFS into thinking the client 
isn't on the same node?


Brian




Re: Could only be replicated to 0 nodes, instead of 1

2009-05-21 Thread Raghu Angadi


I think you should file a jira on this. Most likely this is what is 
happening :


 * Two out of the 3 DNs cannot take any more blocks.
 * While picking nodes for a new block, the NN mostly skips the third DN as 
well, since the '# active writes' on it is larger than '2 * avg'.
 * Even if there is only one other block being written on the 3rd, it is 
still greater than (2 * 1/3); see the worked numbers below.
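To spell that out with numbers: with 3 datanodes and a single block being 
written cluster-wide, the average is 1/3 active write per node, so the 
threshold of 2 * avg = 2/3 is below even one active write; the only node 
with free space gets skipped despite being nearly idle.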


To test this, if you write just one block to an idle cluster it should 
succeed.


Writing from the client on the 3rd dn succeeds since local node is 
always favored.


This particular problem is not that severe on a large cluster but HDFS 
should do the sensible thing.


Raghu.

Stas Oskin wrote:

Hi.

I'm testing Hadoop in our lab, and started getting the following message
when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1

I have the following setup:

* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB
machine)
* The replication is set on 2
* I let the space on 2 smaller machines to end, to test the behavior

Now, one of the clients (the one located on the 1.5GB machine) works fine, and
the other one, the external one, is unable to copy and displays the error and
the exception below.

Any idea if this expected on my scenario? Or how it can be solved?

Thanks in advance.



09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping
/test/test.bin retries left 1

09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/test/test.bin could only be replicated to 0 nodes, instead of 1

at
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123
)

at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)

at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25
)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)



at org.apache.hadoop.ipc.Client.call(Client.java:716)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)

at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39
)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25
)

at java.lang.reflect.Method.invoke(Method.java:597)

at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82
)

at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59
)

at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)

at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450
)

at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333
)

at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745
)

at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922
)



09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad
datanode[0]

java.io.IOException: Could not get block locations. Aborting...

at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153
)

at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745
)

at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899
)





Re: Could only be replicated to 0 nodes, instead of 1

2009-05-21 Thread Raghu Angadi

Brian Bockelman wrote:


On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:



I think you should file a jira on this. Most likely this is what is 
happening :


* Two out of the 3 DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN as 
well, since the '# active writes' on it is larger than '2 * avg'.
* Even if there is only one other block being written on the 3rd, it is 
still greater than (2 * 1/3).


To test this, if you write just one block to an idle cluster it should 
succeed.


Writing from the client on the 3rd dn succeeds since local node is 
always favored.


This particular problem is not that severe on a large cluster but HDFS 
should do the sensible thing.




Hey Raghu,

If this analysis is right, I would add it can happen even on large 
clusters!  I've seen this error at our cluster when we're very full 
(97%) and very few nodes have any empty space.  This usually happens 
because we have two very large nodes (10x bigger than the rest of the 
cluster), and HDFS tends to distribute writes randomly -- meaning the 
smaller nodes fill up quickly, until the balancer can catch up.


Yes. This would bite whenever a large portion of the nodes cannot accept 
blocks. In general, it can happen whenever less than half the nodes have any 
space left.


Raghu.



Re: Could only be replicated to 0 nodes, instead of 1

2009-05-21 Thread Raghu Angadi

Stas Oskin wrote:

I think you should file a jira on this. Most likely this is what is
happening :



Here it is - hope it's ok:

https://issues.apache.org/jira/browse/HADOOP-5886


Looks good. I will add my earlier post as a comment. You could update the 
jira with any more tests.


Next time, it would be better to include larger stack traces, logs, etc. in 
subsequent comments rather than in the description.


Thanks,
Raghu.


Re: How to replace the storage on a datanode without formatting the namenode?

2009-05-15 Thread Raghu Angadi

jason hadoop wrote:


Raghu, your technique will only work well if you can complete steps 1-4 in
less than the datanode timeout interval, which may be valid for Alexandria.
I believe the timeout is 10 minutes.


Correct, I should have mentioned that as well: the process should finish within 
10 minutes. Alternately, the user could stop the NameNode or the entire cluster 
during this time, or swap steps (2) and (3).


Raghu.


If you pass the timeout interval the namenode will start to rebalance the
blocks, and when the datanode comes back it will delete all of the blocks it
has rebalanced.

On Thu, May 14, 2009 at 11:35 AM, Raghu Angadi rang...@yahoo-inc.comwrote:


Along these lines, even simpler approach I would think is :

1) set data.dir to local and create the data.
2) stop the datanode
3) rsync local_dir network_dir
4) start datanode with data.dir with network_dir

There is no need to format or rebalance.

This way you can switch between local and network multiple times (without
needing to rsync data, if there are no changes made in the tests)

Raghu.


Alexandra Alecu wrote:


Another possibility I am thinking about now, which is suitable for me as I
do
not actually have much data stored in the cluster when I want to perform
this switch is to set the replication level really high and then simply
remove the local storage locations and restart the cluster. With a bit of
luck the high level of replication will allow a full recovery of the
cluster
on restart.

Is this something that you would advice?

Many thanks,
Alexandra.










Re: How to replace the storage on a datanode without formatting the namenode?

2009-05-14 Thread Raghu Angadi


Along these lines, even simpler approach I would think is :

1) set data.dir to local and create the data.
2) stop the datanode
3) rsync local_dir network_dir
4) start datanode with data.dir with network_dir

There is no need to format or rebalance.

This way you can switch between local and network multiple times 
(without needing to rsync data, if there are no changes made in the tests)


Raghu.

Alexandra Alecu wrote:

Another possibility I am thinking about now, which is suitable for me as I do
not actually have much data stored in the cluster when I want to perform
this switch is to set the replication level really high and then simply
remove the local storage locations and restart the cluster. With a bit of
luck the high level of replication will allow a full recovery of the cluster
on restart.

Is this something that you would advice?

Many thanks,
Alexandra.




Re: public IP for datanode on EC2

2009-05-14 Thread Raghu Angadi

Philip Zeyliger wrote:


You could use ssh to set up a SOCKS proxy between your machine and
ec2, and setup org.apache.hadoop.net.SocksSocketFactory to be the
socket factory.
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
has more information.


Very useful write-up. Regarding the problem with reverse DNS mentioned there 
(that's why you had to add a DNS record for the internal IP): it is fixed in 
https://issues.apache.org/jira/browse/HADOOP-5191 (for HDFS access at 
least). Some mapred parts are still affected (HADOOP-5610). Depending on 
reverse DNS should be avoided.


Ideally, setting fs.default.name to the internal IP should just work for 
clients, both internally and externally (through proxies).


Raghu.


Re: Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-10 Thread Raghu Angadi


It should not be waiting unnecessarily. But the client has to wait if any of 
the datanodes in the pipeline is not able to receive the data as fast as the 
client is writing it. In other words, writing goes as fast as the slowest of 
the nodes involved in the pipeline (1 client and 3 datanodes).


But depending on your case, you could probably benefit from increasing 
the buffer (the number of unacked packets); it would depend on where the 
DataStreamer thread is blocked.


Raghu.

stack wrote:

Writing a file, our application spends a load of time here:

at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2964)
- locked 0x7f11054c2b68 (a java.util.LinkedList)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
at
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
- locked 0x7f1105694f28 (a
org.apache.hadoop.fs.FSDataOutputStream)
at
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1020)
- locked 0x7f1105694e98 (a
org.apache.hadoop.io.SequenceFile$Writer)
at
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:984)

Here is the code from around line 2964 in writeChunk.

    // If queue is full, then wait till we can create enough space
    while (!closed && dataQueue.size() + ackQueue.size() > maxPackets) {
      try {
        dataQueue.wait();
      } catch (InterruptedException e) {
      }
    }

The queue of packets is full and we're waiting for it to be cleared.

Any suggestions for how I might get the DataStreamer to act more promptly
in clearing the packet queue?

This is the hadoop 0.20 branch. It's a small cluster but relatively lightly
loaded (so says ganglia).

Thanks,
St.Ack





Re: Huge DataNode Virtual Memory Usage

2009-05-10 Thread Raghu Angadi

What do 'jmap' and 'jmap -histo:live' show?

Raghu.

Stefan Will wrote:

Chris,

Thanks for the tip ... However I'm already running 1.6_10:

java version 1.6.0_10
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

Do you know of a specific bug # in the JDK bug database that addresses this
?

Cheers,
Stefan



From: Chris Collins ch...@scoutlabs.com
Reply-To: core-user@hadoop.apache.org
Date: Fri, 8 May 2009 20:34:21 -0700
To: core-user@hadoop.apache.org core-user@hadoop.apache.org
Subject: Re: Huge DataNode Virtual Memory Usage

Stefan, there was a nasty memory leak in 1.6.x before 1.6.0_10. It
manifested itself during major GC. We saw this on Linux and Solaris,
and it dramatically improved with an upgrade.

C
On May 8, 2009, at 6:12 PM, Stefan Will wrote:


Hi,

I just ran into something rather scary: one of my datanode processes that
I'm running with -Xmx256M, and a maximum number of Xceiver threads of 4095,
had a virtual memory size of over 7GB (!). I know that the VM size on Linux
isn't necessarily equal to the actual memory used, but I wouldn't expect it
to be an order of magnitude higher either. I ran pmap on the process, and it
showed around 1000 thread stack blocks with roughly 1MB each (which is the
default size on the 64-bit JDK). The largest block was 3GB in size, which I
can't figure out what it is for.

Does anyone have any insights into this ? Anything that can be done to
prevent this other than to restart the DFS regularly ?

-- Stefan







Re: Is HDFS protocol written from scratch?

2009-05-07 Thread Raghu Angadi



Philip Zeyliger wrote:

It's over TCP/IP, in a custom protocol.  See DataXceiver.java.  My sense is
that it's a custom protocol because Hadoop's IPC mechanism isn't optimized
for large messages.


Yes, and the job classes are not distributed using this. It is a very simple 
protocol used to read and write raw data to and from DataNodes.



-- Philip

On Thu, May 7, 2009 at 9:11 AM, Foss User foss...@gmail.com wrote:


I understand that the blocks are transferred between various nodes
using HDFS protocol. I believe, even the job classes are distributed
as files using the same HDFS protocol.

Is this protocol written over TCP/IP from scratch or this is a
protocol that works on top of some other protocol like HTTP, etc.?







Re: java.io.EOFException: while trying to read 65557 bytes

2009-05-07 Thread Raghu Angadi

Albert Sunwoo wrote:

Thanks for the info!

I was hoping to get some more specific information though. 


In short: we need more info.

There are typically 4 machines/processes involved in a write: the 
client and the 3 datanodes writing the replicas. To see what really 
happened, you need to provide the error message(s) for this block from these 
other parts (at least the 3 datanodes' logs should be useful).


This particular error just implies this datanode is the 2nd of the 3 
datanodes (assuming replication of 3) in the write pipeline and its 
connection from the 1st datanode was closed. To deduce more we need more 
info... starting with what happened to that block on the first datanode.


Also, the 3rd datanode is 10.102.0.106, and the block you should grep for in 
the other logs is blk_-7056150840276493498, etc.


You should try to see what information would be useful for others to 
diagnose the problem... more than likely you will find the cause 
yourself in the process.


Raghu.


We are seeing these occur during every run, and as such it's not leaving some 
folks in our organization with a good feeling about the reliability of HDFS.
Do these occur as a result of resources being unavailable? Perhaps the nodes 
are too busy and can no longer service reads from other nodes? Or are the jobs 
causing too much network traffic? At first glance the machines do not 
seem to be pinned, however I am wondering if sudden bursts of jobs can be 
causing these as well. If so, does anyone have configuration recommendations to 
minimize or remove these errors under any of these circumstances, or perhaps 
there is another explanation?

Thanks,
Albert

On 5/5/09 11:34 AM, Raghu Angadi rang...@yahoo-inc.com wrote:



This can happen for example when a client is killed when it has some
files open for write. In that case it is an expected error (the log
should really be at WARN or INFO level).

Raghu.

Albert Sunwoo wrote:

Hello Everyone,

I know there's been some chatter about this before, but I am seeing the errors 
below on just about every one of our nodes. Is there a definitive reason why 
these are occurring? Is there something that we can do to prevent them?

2009-05-04 21:35:11,764 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.102.0.105:50010, 
storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, 
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

Followed by:
2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7056150840276493498_10885 1 Exception 
java.io.InterruptedIOException: Interruped while waiting for IO on channel 
java.nio.channels.Socke
tChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 
millis timeout left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
at java.lang.Thread.run(Thread.java:619)

Thanks,
Albert









Re: Namenode failed to start with FSNamesystem initialization failed error

2009-05-06 Thread Raghu Angadi

Tamir Kamara wrote:

Hi Raghu,

The thread you posted is my original post written when this problem first
happened on my cluster. I can file a JIRA but I wouldn't be able to provide
information other than what I already posted and I don't have the logs from
that time. Should I still file ?


Yes. Jira is a better place for tracking and fixing bugs. I am pretty 
sure what you saw is a bug (either one already filed or one that needs to be fixed).


Raghu.


Thanks,
Tamir


On Tue, May 5, 2009 at 9:14 PM, Raghu Angadi rang...@yahoo-inc.com wrote:


Tamir,

Please file a jira on the problem you are seeing with 'saveLeases'. In the
past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724,
and more mentioned in HADOOP-3724).

Also refer the thread you started
http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html

I think another user reported the same problem recently.

These are indeed very serious and very annoying bugs.

Raghu.


Tamir Kamara wrote:


I didn't have a space problem which led to it (I think). The corruption
started after I bounced the cluster.
At the time, I tried to investigate what led to the corruption but didn't
find anything useful in the logs besides this line:
saveLeases found path

/tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2
but no matching entry in namespace

I also tried to recover from the secondary name node files but the
corruption was too wide-spread and I had to format.

Tamir

On Mon, May 4, 2009 at 4:48 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

Same conditions - where the space has run out and the fs got corrupted?

Or it got corrupted by itself (which is even more worrying)?

Regards.

2009/5/4 Tamir Kamara tamirkam...@gmail.com

 I had the same problem a couple of weeks ago with 0.19.1. Had to

reformat
the cluster too...

On Mon, May 4, 2009 at 3:50 PM, Stas Oskin stas.os...@gmail.com
wrote:

 Hi.

After rebooting the NameNode server, I found out the NameNode doesn't


start


anymore.

The logs contained this error:
FSNamesystem initialization failed


I suspected filesystem corruption, so I tried to recover from
SecondaryNameNode. Problem is, it was completely empty!

I had an issue that might have caused this - the root mount has run out


of


space. But, both the NameNode and the SecondaryNameNode directories


were
on


another mount point with plenty of space there - so it's very strange


that


they were impacted in any way.

Perhaps the logs, which were located on root mount and as a result,


could
not be written, have caused this?


To get back HDFS running, i had to format the HDFS (including manually
erasing the files from DataNodes). While this reasonable in test
environment
- production-wise it would be very bad.

Any idea why it happened, and what can be done to prevent it in the


future?


I'm using the stable 0.18.3 version of Hadoop.

Thanks in advance!








Re: Namenode failed to start with FSNamesystem initialization failed error

2009-05-05 Thread Raghu Angadi


Stas,

This is indeed a serious issue.

Did you happen to store the corrupt image? Can this be reproduced 
using the image?

Usually you can recover manually from a corrupt or truncated image. But 
more importantly, we want to find out how it got into this state.


Raghu.

Stas Oskin wrote:

Hi.

This is quite a worrisome issue.

Can anyone advise on this? I'm really concerned it could appear in
production and cause a huge data loss.

Is there any way to recover from this?

Regards.

2009/5/5 Tamir Kamara tamirkam...@gmail.com


I didn't have a space problem which led to it (I think). The corruption
started after I bounced the cluster.
At the time, I tried to investigate what led to the corruption but didn't
find anything useful in the logs besides this line:
saveLeases found path

/tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2
but no matching entry in namespace

I also tried to recover from the secondary name node files but the
corruption was too wide-spread and I had to format.

Tamir

On Mon, May 4, 2009 at 4:48 PM, Stas Oskin stas.os...@gmail.com wrote:


Hi.

Same conditions - where the space has run out and the fs got corrupted?

Or it got corrupted by itself (which is even more worrying)?

Regards.

2009/5/4 Tamir Kamara tamirkam...@gmail.com


I had the same problem a couple of weeks ago with 0.19.1. Had to

reformat

the cluster too...

On Mon, May 4, 2009 at 3:50 PM, Stas Oskin stas.os...@gmail.com

wrote:

Hi.

After rebooting the NameNode server, I found out the NameNode doesn't

start

anymore.

The logs contained this error:
FSNamesystem initialization failed


I suspected filesystem corruption, so I tried to recover from
SecondaryNameNode. Problem is, it was completely empty!

I had an issue that might have caused this - the root mount has run

out

of

space. But, both the NameNode and the SecondaryNameNode directories

were

on

another mount point with plenty of space there - so it's very strange

that

they were impacted in any way.

Perhaps the logs, which were located on root mount and as a result,

could

not be written, have caused this?


To get back HDFS running, i had to format the HDFS (including

manually

erasing the files from DataNodes). While this reasonable in test
environment
- production-wise it would be very bad.

Any idea why it happened, and what can be done to prevent it in the

future?

I'm using the stable 0.18.3 version of Hadoop.

Thanks in advance!







Re: Namenode failed to start with FSNamesystem initialization failed error

2009-05-05 Thread Raghu Angadi

Tamir,

Please file a jira on the problem you are seeing with 'saveLeases'. In 
the past there have been multiple fixes in this area (HADOOP-3418, 
HADOOP-3724, and more mentioned in HADOOP-3724).


Also refer the thread you started 
http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html


I think another user reported the same problem recently.

These are indeed very serious and very annoying bugs.

Raghu.

Tamir Kamara wrote:

I didn't have a space problem which led to it (I think). The corruption
started after I bounced the cluster.
At the time, I tried to investigate what led to the corruption but didn't
find anything useful in the logs besides this line:
saveLeases found path
/tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2
but no matching entry in namespace

I also tried to recover from the secondary name node files but the
corruption was too wide-spread and I had to format.

Tamir

On Mon, May 4, 2009 at 4:48 PM, Stas Oskin stas.os...@gmail.com wrote:


Hi.

Same conditions - where the space has run out and the fs got corrupted?

Or it got corrupted by itself (which is even more worrying)?

Regards.

2009/5/4 Tamir Kamara tamirkam...@gmail.com


I had the same problem a couple of weeks ago with 0.19.1. Had to reformat
the cluster too...

On Mon, May 4, 2009 at 3:50 PM, Stas Oskin stas.os...@gmail.com wrote:


Hi.

After rebooting the NameNode server, I found out the NameNode doesn't

start

anymore.

The logs contained this error:
FSNamesystem initialization failed


I suspected filesystem corruption, so I tried to recover from
SecondaryNameNode. Problem is, it was completely empty!

I had an issue that might have caused this - the root mount has run out

of

space. But, both the NameNode and the SecondaryNameNode directories

were

on

another mount point with plenty of space there - so it's very strange

that

they were impacted in any way.

Perhaps the logs, which were located on root mount and as a result,

could

not be written, have caused this?


To get back HDFS running, i had to format the HDFS (including manually
erasing the files from DataNodes). While this reasonable in test
environment
- production-wise it would be very bad.

Any idea why it happened, and what can be done to prevent it in the

future?

I'm using the stable 0.18.3 version of Hadoop.

Thanks in advance!







Re: java.io.EOFException: while trying to read 65557 bytes

2009-05-05 Thread Raghu Angadi


This can happen, for example, when a client is killed while it has some 
files open for write. In that case it is an expected error (the log 
should really be at WARN or INFO level).


Raghu.

Albert Sunwoo wrote:

Hello Everyone,

I know there's been some chatter about this before but I am seeing the errors 
below on just about every one of our nodes.  Is there a definitive reason on 
why these are occuring, is there something that we can do to prevent these?

2009-05-04 21:35:11,764 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.102.0.105:50010, 
storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, 
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

Followed by:
2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7056150840276493498_10885 1 Exception 
java.io.InterruptedIOException: Interruped while waiting for IO on channel 
java.nio.channels.Socke
tChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 
millis timeout left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
at java.lang.Thread.run(Thread.java:619)

Thanks,
Albert





Re: Namenode failed to start with FSNamesystem initialization failed error

2009-05-05 Thread Raghu Angadi

Stas Oskin wrote:

Actually, we discovered today an annoying bug in our test-app, which might
have moved some of the HDFS files to the cluster, including the metadata
files.


oops! presumably it could have removed the image file itself.


I presume it could be the possible reason for such behavior? :)


Certainly. It could lead to many different failures. If you had the stack 
trace of the exception, it would be clearer what the error was this time.


Raghu.


2009/5/5 Stas Oskin stas.os...@gmail.com


Hi Raghu.

The only lead I have, is that my root mount has filled-up completely.

This in itself should not have caused the metadata corruption, as it has
been stored on another mount point, which had plenty of space.

But perhaps the fact that NameNode/SecNameNode didn't have enough space for
logs has caused this?

Unfortunately I was pressed in time to get the cluster up and running, and
didn't preserve the logs or the image.
If this happens again - I will surely do so.

Regards.

2009/5/5 Raghu Angadi rang...@yahoo-inc.com



Stas,

This is indeed a serious issue.

Did you happen to store the corrupt image? Can this be reproduced
using the image?

Usually you can recover manually from a corrupt or truncated image. But
more importantly, we want to find out how it got into this state.

Raghu.


Stas Oskin wrote:


Hi.

This is quite a worrisome issue.

Can anyone advise on this? I'm really concerned it could appear in
production, and cause a huge data loss.

Is there any way to recover from this?

Regards.

2009/5/5 Tamir Kamara tamirkam...@gmail.com

 I didn't have a space problem which led to it (I think). The corruption

started after I bounced the cluster.
At the time, I tried to investigate what led to the corruption but
didn't
find anything useful in the logs besides this line:
saveLeases found path


/tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_02_0/part-2
but no matching entry in namespace

I also tried to recover from the secondary name node files but the
corruption was too wide-spread and I had to format.

Tamir

On Mon, May 4, 2009 at 4:48 PM, Stas Oskin stas.os...@gmail.com
wrote:

 Hi.

Same conditions - where the space has run out and the fs got corrupted?

Or it got corrupted by itself (which is even more worrying)?

Regards.

2009/5/4 Tamir Kamara tamirkam...@gmail.com

 I had the same problem a couple of weeks ago with 0.19.1. Had to
reformat
the cluster too...

On Mon, May 4, 2009 at 3:50 PM, Stas Oskin stas.os...@gmail.com


wrote:
 Hi.

After rebooting the NameNode server, I found out the NameNode doesn't


start


anymore.

The logs contained this error:
FSNamesystem initialization failed


I suspected filesystem corruption, so I tried to recover from
SecondaryNameNode. Problem is, it was completely empty!

I had an issue that might have caused this - the root mount has run


out

of

space. But, both the NameNode and the SecondaryNameNode directories


were
on


another mount point with plenty of space there - so it's very strange


that


they were impacted in any way.

Perhaps the logs, which were located on root mount and as a result,


could
not be written, have caused this?


To get back HDFS running, i had to format the HDFS (including


manually

 erasing the files from DataNodes). While this reasonable in test

environment
- production-wise it would be very bad.

Any idea why it happened, and what can be done to prevent it in the


future?


I'm using the stable 0.18.3 version of Hadoop.

Thanks in advance!








Re: Namenode failed to start with FSNamesystem initialization failed error

2009-05-05 Thread Raghu Angadi


The image is stored in two files: fsimage and edits
(under namenode-directory/current/).

Stas Oskin wrote:


Well, it definitely caused the SecondaryNameNode to crash, and also seems to
have triggered some strange issues today as well.
By the way, how is the image file named?





Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi


There is some mismatch here... what is the expected ip address of this 
machine (or does it have multiple interfaces that are properly routed)? 
Looking at the Receiving Block message, the DN thinks its address is 
192.168.253.20 but the NN thinks it is 253.32 (and the client is able to 
connect using 253.32).


If you want to find the destination ip that this DN is unable to connect 
to, you can check the client's log for this block number.


Stas Oskin wrote:

Hi.


2009/4/22 jason hadoop jason.had...@gmail.com


Most likely that machine is affected by some firewall somewhere that
prevents traffic on port 50075. The no route to host is a strong indicator,
particularly if the Datanote registered with the namenode.




Yes, this was my first thought as well. But there is no firewall, and the
port can be connected via netcat from any other machine.

Any other idea?

Thanks.





Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi

Stas Oskin wrote:


 Tried in step 3 to telnet both the 50010 and the 8010 ports of the
problematic datanode - both worked.


Shouldn't you be testing connecting _from_ the datanode? The error you 
posted occurs while this DN is trying to connect to another DN.


Raghu.


I agree there is indeed an interesting problem :). Question is how it can be
solved.

Thanks.





Re: More Replication on dfs

2009-04-14 Thread Raghu Angadi

Aseem,

Regarding over-replication, it is mostly an app-related issue, as Alex mentioned.

But if you are concerned about under-replicated blocks in fsck output :

These blocks should not stay under-replicated if you have enough nodes 
and enough space on them (check NameNode webui).


Try grep-ing for one of the blocks in the NameNode log (and datanode logs as 
well, since you have just 3 nodes).


Raghu.

Puri, Aseem wrote:

Alex,

Ouput of $ bin/hadoop fsck / command after running HBase data insert
command in a table is:

.
.
.
.
.
/hbase/test/903188508/tags/info/4897652949308499876:  Under replicated
blk_-5193
695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
.
/hbase/test/903188508/tags/mapfiles/4897652949308499876/data:  Under
replicated
blk_-1213602857020415242_3132. Target Replicas is 3 but found 1
replica(s).
.
/hbase/test/903188508/tags/mapfiles/4897652949308499876/index:  Under
replicated
 blk_3934493034551838567_3132. Target Replicas is 3 but found 1
replica(s).
.
/user/HadoopAdmin/hbase table.doc:  Under replicated
blk_4339521803948458144_103
1. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/bin.doc:  Under replicated
blk_-3661765932004150973_1030
. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/file01.txt:  Under replicated
blk_2744169131466786624_10
01. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/file02.txt:  Under replicated
blk_2021956984317789924_10
02. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/test.txt:  Under replicated
blk_-3062256167060082648_100
4. Target Replicas is 3 but found 2 replica(s).
...
/user/HadoopAdmin/output/part-0:  Under replicated
blk_8908973033976428484_1
010. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
 Total size:48510226 B
 Total dirs:492
 Total files:   439 (Files currently being written: 2)
 Total blocks (validated):  401 (avg. block size 120973 B) (Total
open file
blocks (not validated): 2)
 Minimally replicated blocks:   401 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   399 (99.50124 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:2
 Average block replication: 1.3117207
 Corrupt blocks:0
 Missing replicas:  675 (128.327 %)
 Number of data-nodes:  2
 Number of racks:   1


The filesystem under path '/' is HEALTHY
Please tell what is wrong.

Aseem

-Original Message-
From: Alex Loddengaard [mailto:a...@cloudera.com] 
Sent: Friday, April 10, 2009 11:04 PM

To: core-user@hadoop.apache.org
Subject: Re: More Replication on dfs

Aseem,

How are you verifying that blocks are not being replicated?  Have you
run fsck?  *bin/hadoop fsck /*

I'd be surprised if replication really wasn't happening.  Can you run
fsck
and pay attention to Under-replicated blocks and Mis-replicated
blocks?
In fact, can you just copy-paste the output of fsck?

Alex

On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem
aseem.p...@honeywell.comwrote:


Hi
   I also tried the command $ bin/hadoop balancer. But still the
same problem.

Aseem

-Original Message-
From: Puri, Aseem [mailto:aseem.p...@honeywell.com]
Sent: Friday, April 10, 2009 11:18 AM
To: core-user@hadoop.apache.org
Subject: RE: More Replication on dfs

Hi Alex,

   Thanks for sharing your knowledge. Till now I have three
machines and I have to check the behavior of Hadoop so I want
replication factor should be 2. I started my Hadoop server with
replication factor 3. After that I upload 3 files to implement word
count program. But as my all files are stored on one machine and
replicated to other datanodes also, so my map reduce program takes

input

from one Datanode only. I want my files to be on different data node

so

to check functionality of map reduce properly.

   Also before starting my Hadoop server again with replication
factor 2 I formatted all Datanodes and deleted all old data manually.

Please suggest what I should do now.

Regards,
Aseem Puri


-Original Message-
From: Mithila Nagendra [mailto:mnage...@asu.edu]
Sent: Friday, April 10, 2009 10:56 AM
To: core-user@hadoop.apache.org
Subject: Re: More Replication on dfs

To add to the question, how does one decide what is the optimal
replication
factor for a cluster. For instance what would be the appropriate
replication
factor for a cluster consisting of 5 nodes.
Mithila

On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard a...@cloudera.com
wrote:


Did you load any files when replication was set to 3?  If so, you'll

have

to
rebalance:



http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balance

r




http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalanc

er

Note that most people run HDFS with a replication factor of 3.

There

have

been cases when clusters running with a replication of 2 discovered

new

bugs, because replication 
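
(A minimal sketch of how the target replication of already-written files 
could be lowered to match the new default of 2; the path is one of those 
shown in the fsck output earlier in this thread, and 'bin/hadoop dfs 
-setrep 2 <path>' should do the same from the shell.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LowerReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // files written while the default was 3 keep a target of 3;
    // setReplication() changes the target for an existing file
    boolean accepted = fs.setReplication(
        new Path("/user/HadoopAdmin/input/file01.txt"), (short) 2);
    System.out.println("replication change accepted: " + accepted);
    fs.close();
  }
}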

Re: DataXceiver Errors in 0.19.1

2009-04-13 Thread Raghu Angadi


It need not be anything to worry about. Do you see anything at the user 
level (task, job, copy, or script) failing because of this?


On a distributed system with many nodes, there will be some errors on 
some of the nodes for various reasons (load, hardware, reboot, etc.). 
HDFS should usually work around them (because of multiple replicas).


In this particular case, the client is trying to write some data and one of 
the DataNodes writing a replica might have gone down. HDFS should 
recover from it and write to the rest of the nodes. Please check whether the 
write actually succeeded.


Raghu.

Tamir Kamara wrote:

Hi,

I've recently upgraded to 0.19.1 and now there're some DataXceiver errors in
the datanodes logs. There're also messages about interruption while waiting
for IO. Both messages are below.
Can I do something to fix it ?

Thanks,
Tamir


2009-04-13 09:57:20,334 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.14.3:50010,
storageID=DS-727246419-127.0.0.1-50010-1234873914501, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Unknown Source)


2009-04-13 09:57:20,333 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_8486030874928774495_54856 1 Exception
java.io.InterruptedIOException: Interruped while waiting for IO on
channel java.nio.channels.SocketChannel[connected
local=/192.168.14.3:50439 remote=/192.168.14.7:50010]. 58972 millis
timeout left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readLong(Unknown Source)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
at java.lang.Thread.run(Unknown Source)





Re: Changing block size of hadoop

2009-04-12 Thread Raghu Angadi

Aaron Kimball wrote:

Blocks already written to HDFS will remain their current size. Blocks are
immutable objects. That procedure would set the size used for all
subsequently-written blocks. I don't think you can change the block size
while the cluster is running, because that would require the NameNode and
DataNodes to re-read their configurations, which they only do at startup.
- Aaron


Block size is a client-side configuration. The NameNode and DataNodes don't 
need to be restarted.


In this particular case, even if the client's config is changed, a running 
MR job may end up using the new config for only a partial set of its maps 
or reducers.


Raghu.
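
(To illustrate the point above, a minimal sketch of a client choosing the 
block size itself; the path is made up, and 'dfs.block.size' plus this 
create() overload are the pre-0.20 names used by the versions in this thread.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClientBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // client-side default for files created through this config
    conf.setLong("dfs.block.size", 32 * 1024 * 1024L);
    FileSystem fs = FileSystem.get(conf);

    // or choose a block size for one particular file at create time
    FSDataOutputStream out = fs.create(new Path("/tmp/blocksize-demo"),
        true,                                      // overwrite
        conf.getInt("io.file.buffer.size", 4096),  // io buffer
        (short) 3,                                 // replication
        32 * 1024 * 1024L);                        // block size
    out.write("no NameNode or DataNode restart needed".getBytes());
    out.close();
    fs.close();
  }
}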


On Sun, Apr 12, 2009 at 6:08 AM, Rakhi Khatwani rakhi.khatw...@gmail.comwrote:


Hi,
 I would like to know if it is feasible to change the blocksize of Hadoop
while map reduce jobs are executing? And if not, would the following work?
 1. stop map-reduce
 2. stop hbase
 3. stop hadoop
 4. change hadoop-site.xml to reduce the blocksize
 5. restart all
 Whether the data in the hbase tables will be safe and automatically split
after changing the block size of hadoop?

Thanks,
Raakhi







Re: swap hard drives between datanodes

2009-03-31 Thread Raghu Angadi


IP address mismatch should not matter. What is the actual error you saw? 
The mismatch might be unintentional.


Raghu.

Mike Andrews wrote:

i tried swapping two hot-swap sata drives between two nodes in a
cluster, but it didn't work: after restart, one of the datanodes shut
down since namenode said it reported a block belonging to another
node, which i guess namenode thinks is a fatal error. is this caused
by the hadoop/datanode/current/VERSION file having the IP address and
other ID information of the datanode hard-coded? it'd be great to be
able to do a manual gross cluster rebalance by just physically
swapping hard drives, but seems like this is not possible in the
current version 0.18.3.





Re: swap hard drives between datanodes

2009-03-31 Thread Raghu Angadi

Raghu Angadi wrote:


IP address mismatch should not matter. What is the actual error you saw? 
The mismatch might be unintentional.


The reason I say ip address should not matter is that if you change the 
ip address of a datanode, it should still work correctly.


Raghu.


Raghu.

Mike Andrews wrote:

i tried swapping two hot-swap sata drives between two nodes in a
cluster, but it didn't work: after restart, one of the datanodes shut
down since namenode said it reported a block belonging to another
node, which i guess namenode thinks is a fatal error. is this caused
by the hadoop/datanode/current/VERSION file having the IP address and
other ID information of the datanode hard-coded? it'd be great to be
able to do a manual gross cluster rebalance by just physically
swapping hard drives, but seems like this is not possible in the
current version 0.18.3.







Re: Socket closed Exception

2009-03-30 Thread Raghu Angadi


If it is the NameNode, then there is probably a log message about closing 
the socket around that time.


Raghu.

lohit wrote:

Recently we are seeing a lot of 'Socket closed' exceptions in our cluster. Many 
tasks' open/create/getFileInfo calls get back 'SocketException' with the message 
'Socket closed'. We seem to see many tasks fail with the same error around the same 
time. There are no warning or info messages in the NameNode/TaskTracker/Task logs. 
(This is on HDFS 0.15.) Are there cases where the NameNode closes sockets due to 
heavy load or during contention for resources of any kind?

Thanks,
Lohit





Re: about dfsadmin -report

2009-03-27 Thread Raghu Angadi

stchu wrote:

But when the web-ui shows the node dead, -report still shows in service
and the living nodes=3 (in web-ui: living=2 dead=1).


please file a jira and describe how to reproduce in as much detail as 
you can in a comment.


thanks,
Raghu.


stchu

2009/3/26 Raghu Angadi rang...@yahoo-inc.com


stchu wrote:


Hi,

I do a test about the datanode crash. I stop the networking on one of the
datanode.
The Web app and fsck report that datanode dead after 10 mins. But dfsadmin
-report
are not report that over 25 mins. Is this correct?


Nope. Both the web-ui and '-report' get their info from the same source.
They should be consistent.

Raghu.

 Thanks for your guide.

stchu








Re: about dfsadmin -report

2009-03-25 Thread Raghu Angadi

stchu wrote:

Hi,

I do a test about the datanode crash. I stop the networking on one of the
datanode.
The Web app and fsck report that datanode dead after 10 mins. But dfsadmin
-report
are not report that over 25 mins. Is this correct?


Nope. Both the web-ui and '-report' get their info from the same source. 
They should be consistent.


Raghu.

Thanks for your guide.

stchu





Re: hadoop need help please suggest

2009-03-24 Thread Raghu Angadi


What is the scale you are thinking of (10s, 100s, or more nodes)?

The NameNode memory for metadata that you mentioned is the main issue 
with small files. There are multiple alternatives for dealing with 
that. This issue has been discussed many times here.


Also, please use the core-user@ id alone for asking for help... you don't 
need to send to core-devel@.


Raghu.
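
(A minimal sketch of one such alternative: packing the many small images 
into a single SequenceFile keyed by file name. The paths are made up and 
the images are assumed to be on local disk.)

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackImages {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // one large file on HDFS instead of thousands of 3-4 KB files
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/user/farmer/images.seq"),
        Text.class, BytesWritable.class);
    try {
      for (String name : args) {                 // local image files
        File f = new File(name);
        byte[] buf = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try {
          in.readFully(buf);                     // fill the whole buffer
        } finally {
          in.close();
        }
        writer.append(new Text(name), new BytesWritable(buf));
      }
    } finally {
      writer.close();
    }
  }
}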

snehal nagmote wrote:

Hello Sir,

I have some doubts, please help me.
We have a requirement for a scalable storage system. We have developed an
agro-advisory system in which farmers send crop pictures, in a sequential
manner, some 6-7 photos of 3-4 KB each, which would be stored in the storage
server. These photos would be read sequentially by scientists to detect the
problem; the images would not be written to after that.

So for storing these images we are using the hadoop file system. Is it
feasible to use the hadoop file system for this purpose?

Also, since the images are only 3-4 KB and hadoop reads data in blocks of
64 MB, how can we increase the performance? What are the tricks and tweaks
that should be done to use hadoop for such a purpose?

The next problem is that hadoop stores all the metadata in memory. Can we
use some mechanism to store the files in blocks of some greater size?
Because the files would be small, it will store lots of metadata and will
overflow the main memory.
Please suggest what could be done.


regards,
Snehal





Re: hadoop configuration problem hostname-ip address

2009-03-23 Thread Raghu Angadi

you need https://issues.apache.org/jira/browse/HADOOP-5191

I don't know why there is no response to the simple patch I attached.

Alternately, you could use the hostname that it expects instead of the ip address.

Raghu.
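
(A minimal sketch of the second suggestion: ask for the FileSystem with the 
same authority the namenode was configured with. The hostname, port and path 
are the ones from the error quoted below.)

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UseConfiguredHostname {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // use the hostname from fs.default.name, not the raw IP address
    FileSystem fs = FileSystem.get(
        URI.create("hdfs://namnodemc:21011/"), conf);
    System.out.println(fs.exists(new Path("/user/root/test")));
    fs.close();
  }
}
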
snehal nagmote wrote:

Hi,
I am using Hadoop version 0.19. I set up a hadoop cluster  for testing
purpose with 1 master
and one slave. I set up keyless ssh between the master node and all the
slave nodes as
well as modified the /etc/hosts/ on all nodes so hostname lookup works

 I am using ip addresses for the namenode and datanode rather than
hostnames, because with hostnames the
datanode was not coming up.

But when i try to execute any job or program, it gives me following
exception

following exceptions:
FAILED
Error initializing
java.lang.IllegalArgumentException: Wrong FS:
hdfs://172.16.6.102:21011/user/root/test
expected: hdfs://namnodemc:21011

Can you please help me out , from where it is taking hostname



Regards,
Snehal Nagmote
IIIT-H





Re: DataNode stops cleaning disk?

2009-03-17 Thread Raghu Angadi

Igor,

A few things you could do (may be better to file a jira, with a short 
description and more details in follow up comments) :


1) Pick one of block ids from the open files and grep for it in DataNode 
and NameNode logs (there is one log file for each day)
2) Pick one of the over-replicated blocks (if above block is not one of 
them) and trace it in NameNode log

3) take jstack of the datanode in this state.

Since you still have over-replicated blocks, you probably have more 
datanodes in the early stages of this problem.


Igor Bolotin wrote:

Caught this issue again on one of the clusters. DF and DU sizes match
very closely with information reported by dfsadmin command. 


If DN reports the space properly, then the original problem you reported 
that DataNode runs out of disk should not happen.


You can follow up with more investigation if you are interested. The other 
alternative is to upgrade to the latest 0.19.x to see if the problem 
persists. There have been many fixes since 0.19.0.


Raghu.


Lsof reports some 1000 open files in DFS data directories on the
problematic datanode, but total size for open files is only about 10GB.
I can't really track the space usage to individual files - there are way
too many files/blocks for detailed analysis.

Here is something interesting - fsck before datanode restart reports
very significant number of over-replicated blocks (~10% of blocks are
over-replicated):

Status: HEALTHY
 Total size:1472758591906 B (Total open files size: 29050588133 B)

 Total dirs:58431

 Total files:   375703 (Files currently being written: 418)

 Total blocks (validated):  387205 (avg. block size 3803562 B)
(Total open file blocks (not validated): 595)
 Minimally replicated blocks:   387205 (100.0 %)


 Over-replicated blocks:38782 (10.015883 %)

 Under-replicated blocks:   0 (0.0 %)

 Mis-replicated blocks: 0 (0.0 %)

 Default replication factor:3

 Average block replication: 3.1003888

 Corrupt blocks:0

 Missing replicas:  0 (0.0 %)

 Number of data-nodes:  7

 Number of racks:   1


After datanode restart - over-replicated nodes are practically gone:

Status: HEALTHY
 Total size:1310669475298 B (Total open files size: 29535016933 B)
 Total dirs:59431
 Total files:   377177 (Files currently being written: 387)
 Total blocks (validated):  386661 (avg. block size 3389712 B)
(Total open file blocks (not validated): 607)
 Minimally replicated blocks:   386661 (100.0 %)
 Over-replicated blocks:272 (0.070345856 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 3.0007036
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)
 Number of data-nodes:  7
 Number of racks:   1

What might be the cause for over-replication?

Best regards,
Igor


-Original Message-
From: Igor Bolotin 
Sent: Monday, March 09, 2009 2:50 PM

To: core-user@hadoop.apache.org
Subject: RE: DataNode stops cleaning disk?

My mistake about 'current' directory - that's the one that consumes all
the disk space and 'du' on that directory matches exactly with namenode
web ui reported size.
I'm waiting for the next time this happens to collect more details, but
ever since I wrote the first email - everything works perfectly well
(another application of Murphy law). 


Thanks,
Igor

-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Thursday, March 05, 2009 12:06 PM

To: core-user@hadoop.apache.org
Subject: Re: DataNode stops cleaning disk?

Igor Bolotin wrote:

That's what I saw just yesterday on one of the data nodes with this
situation (will confirm also next time it happens):
- Tmp and current were either empty or almost empty last time I

checked.

- du on the entire data directory matched exactly with reported used
space in NameNode web UI and it did report that it uses some most of

the
available disk space. 
- nothing else was using disk space (actually - it's dedicated DFS

cluster).


If the 'du' command (which you can run in the shell) counts properly, then 
you should be able to see which files are taking space.


If 'du' can't, but 'df' reports very little space available, then it is 
possible (though I have never seen it) that the datanode is keeping a lot 
of these files open... 'ls -l /proc/<datanode pid>/fd' lists these files. 
If it is not the datanode, then check lsof to find who is holding these files.


hope this helps.
Raghu.


Thank you for help!
Igor

-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Thursday, March 05, 2009 11:05 AM

To: core-user@hadoop.apache.org
Subject: Re: DataNode stops cleaning disk?


This is unexpected unless some other process is eating up space.

Couple of things to collect next time (along with log):

  - All the contents under datanode-directory/ (especially including 
'tmp

Re: Not a host:port pair when running balancer

2009-03-11 Thread Raghu Angadi

Doug Cutting wrote:

Konstantin Shvachko wrote:

Clarifying: port # is missing in your configuration, should be
<property>
  <name>fs.default.name</name>
  <value>hdfs://hvcwydev0601:8020</value>
</property>

where 8020 is your port number.


That's the work-around, but it's a bug.  One should not need to specify 
the default port number (8020).  Please file an issue in Jira.


Yes, Balancer should use NameNode.getAddress(conf) to get the NameNode address.

Raghu.




Re: Error while putting data onto hdfs

2009-03-11 Thread Raghu Angadi

Amandeep Khurana wrote:

My dfs.datanode.socket.write.timeout is set to 0. This had to be done to get
Hbase to work.


ah... I see, we should fix that. Not sure how others haven't seen it till 
now. It affects only those with write.timeout set to 0 on the clients.


Since setting it to 0 is itself a workaround, please change that to 
some extremely large value for now.


Raghu.
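
(A minimal sketch of that client-side change; the value is arbitrary, one 
day in milliseconds, and the same property can instead be set in 
hadoop-site.xml.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class LongWriteTimeout {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // instead of disabling the timeout with 0, use a very large value
    conf.setInt("dfs.datanode.socket.write.timeout", 24 * 60 * 60 * 1000);
    FileSystem fs = FileSystem.get(conf);
    // ... writes through 'fs' now use the longer timeout ...
    fs.close();
  }
}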



Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Mar 11, 2009 at 10:23 AM, Raghu Angadi rang...@yahoo-inc.comwrote:


Did you change dfs.datanode.socket.write.timeout to 5 seconds? The
exception message says so. It is extremely small.

The default is 8 minutes and is intentionally pretty high. Its purpose is
mainly to catch extremely unresponsive datanodes and other network issues.

Raghu.


Amandeep Khurana wrote:


I was trying to put a 1 gig file onto HDFS and I got the following error:

09/03/10 18:23:16 WARN hdfs.DFSClient: DataStreamer Exception:
java.net.SocketTimeoutException: 5000 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/171.69.102.53:34414
remote=/
171.69.102.51:50010]
   at

org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
   at

org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
   at

org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
   at java.io.BufferedOutputStream.write(Unknown Source)
   at java.io.DataOutputStream.write(Unknown Source)
   at

org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

09/03/10 18:23:16 WARN hdfs.DFSClient: Error Recovery for block
blk_2971879428934911606_36678 bad datanode[0] 171.69.102.51:50010
put: All datanodes 171.69.102.51:50010 are bad. Aborting...
Exception closing file /user/amkhuran/221rawdata/1g
java.io.IOException: Filesystem closed
   at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
   at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
   at

org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
   at

org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
   at
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
   at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
   at

org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:243)
   at org.apache.hadoop.fs.FsShell.close(FsShell.java:1842)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:1856)


Whats going wrong?

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz








Re: HDFS is corrupt, need to salvage the data.

2009-03-11 Thread Raghu Angadi

Mayuran,

It takes a very long time if we have to go through each debugging step 
one at a time over many iterations. Maybe a jira is a better place.


- Run fsck with the -blocks option.

- Check if those ids match the ids in the block file names found by 'find'.

- Check which directory these files are in... and verify that it matches 
the datanode's configured directory.


You are saying there is nothing wrong in the log files, but does that 
imply that the datanode sees those 157 missing blocks? Maybe you should 
post the log or verify that yourself. If the DN is working correctly 
according to you, then you should not have 100% of the blocks missing.


There are many possibilities; it is not easy for me to list the right one 
in your case without more info, or to list all possible conditions.


Raghu.

Mayuran Yogarajah wrote:

Mayuran Yogarajah wrote:

Raghu Angadi wrote:
 

The block files usually don't disappear easily. Check on the datanode if
you find any files starting with blk. Also check the datanode log to see
what happened there... maybe it was started on a different directory or
something like that.

Raghu.




There are indeed blk files:
find -name 'blk*' | wc -l
158

I didn't see anything out of the ordinary in the datanode log.

At this point is there anything I can do to recover the files? Or do I
need to reformat
the data node and load the data in again ?

thanks
  
Sorry to resend this but I didn't receive a response and wanted to know 
how to proceed.

Is it possible to recover the data at this stage? Or is it gone ?

thanks




Re: Error while putting data onto hdfs

2009-03-11 Thread Raghu Angadi

Amandeep Khurana wrote:
What happens if you set it to 0? How is it a workaround? 


HBase needs it in pre-0.19.0 (related story: 
http://www.nabble.com/Datanode-Xceivers-td21372227.html). It should not 
matter if you move to 0.19.0 or newer.



And how would it
matter if I change is to a large value?


A very large value like 100 years is the same as setting it to 0 (for all 
practical purposes).


Raghu.




Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Mar 11, 2009 at 12:00 PM, Raghu Angadi rang...@yahoo-inc.comwrote:


Amandeep Khurana wrote:


My dfs.datanode.socket.write.timeout is set to 0. This had to be done to
get
Hbase to work.


ah.. I see, we should fix that. Not sure how others haven't seen it till
now. Affects only those with write.timeout set to 0 on the clients.

Since setting it to 0 itself is a work around, please change that to some
extremely large value for now.

Raghu.




Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Mar 11, 2009 at 10:23 AM, Raghu Angadi rang...@yahoo-inc.com

wrote:

 Did you change dfs.datanode.socket.write.timeout to 5 seconds? The

exception message says so. It is extremely small.

The default is 8 minutes and is intentionally pretty high. Its purpose is
mainly to catch extremely unresponsive datanodes and other network
issues.

Raghu.


Amandeep Khurana wrote:

 I was trying to put a 1 gig file onto HDFS and I got the following

error:

09/03/10 18:23:16 WARN hdfs.DFSClient: DataStreamer Exception:
java.net.SocketTimeoutException: 5000 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/171.69.102.53:34414
remote=/
171.69.102.51:50010]
  at


org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
  at


org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
  at


org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
  at java.io.BufferedOutputStream.write(Unknown Source)
  at java.io.DataOutputStream.write(Unknown Source)
  at


org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

09/03/10 18:23:16 WARN hdfs.DFSClient: Error Recovery for block
blk_2971879428934911606_36678 bad datanode[0] 171.69.102.51:50010
put: All datanodes 171.69.102.51:50010 are bad. Aborting...
Exception closing file /user/amkhuran/221rawdata/1g
java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
  at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
  at


org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
  at


org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
  at
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
  at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
  at


org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:243)
  at org.apache.hadoop.fs.FsShell.close(FsShell.java:1842)
  at org.apache.hadoop.fs.FsShell.main(FsShell.java:1856)


Whats going wrong?

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz









Re: Error while putting data onto hdfs

2009-03-11 Thread Raghu Angadi

Raghu Angadi wrote:

Amandeep Khurana wrote:
My dfs.datanode.socket.write.timeout is set to 0. This had to be done 
to get

Hbase to work.


ah.. I see, we should fix that. Not sure how others haven't seen it till 
now. Affects only those with write.timeout set to 0 on the clients.


filed : https://issues.apache.org/jira/browse/HADOOP-5464


Since setting it to 0 itself is a work around, please change that to 
some extremely large value for now.


Raghu.



Re: HDFS is corrupt, need to salvage the data.

2009-03-10 Thread Raghu Angadi

Mayuran Yogarajah wrote:

lohit wrote:

How many Datanodes do you have.
From the output it looks like at the point when you ran fsck, you had 
only one datanode connected to your NameNode. Did you have others?
Also, I see that your default replication is set to 1. Can you check 
if  your datanodes are up and running.

Lohit


  
There is only one data node at the moment.  Does this mean the data is 
not recoverable?
The HD on the machine seems fine so I'm a little confused as to what 
caused the HDFS to

become corrupted.


The block files usually don't disappear easily. Check on the datanode if 
you find any files starting with blk. Also check the datanode log to see 
what happened there... maybe it was started on a different directory or 
something like that.


Raghu.


Re: DataNode stops cleaning disk?

2009-03-05 Thread Raghu Angadi


This is unexpected unless some other process is eating up space.

Couple of things to collect next time (along with log):

 - All the contents under datanode-directory/ (especially including 
'tmp' and 'current')
 - Does 'du' of this directory match with what is reported to NameNode 
(shown on webui) by this DataNode.

 - Is there anything else taking disk space on the machine?

Raghu.

Igor Bolotin wrote:

Normally I dislike writing about problems without being able to provide
some more information, but unfortunately in this case I just can't find
anything.

 


Here is the situation - DFS cluster running Hadoop version 0.19.0. The
cluster is running on multiple servers with practically identical
hardware. Everything works perfectly well, except for one thing - from
time to time one of the data nodes (every time it's a different node)
starts to consume more and more disk space. The node keeps going and if
we don't do anything - it runs out of space completely (ignoring 20GB
reserved space settings). Once restarted - it cleans disk rapidly and
goes back to approximately the same utilization as the rest of data
nodes in the cluster.

 


Scanning datanodes and namenode logs and comparing thread dumps (stacks)
from nodes experiencing problem and those that run normally didn't
produce any clues. Running balancer tool didn't help at all. FSCK shows
that everything is healthy and number of over-replicated blocks is not
significant.

 


To me - it just looks like at some point the data node stops cleaning
invalidated/deleted blocks, but keeps reporting space consumed by these
blocks as not used, but I'm not familiar enough with the internals and
just plain don't have enough free time to start digging deeper.

 


Anyone has an idea what is wrong or what else we can do to find out
what's wrong or maybe where to start looking in the code?

 


Thanks,

Igor

 







Re: Hadoop Write Performance

2009-02-18 Thread Raghu Angadi


What is the hadoop version?

You could check the log on a datanode around that time. You could post any 
suspicious errors. For example, you can trace a particular block in the 
client and datanode logs.


Most likely it is not a NameNode issue, but you can check the NameNode log as well.

Raghu.

Xavier Stevens wrote:

Does anyone have an expected or experienced write speed to HDFS outside
of Map/Reduce?  Any recommendations on properties to tweak in
hadoop-site.xml?
 
Currently I have a multi-threaded writer where each thread is writing to

a different file.  But after a while I get this:
 
java.io.IOException: Could not get block locations. Aborting...

 at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFS
Client.java:2081)
 at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.ja
va:1702)
 at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClie
nt.java:1818)
 
Which is perhaps indicating that the namenode is overwhelmed?
 
 
Thanks,
 
-Xavier






Re: Too many open files in 0.18.3

2009-02-13 Thread Raghu Angadi

Sean,

A few things in your messages are not clear to me. Currently this is what 
I make out of it:


1) with a 1k limit, you do see the problem.
2) with a 16k limit - (?) not clear if you see the problem
3) with 8k you don't see the problem
 3a) with or without the patch, I don't know.

But if you do use the patch and things do improve, please let us know.

Raghu.

Sean Knapp wrote:

Raghu,
Thanks for the quick response. I've been beating up on the cluster for a
while now and so far so good. I'm still at 8k... what should I expect to
find with 16k versus 1k? The 8k didn't appear to be affecting things to
begin with.

Regards,
Sean

On Thu, Feb 12, 2009 at 2:07 PM, Raghu Angadi rang...@yahoo-inc.com wrote:


You are most likely hit by
https://issues.apache.org/jira/browse/HADOOP-4346 . I hope it gets back
ported. There is a 0.18 patch posted there.

btw, does 16k help in your case?

Ideally 1k should be enough (with small number of clients). Please try the
above patch with 1k limit.

Raghu.


Sean Knapp wrote:


Hi all,
I'm continually running into the Too many open files error on 18.3:

DataXceiveServer: java.io.IOException: Too many open files
   at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
   at


sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)

at
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:96)

at
org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:997)

at java.lang.Thread.run(Thread.java:619)


I'm writing thousands of files in the course of a few minutes, but nothing
that seems too unreasonable, especially given the numbers below. I begin
getting a surge of these warnings right as I hit 1024 files open by the
DataNode:

had...@u10:~$ ps ux | awk '/dfs\.DataNode/ { print $2 }' | xargs -i ls


/proc/{}/fd | wc -l

 1023


This is a bit unexpected, however, since I've configured my open file
limit
to be 16k:

had...@u10:~$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 268288
max locked memory   (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files  (-n) 16384
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 268288
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


Note, I've also set dfs.datanode.max.xcievers to 8192 in hadoop-site.xml.

Thanks in advance,
Sean








Re: Too many open files in 0.18.3

2009-02-13 Thread Raghu Angadi

Sean Knapp wrote:

Raghu,
Apologies for the confusion. I was seeing the problem with any setting
for dfs.datanode.max.xcievers... 1k, 2k and 8k. Likewise, I was also seeing
the problem with different open file settings, all the way up to 32k.

Since I installed the patch, HDFS has been performing much better. The
current settings that work for me are 16k max open files with
dfs.datanode.max.xcievers=8k, though under heavy balancer load I do start to
hit the 16k max.


Thanks. That clarifies many things.

Even with the patch, if you have a lot of simultaneous clients, the DataNode 
does need a lot of file descriptors (something like 6 times the number 
of active 'xceivers'). In your case you seem to have a lot of simultaneous 
clients. I suggest increasing the file limit to something much higher 
(like 64k).


Raghu.


Regards,
Sean

2009/2/13 Raghu Angadi rang...@yahoo-inc.com





Re: Too many open files in 0.18.3

2009-02-12 Thread Raghu Angadi


You are most likely hit by 
https://issues.apache.org/jira/browse/HADOOP-4346 . I hope it gets back 
ported. There is a 0.18 patch posted there.


btw, does 16k help in your case?

Ideally 1k should be enough (with small number of clients). Please try 
the above patch with 1k limit.


Raghu.

Sean Knapp wrote:

Hi all,
I'm continually running into the Too many open files error on 18.3:

DataXceiveServer: java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at

sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)


at

sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:96)


at

org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:997)


at java.lang.Thread.run(Thread.java:619)


I'm writing thousands of files in the course of a few minutes, but nothing
that seems too unreasonable, especially given the numbers below. I begin
getting a surge of these warnings right as I hit 1024 files open by the
DataNode:

had...@u10:~$ ps ux | awk '/dfs\.DataNode/ { print $2 }' | xargs -i ls

/proc/{}/fd | wc -l


1023


This is a bit unexpected, however, since I've configured my open file limit
to be 16k:

had...@u10:~$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 268288
max locked memory   (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files  (-n) 16384
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 268288
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


Note, I've also set dfs.datanode.max.xcievers to 8192 in hadoop-site.xml.

Thanks in advance,
Sean





Re: stable version

2009-02-11 Thread Raghu Angadi

Vadim Zaliva wrote:

The particular problem I am having is this one:

https://issues.apache.org/jira/browse/HADOOP-2669

I am observing it in version 19. Could anybody confirm that
it have been fixed in 18, as Jira claims?

I am wondering why bug fix for this problem might have been committed
to 18 branch but not 19. If it was commited to both, then perhaps the
problem was not completely solved and downgrading to 18 will not help
me.


If you read through the comments, you will see that the root cause 
was never found. The patch just fixes one of the suspects. If you are 
still seeing this, please file another jira and link it to HADOOP-2669.


How easy is it for you to reproduce this? I guess one of the reasons for 
the incomplete diagnosis is that it is not simple to reproduce.


Raghu.


Vadim

On Wed, Feb 11, 2009 at 00:48, Rasit OZDAS rasitoz...@gmail.com wrote:

Yes, version 0.18.3 is the most stable one. It has added patches,
without unproven new functionality.

2009/2/11 Owen O'Malley omal...@apache.org:

On Feb 10, 2009, at 7:21 PM, Vadim Zaliva wrote:


Maybe version 0.18
is better suited for production environment?

Yahoo is mostly on 0.18.3 + some patches at this point.

-- Owen




--
M. Raşit ÖZDAŞ





Re: can't read the SequenceFile correctly

2009-02-09 Thread Raghu Angadi


+1 on something like getValidBytes(). Just the existence of this would 
warn many programmers about getBytes().


Raghu.

Owen O'Malley wrote:


On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:


Hey Tom,

I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes() 
kind of function?


It does it because continually resizing the array to the valid length 
is very expensive. It would be a reasonable patch to add a 
getValidBytes, but most methods in Java's libraries are aware of this 
and let you pass in byte[], offset, and length. So once you realize what 
the problem is, you can work around it.


-- Owen
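
(A minimal sketch of that workaround: trim to getLength() instead of 
trusting getBytes().length. The strings just simulate a value object being 
reused across records, the way a SequenceFile reader reuses it.)

import org.apache.hadoop.io.BytesWritable;

public class ValidBytes {
  public static void main(String[] args) {
    BytesWritable value = new BytesWritable();
    byte[] first = "a longer earlier record".getBytes();
    value.set(first, 0, first.length);
    byte[] second = "short".getBytes();
    value.set(second, 0, second.length);

    // getBytes() returns the whole backing array; only the first
    // getLength() bytes belong to the current record
    byte[] valid = new byte[value.getLength()];
    System.arraycopy(value.getBytes(), 0, valid, 0, value.getLength());

    System.out.println(value.getBytes().length + " backing bytes, "
        + valid.length + " valid: " + new String(valid));
  }
}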




Re: Connect to namenode

2009-02-05 Thread Raghu Angadi


I don't think it is intentional. Please file a jira with all the details 
about how to reproduce (with actual configuration files).


thanks,
Raghu.

Habermaas, William wrote:

After creation and startup of the hadoop namenode, you can only connect
to the namenode via hostname and not IP.

 


EX. hostname for box is sunsystem07, ip is 10.120.16.99

 


If you use the following url, hdfs://10.120.16.99, to connect to the
namenode, the following message will be printed:

   Wrong FS: hdfs://10.120.16.99:9000/, expected:
hdfs://sunsystem07:9000

You will only be able to connect successfully if
hdfs://sunsystem07:9000 is used.

 


It seems reasonable to allow connection either by IP or name.   Is there
a reason for this behavior or is it a bug?

 







Re: problem with completion notification from block movement

2009-02-04 Thread Raghu Angadi

Karl Kleinpaste wrote:

On Sun, 2009-02-01 at 17:58 -0800, jason hadoop wrote:

The Datanode's use multiple threads with locking and one of the
assumptions is that the block report (1ce per hour by default) takes
little time. The datanode will pause while the block report is running
and if it happens to take a while weird things start to happen.


Thank you for responding, this is very informative for us.

Having looked through the source code with a co-worker regarding the
periodic scan, and then checking the logs once again, we find reports of
this sort:

BlockReport of 1158499 blocks got processed in 308860 msecs
BlockReport of 1159840 blocks got processed in 237925 msecs
BlockReport of 1161274 blocks got processed in 177853 msecs
BlockReport of 1162408 blocks got processed in 285094 msecs
BlockReport of 1164194 blocks got processed in 184478 msecs
BlockReport of 1165673 blocks got processed in 226401 msecs

The 3rd of these exactly straddles the particular example timeline I
discussed in my original email about this question.  I suspect I'll find
more of the same as I look through other related errors.


You could ask for a complete fix in 
https://issues.apache.org/jira/browse/HADOOP-4584 . I don't think the 
current patch there fixes your problem.


Raghu.


--karl





Re: Question about HDFS capacity and remaining

2009-01-29 Thread Raghu Angadi

Doug Cutting wrote:
Ext2 by default reserves 5% of the drive for use by root only.  That'd 
be about 45GB of your 907GB capacity, which would account for most of the 
discrepancy.  You can adjust this with tune2fs.


plus, I think DataNode reports only 98% of the space by default.

Raghu.


Doug

Bryan Duxbury wrote:

There are no non-dfs files on the partitions in question.

df -h indicates that there is 907GB capacity, but only 853GB 
remaining, with 200M used. The only thing I can think of is the 
filesystem overhead.


-Bryan

On Jan 29, 2009, at 4:06 PM, Hairong Kuang wrote:


It's taken by non-dfs files.

Hairong


On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote:


Hey all,

I'm currently installing a new cluster, and noticed something a
little confusing. My DFS is *completely* empty - 0 files in DFS.
However, in the namenode web interface, the reported capacity is
3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There
are literally zero other files on the partitions hosting the DFS data
directories. Where am I losing 240GB?

-Bryan








Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Raghu Angadi

Owen O'Malley wrote:


On Jan 28, 2009, at 6:16 PM, Sriram Rao wrote:


By scrub I mean, have a tool that reads every block on a given data
node.  That way, I'd be able to find corrupted blocks proactively
rather than having an app read the file and find it.


The datanode already has a thread that checks the blocks periodically 
for exactly that purpose.


Since Hadoop 0.16.0, it scans all the blocks every 3 weeks (by default; 
the interval can be changed).


Raghu.


Re: HDFS - millions of files in one directory?

2009-01-26 Thread Raghu Angadi

Mark Kerzner wrote:

Raghu,

if I write all files only once, is the cost the same in one directory or do I
need to find the optimal directory size and, when full, start another
bucket?


If you write only once, then writing won't be much of an issue. You can 
write them in lexical order to help with buffer copies. These are all 
implementation details that a user should not depend on.


That said, the rest of the discussion in this thread is going in the 
right direction : to get you to use fewer files that combines a lot of 
these small files.


A large number of small files has overhead in many places in HDFS: strain 
on DataNodes, NameNode memory, etc.


Raghu.



Re: Zeroconf for hadoop

2009-01-26 Thread Raghu Angadi

Nitay wrote:

Why not use the distributed coordination service ZooKeeper? When nodes come
up they write some ephemeral file in a known ZooKeeper directory and anyone
who's interested, i.e. NameNode, can put a watch on the directory and get
notified when new children come up.


NameNode does not do active discovery. It is the DataNodes that contact 
the NameNode about their presence. So with ZooKeeper or zeroconf, DataNodes 
should be able to discover who their NN is and connect to it.
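
For illustration, a minimal sketch of the ephemeral-znode pattern being 
discussed, using ZooKeeper's standard Java client. The /datanodes path and the 
registration scheme are illustrative assumptions, not anything Hadoop ships.

  import java.util.List;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.WatchedEvent;
  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooDefs.Ids;
  import org.apache.zookeeper.ZooKeeper;

  public class SlaveAnnouncer implements Watcher {
    private final ZooKeeper zk;

    public SlaveAnnouncer(String quorum) throws Exception {
      zk = new ZooKeeper(quorum, 30000, this);   // e.g. "zk1:2181,zk2:2181"
    }

    // A slave advertises itself with an ephemeral znode; the znode disappears
    // automatically when the slave's session dies.
    public void announce(String hostPort) throws Exception {
      if (zk.exists("/datanodes", false) == null) {
        zk.create("/datanodes", new byte[0], Ids.OPEN_ACL_UNSAFE,
                  CreateMode.PERSISTENT);
      }
      zk.create("/datanodes/" + hostPort, new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // Anyone interested (or a slave looking for its master's znode) can read
    // and watch the directory to learn about membership changes.
    public List<String> members() throws Exception {
      return zk.getChildren("/datanodes", true);
    }

    public void process(WatchedEvent event) {
      // Watch notifications arrive here; re-read members() to refresh the view.
    }
  }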


Raghu.


On Mon, Jan 26, 2009 at 10:59 AM, Allen Wittenauer a...@yahoo-inc.com wrote:




On 1/25/09 8:45 AM, nitesh bhatia niteshbhatia...@gmail.com wrote:

Apple provides opensource discovery service called Bonjour (zeroconf). Is it
possible to integrate Zeroconf with Hadoop so that discovery of nodes become
automatic? Presently for setting up multi-node cluster we need to add IPs
manually. Integrating it with bonjour can make this process automatic.

How do you deal with multiple grids?

How do you deal with security?








Re: Zeroconf for hadoop

2009-01-26 Thread Raghu Angadi

nitesh bhatia wrote:

Hi
Apple provides opensource discovery service called Bonjour (zeroconf). Is it
possible to integrate Zeroconf with Hadoop so that discovery of nodes become
automatic ? Presently for setting up multi-node cluster we need to add IPs
manually. Integrating it with bonjour can make this process automatic.


It would be nice to have. Note that it is the slaves (DataNodes, 
TaskTrackers) that need to do the discovery. NameNode just passively 
accepts the DataNodes that want to be part of the cluster.


In that sense, NN should announce itself and DNs try to find where the 
NN is. It would be nice to have a zeroconf feature in some form, and we might 
discover many more uses for it. Of course, a cluster should not require it.


Raghu.


Re: Hadoop 0.19 over OS X : dfs error

2009-01-26 Thread Raghu Angadi

nitesh bhatia wrote:

Thanks. It worked. :) In hadoop-env.sh it's required to write the exact path for
the Java framework. I changed it to
export
JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
and it started.

In hadoop 0.18.2 export JAVA_HOME=/Library/Java/Home is working fine. I am
confused why we need to give exact path in 0.19 version.


The most likely reason is that your /Library/Java/Home somehow ends up 
using JDK 1.5. 0.19 and up require JDK 1.6.x.


Raghu.


Thankyou

--nitesh

On Sun, Jan 25, 2009 at 1:52 PM, Joerg Rieger 
joerg.rie...@mni.fh-giessen.de wrote:


Hello,

what path did you set in conf/hadoop-env.sh?

Before Hadoop 0.19 I had in hadoop-env.sh:
export JAVA_HOME=/Library/Java/Home

But that path, despite using java-preferences to change Java versions,
still uses the Java 1.5 version, e.g.:

$ /Library/Java/Home/bin/java -version
java version 1.5.0_16
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

You have to change the setting to:
export
JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home


Joerg


On 25.01.2009, at 00:16, nitesh bhatia wrote:

 Hi

My current default settings are  for java 1.6

nMac:hadoop-0.19.0 Aryan$ $JAVA_HOME/bin/java -version
java version 1.6.0_07
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)


The system is working fine with Hadoop 0.18.2.

--nitesh

On Sun, Jan 25, 2009 at 4:15 AM, Craig Macdonald cra...@dcs.gla.ac.uk

wrote:

 Hi,

I guess that the java on your PATH is different from the setting of your
$JAVA_HOME env variable.
Try $JAVA_HOME/bin/java -version?

Also, there is a program called Java Preferences on each system for
changing the default java version used.

Craig


nitesh bhatia wrote:

 Hi

I am trying to setup Hadoop 0.19 on OS X. Current Java Version is

java version 1.6.0_07
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

When I am trying to format dfs  using bin/hadoop dfs -format command.
I
am
getting following errors:

nMac:hadoop-0.19.0 Aryan$ bin/hadoop dfs -format
Exception in thread main java.lang.UnsupportedClassVersionError: Bad
version number in .class file
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
 at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:280)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
 at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
Exception in thread main java.lang.UnsupportedClassVersionError: Bad
version number in .class file
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
 at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:280)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
 at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)


I am not sure why this error is coming. I am having latest Java version.
Can
anyone help me out with this?

Thanks
Nitesh







--
Nitesh Bhatia
Dhirubhai Ambani Institute of Information  Communication Technology
Gandhinagar
Gujarat

Life is never perfect. It just depends where you draw the line.

visit:
http://www.awaaaz.com - connecting through music
http://www.volstreet.com - lets volunteer for better tomorrow
http://www.instibuzz.com - Voice opinions, Transact easily, Have fun


--











Re: HDFS losing blocks or connection error

2009-01-23 Thread Raghu Angadi


 It seems hdfs isn't so robust or reliable as the website says and/or I
 have a configuration issue.

quite possible. How robust does the website say it is?

I agree debugging failures like the following is pretty hard for casual 
users. You need to look at the logs for the block, or run 'bin/hadoop fsck 
/stats.txt', etc. The reason could be as simple as no live datanodes or as 
complex as strange network behavior triggering a bug in DFSClient.


You can start by looking at or attaching the client log around the lines 
that contain the block id. Also, note the version you are running.
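
When debugging this kind of error it can also help to ask the namenode which 
datanodes it thinks hold each block. A small sketch using the public 
FileSystem API (a 0.18+ API is assumed; /stats.txt is the file from the log 
above):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class WhereAreMyBlocks {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus stat = fs.getFileStatus(new Path("/stats.txt"));
      // Ask the namenode which datanodes it believes hold each block.
      BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
      for (BlockLocation b : blocks) {
        System.out.println("offset " + b.getOffset() + " len " + b.getLength());
        for (String host : b.getHosts()) {
          System.out.println("  replica on " + host);
        }
      }
    }
  }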


Raghu.

Zak, Richard [USA] wrote:
Might there be a reason for why this seems to routinely happen to me 
when using Hadoop 0.19.0 on Amazon EC2?
 
09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block 
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
live nodes contain current block
09/01/23 11:45:55 INFO hdfs.DFSClient: Could not obtain block 
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
live nodes contain current block
09/01/23 11:45:58 INFO hdfs.DFSClient: Could not obtain block 
blk_-1757733438820764312_6736 from any node:  java.io.IOException: No 
live nodes contain current block
09/01/23 11:46:01 WARN hdfs.DFSClient: DFS Read: java.io.IOException: 
Could not obtain block: blk_-1757733438820764312_6736 file=/stats.txt
 
 
Richard J. Zak




Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi


If you are adding and deleting files in the directory, you might notice 
CPU penalty (for many loads, higher CPU on NN is not an issue). This is 
mainly because HDFS does a binary search on files in a directory each 
time it inserts a new file.


If the directory is relatively idle, then there is no penalty.

Raghu.

Mark Kerzner wrote:

Hi,

there is a performance penalty in Windows (pardon the expression) if you put
too many files in the same directory. The OS becomes very slow, stops seeing
them, and lies about their status to my Java requests. I do not know if this
is also a problem in Linux, but in HDFS - do I need to balance a directory
tree if I want to store millions of files, or can I put them all in the same
directory?

Thank you,
Mark





Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi

Raghu Angadi wrote:


If you are adding and deleting files in the directory, you might notice 
CPU penalty (for many loads, higher CPU on NN is not an issue). This is 
mainly because HDFS does a binary search on files in a directory each 
time it inserts a new file.


I should add that an equal or even bigger cost is the memmove that 
ArrayList does when you add or delete entries.


An ArrayList, rather than a map, is used mainly to save memory, the most 
precious resource for the NameNode.


Raghu.


If the directory is relatively idle, then there is no penalty.

Raghu.

Mark Kerzner wrote:

Hi,

there is a performance penalty in Windows (pardon the expression) if 
you put
too many files in the same directory. The OS becomes very slow, stops 
seeing
them, and lies about their status to my Java requests. I do not know 
if this
is also a problem in Linux, but in HDFS - do I need to balance a 
directory
tree if I want to store millions of files, or can I put them all in 
the same

directory?

Thank you,
Mark







Re: 0.18.1 datanode pseudo deadlock problem

2009-01-12 Thread Raghu Angadi

Sagar Naik wrote:

Hi Raghu,


The periodic du and block reports thread thrash the disk. (Block 
Reports takes abt on an avg 21 mins )


and I think all the datanode threads are not able to do much and freeze


yes, that is the known problem we talked about in the earlier mails in 
this thread.


When you have millions of blocks, one hour for the du and block report 
intervals is too often for you. Maybe you could increase it to 
something like 6 or 12 hours.
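
A minimal sketch of the knobs involved; the property names here are from 
memory and should be treated as assumptions (confirm against hadoop-default.xml 
for your release). Both are datanode-side settings, so in practice they go in 
the datanode's hadoop-site.xml rather than client code.

  import org.apache.hadoop.conf.Configuration;

  public class ReportIntervals {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Assumed property names; defaults shown are the usual one hour / one minute.
      long blockReportMs = conf.getLong("dfs.blockreport.intervalMsec", 60L * 60 * 1000);
      long dfIntervalMs  = conf.getLong("dfs.df.interval", 60L * 1000);
      System.out.println("block report every " + blockReportMs
          + " ms, df every " + dfIntervalMs + " ms");
      // Raising dfs.blockreport.intervalMsec to 6*3600*1000 or 12*3600*1000 in the
      // datanode's hadoop-site.xml spreads the expensive scan out to 6 or 12 hours.
    }
  }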


It still does not fix the block report problem since DataNode does the 
scan in-line.


As I mentioned in earlier mails, we should really fix the block report 
problem. A simple fix would scan the directories (very slowly, unlike DU) 
in the background.


Even after fixing block reports, you should be aware that an excessive 
number of blocks does impact performance. No system can guarantee 
performance when overloaded. What we want to do is to make Hadoop 
degrade gracefully rather than have DNs getting killed.


Raghu.


Re: 0.18.1 datanode pseudo deadlock problem

2009-01-09 Thread Raghu Angadi


The scan required for each block report is a well known issue and it can 
be fixed. It was discussed multiple times (e.g. 
https://issues.apache.org/jira/browse/HADOOP-3232?focusedCommentId=12587795#action_12587795 
).


Earlier, inline 'du' on datanodes used to cause the same problem and 
it was moved to a separate thread (HADOOP-3232). Block reports 
can do the same...


Though 2M blocks on a DN is very large, there is no reason block reports 
should break things. Once we fix block reports, something else might 
break.. but that is a different issue.


Raghu.

Jason Venner wrote:
The problem we are having is that datanodes periodically stall for 10-15 
minutes and drop off the active list and then come back.


What is going on is that a long operation set is holding the lock on 
FSDataset.volumes, and all of the other block service requests stall 
behind this lock.


DataNode: [/data/dfs-video-18/dfs/data] daemon prio=10 tid=0x4d7ad400 
nid=0x7c40 runnable [0x4c698000..0x4c6990d0]

  java.lang.Thread.State: RUNNABLE
   at java.lang.String.lastIndexOf(String.java:1628)
   at java.io.File.getName(File.java:399)
   at 
org.apache.hadoop.dfs.FSDataset$FSDir.getGenerationStampFromFile(FSDataset.java:148) 

   at 
org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:181)
   at 
org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:412)
   at 
org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:511) 


   - locked 0x551e8d48 (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
   at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:1053)
   at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:708)
   at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2890)
   at java.lang.Thread.run(Thread.java:619)

This is basically taking a stat on every hdfs block on the datanode, 
which in our case is ~ 2million, and can take 10+ minutes (we may be 
experiencing problems with our raid controller but have no visibility 
into it) at the OS level the file system seems fine and operations 
eventually finish.


It appears that a couple of different data structures are being locked 
with the single object FSDataset$Volume.


Then this happens:
org.apache.hadoop.dfs.datanode$dataxcei...@1bcee17 daemon prio=10 
tid=0x4da8d000 nid=0x7ae4 waiting for monitor entry 
[0x459fe000..0x459ff0d0]

  java.lang.Thread.State: BLOCKED (on object monitor)
   at 
org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:473) 

   - waiting to lock 0x551e8d48 (a 
org.apache.hadoop.dfs.FSDataset$FSVolumeSet)

   at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:934)
   - locked 0x54e550e0 (a org.apache.hadoop.dfs.FSDataset)
   at 
org.apache.hadoop.dfs.DataNode$BlockReceiver.init(DataNode.java:2322)
   at 
org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1187)

   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1045)
   at java.lang.Thread.run(Thread.java:619)

which locks the FSDataset while waiting on the volume object

and now all of the Datanode operations stall waiting on the FSDataset 
object.

--

Our particular installation doesn't use multiple directories for hdfs, 
so a first simple hack for a local fix would be to modify getNextVolume 
to just return the single volume and not be synchronized


A richer alternative would be to make the locking more fine grained on 
FSDataset$FSVolumeSet.


Of course we are also trying to fix the file system performance and dfs 
block loading that results in the block report taking a long time.


Any suggestions or warnings?

Thanks.







Re: 0.18.1 datanode pseudo deadlock problem

2009-01-09 Thread Raghu Angadi


2M files is excessive. But there is no reason block reports should 
break. My preference is to make block reports handle this better. DNs 
dropping in and out of the cluster causes too many other problems.


Raghu.

Konstantin Shvachko wrote:

Hi Jason,

2 million blocks per data-node is not going to work.
There were discussions about it previously, please
check the mail archives.

This means you have a lot of very small files, which
HDFS is not designed to support. A general recommendation
is to group small files into large ones, introducing
some kind of record structure delimiting those small files,
and control it in on the application level.

Thanks,
--Konstantin



Re: xceiverCount limit reason

2009-01-08 Thread Raghu Angadi

Jean-Adrien wrote:

Is it the responsibility of hadoop client too manage its connection pool
with the server ? In which case the problem would be an HBase problem?
Anyway I found my problem, it is not a matter of performances.



Essentially, yes. The client has to close the file to relinquish 
connections, if clients are using the common read/write interface.


Currently if a client keeps many hdfs files open, it results in many 
threads held at the DataNodes. As you noticed, timeout at DNs helps.


Various solutions are possible at different levels: application (HBase), 
client API, HDFS, etc. https://issues.apache.org/jira/browse/HADOOP-3856 
is a proposal at the HDFS level.
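
Concretely, "using the common read/write interface" just means closing streams 
as soon as you are done with them, so the datanode-side threads are released; a 
trivial sketch:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReadAndRelease {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FSDataInputStream in = fs.open(new Path(args[0]));
      try {
        byte[] buf = new byte[4096];
        // positioned read; does not disturb the stream offset
        in.read(0, buf, 0, buf.length);
      } finally {
        // Closing releases the datanode-side resources held for this stream;
        // an open-but-idle stream keeps them tied up until a server timeout.
        in.close();
      }
    }
  }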


Raghu.


Re: Question about the Namenode edit log and syncing the edit log to disk. 0.19.0

2009-01-07 Thread Raghu Angadi


Did you look at FSEditLog.EditLogFileOutputStream.flushAndSync()?

This code was re-organized sometime back. But the guarantees it provides 
should be exactly the same as before. Please let us know otherwise.
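
For readers following along: durability of a log file in Java generally comes 
from forcing the file channel rather than from the open mode. A generic sketch 
of that idiom (this is not the actual FSEditLog code, just an illustration of 
the mechanism being discussed):

  import java.io.File;
  import java.io.RandomAccessFile;
  import java.nio.ByteBuffer;
  import java.nio.channels.FileChannel;

  public class ForceExample {
    public static void main(String[] args) throws Exception {
      RandomAccessFile raf = new RandomAccessFile(new File("edits.example"), "rw");
      FileChannel channel = raf.getChannel();
      try {
        channel.write(ByteBuffer.wrap("a log record".getBytes("UTF-8")));
        // Push the data to the device. force(false) syncs file data only;
        // force(true) also syncs file metadata (closer to "rws" semantics).
        channel.force(false);
      } finally {
        raf.close();
      }
    }
  }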


Raghu.

Jason Venner wrote:
I have always assumed (which is clearly my error) that edit log writes 
were flushed to storage to ensure that the edit log was consistent 
during machine crash recovery.


I have been working through FSEditLog.java and I don't see any calls of 
force(true) on the file channel or sync on the file descriptor, and the 
edit log is not opened with an 's' or 'd' ie: the open flags are rw 
and not rws or rwd.


The only thing I see in the code, is that the space in the file where 
the updates will be written is preallocated.


Have I missed the mechanism that the edit log data is flushed to the disk?

Is the edit log data not forcibly flushed to the disk, instead relying on 
the host operating system to perform the physical writes at a later date?


Thanks -- Jason




Re: cannot allocate memory error

2008-12-31 Thread Raghu Angadi
Your OS is running out of memory. Usually a sign of too many processes 
(or threads) on the machine. Check what else is happening on the system.


Raghu.

sagar arlekar wrote:

Hello,

I am new to hadoop. I am running hadoop 0.17 in a Eucalyptus cloud
instance (it's a centos image on xen)

bin/hadoop dfs -ls /
gives the following Exception

08/12/31 08:58:10 WARN fs.FileSystem: localhost:9000 is a deprecated
filesystem name. Use hdfs://localhost:9000/ instead.
08/12/31 08:58:10 WARN fs.FileSystem: uri=hdfs://localhost:9000
javax.security.auth.login.LoginException: Login failed: Cannot run
program whoami: java.io.IOException: error=12, Cannot allocate
memory
at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
at 
org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1353)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1289)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:203)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:108)
at org.apache.hadoop.fs.FsShell.init(FsShell.java:87)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1717)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1866)
Bad connection to FS. command aborted.

Running the command again gives.
bin/hadoop dfs -ls /
Error occurred during initialization of VM
Could not reserve enough space for object heap

Changing value of 'mapred.child.java.opts' property in hadoop-site.xml
did not help.

Kindly help me. What could I do to give more memory to hadoop?

BTW is there a way to search through the mail archive? I only saw the
mails listed according to year and months.

Regards,
Sagar




Re: Performance testing

2008-12-31 Thread Raghu Angadi

Sandeep Dhawan wrote:

Hi,

I am trying to create a hadoop cluster which can handle 2000 write requests
per second.
In each write request I would be writing a line of size 1KB in a file.


This is essentially a matter of deciding how many datanodes (with the 
given configuration) you need to write 3*2000*2 files per second 
(assuming each 1KB is a separate HDFS file).


You can test this on a single datanode. For example, if your datanode supports 
1000 1KB files per second (even with multiple processes creating at 
the same time), then you need 12 datanodes (+ any factor of safety 
you want to add).
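
A rough sketch of such a single-datanode test, assuming each request becomes 
its own 1KB HDFS file; the /bench path is illustrative. Running several copies 
concurrently (and deleting the files afterwards) gives a better picture.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CreateRateBench {
    public static void main(String[] args) throws Exception {
      int numFiles = Integer.parseInt(args[0]);
      FileSystem fs = FileSystem.get(new Configuration());
      byte[] oneKb = new byte[1024];
      long start = System.currentTimeMillis();
      for (int i = 0; i < numFiles; i++) {
        Path p = new Path("/bench/file-" + i);
        FSDataOutputStream out = fs.create(p);
        try {
          out.write(oneKb);              // one 1KB record per file
        } finally {
          out.close();
        }
      }
      long elapsed = System.currentTimeMillis() - start;
      System.out.println(numFiles + " files in " + elapsed + " ms = "
          + (numFiles * 1000L / Math.max(elapsed, 1)) + " creates/sec");
    }
  }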


How many nodes or disks do you have approximately?

Raghu.



I would be using machine having following configuration:
Platform: Red Hat Linux 9.0 
CPU : 2.07 GHz

RAM : 1GB

Can anyone help in giving me some pointers/guideline as to how to go about
setting up such a cluster.
What are the configuration parameters in hadoop with which we can tweak to
enhance the performance of the hadoop cluster. 


Thanks,
Sandeep




Re: Performance testing

2008-12-31 Thread Raghu Angadi


I should add that your test should both create and delete files.

Raghu.

Raghu Angadi wrote:

Sandeep Dhawan wrote:

Hi,

I am trying to create a hadoop cluster which can handle 2000 write 
requests

per second.
In each write request I would be writing a line of size 1KB in a file.


This is essentially a matter of deciding how many datanodes (with the 
given configuration) you need to write 3*2000*2 files per second 
(assuming each 1KB is a separate HDFS file).


You can test this on a single datanode. For example, if your datanode supports 
1000 1KB files per second (even with multiple processes creating at 
the same time), then you need 12 datanodes (+ any factor of safety 
you want to add).


How many nodes or disks do you have approximately?

Raghu.



I would be using machine having following configuration:
Platform: Red Hat Linux 9.0 CPU : 2.07 GHz
RAM : 1GB

Can anyone help in giving me some pointers/guideline as to how to go 
about

setting up such a cluster.
What are the configuration parameters in hadoop with which we can 
tweak to

enhance the performance of the hadoop cluster.
Thanks,
Sandeep






Re: Map-Reduce job is using external IP instead of internal

2008-12-31 Thread Raghu Angadi


Your configuration for the task tracker and job tracker might be using external 
hostnames. Essentially any hostnames in configuration files should 
resolve to internal IPs.


Raghu.

Genady wrote:

Hi,

 


We're using Hadoop 0.18.2/Hbase 0.18.1 four-nodes cluster on CentOS Linux,
in /etc/hosts the following mapping were added to make hadoop to use
internal IPs (fast Ethernet 1GB card):

 


10.1.0.56   master.hostname  master

10.1.0.55   slave1.hostname slave1

10.1.0.53   slave3.hostname slave3

10.1.0.50   slave2.hostname slave2

 


If I do local copy to hadoop dfs it works perfectly, hadoop is using
internal IP only to copy data, but when Map-Reduce job starts it's clear(
from monitoring network cards data) that hadoop is using external IP in
addition to internal with pretty big rates ~5Mb/s in each node, is it
something I should add in hadoop configuration to force hadoop to use only
internal(LAN) IP?

 

 

 


Genady Gilin






Re: DFS replication and Error Recovery on failure

2008-12-29 Thread Raghu Angadi

Konstantin Shvachko wrote:



1) If i set value of dfs.replication to 3 only in hadoop-site.xml of
namenode(master) and
then restart the cluster will this take effect. or  i have to change
hadoop-site.xml at all slaves ?


dfs.replication is the name-node parameter, so you need to restart
only the name-node in order to reset the value.


Actually this is a client parameter. The NameNode strictly does not need to 
be restarted. All the files created by clients using the new value will 
have the new replication.
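
In other words, the replication factor is decided by the client at create time 
and can also be changed later per file; a small sketch (the paths here are 
illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReplicationExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.setInt("dfs.replication", 3);      // picked up by this client only
      FileSystem fs = FileSystem.get(conf);

      // New files created through this client get 3 replicas.
      FSDataOutputStream out = fs.create(new Path("/tmp/three-replicas.txt"));
      out.writeBytes("hello\n");
      out.close();

      // Existing files can be re-replicated without touching any config.
      fs.setReplication(new Path("/tmp/old-file.txt"), (short) 3);
    }
  }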


Raghu.



Re: Does datanode acts as readonly in case of DiskFull ?

2008-12-17 Thread Raghu Angadi

Sagar Naik wrote:

Hi ,

I would like to know what happens in case of DiskFull on a datanode.
Does the datanode act as a block server only?


Yes. I think so.

Does it reject any more Block creation requests OR does the Namenode not list 
it for new blocks


yes. NN will not allocate it any more blocks.

Did you notice anything different? (which is quite possible).

Raghu.



Hadoop 18
-Sagar




Re: Hit a roadbump in solving truncated block issue

2008-12-16 Thread Raghu Angadi

Brian Bockelman wrote:

Hey,

I hit a bit of a roadbump in solving the truncated block issue at our 
site: namely, some of the blocks appear perfectly valid to the 
datanode.  The block verifies, but it is still the wrong size (it 
appears that the metadata is too small too).


What's the best way to proceed?  It appears that either (a) the block 
scanner needs to report to the datanode the size of the block it just 
verified, which is possibly a scaling issue or (b) the metadata file 
needs to save the correct block size, which is a pretty major 
modification, as it requires a change of the on-disk format.


This should be detected by the NameNode, i.e. it should detect that this 
replica is shorter (either compared to other replicas or the expected 
size). There are various fixes (recent or being worked on) to this area 
of the NameNode and it is mostly covered by one of those, or should be soon.


Raghu.


Ideas?

Brian




Re: File loss at Nebraska

2008-12-09 Thread Raghu Angadi

Brian Bockelman wrote:


On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:


Also it might be useful to strongly word hadoop-default.conf as many
people might not know a downside exists for using 2 rather than 3 as
the replication factor. Before reading this thread I would have
thought 2 to be sufficient.


I think 2 should be sufficient, but running with 2 replicas instead of 3 
exposes some namenode bugs which are harder to trigger.


Whether 2 is sufficient or not, I completely agree with the latter part. We 
should treat this as what I think it fundamentally is: fixing the Namenode.


I guess lately some of these bugs either got more likely or some similar 
bugs crept in.


Sticking with 3 is very good advice for maximizing reliability.. but 
from an opportunistic developer's point of view a big cluster running with 
replication of 2 is a great test case :-).. overall I think it is a good 
thing for Hadoop.


Raghu.


Re: Hadoop datanode crashed - SIGBUS

2008-12-01 Thread Raghu Angadi


FYI : Datanode does not run any user code and does not link with any 
native/JNI code.


Raghu.

Chris Collins wrote:
Was there anything mentioned as part of the tombstone message about 
problematic frame?  What java are you using?  There are a few reasons 
for SIGBUS errors, one is illegal address alignment, but from java that's 
very unlikely... there were some issues with the native zip library in 
older vm's.  As Brian pointed out, sometimes this points to a hw issue.


C
On Dec 1, 2008, at 1:32 PM, Sagar Naik wrote:




Brian Bockelman wrote:

Hardware/memory problems?

I m not sure.


SIGBUS is relatively rare; it sometimes indicates a hardware error in 
the memory system, depending on your arch.



*uname -a : *
Linux hdimg53 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 
i686 i686 i386 GNU/Linux

*top's top*
Cpu(s):  0.1% us,  1.1% sy,  0.0% ni, 98.0% id,  0.8% wa,  0.0% hi,  
0.0% si

Mem:   8288280k total,  1575680k used,  6712600k free, 5392k buffers
Swap: 16386292k total,   68k used, 16386224k free,   522408k cached

8 core , xeon  2GHz


Brian

On Dec 1, 2008, at 3:00 PM, Sagar Naik wrote:


Couple of the datanodes crashed with the following error
The /tmp is 15% occupied

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0xb4edcb6a, pid=10111, tid=1212181408
#
[Too many errors, abort]

Pl suggest how should I go to debug this particular problem


-Sagar




Thanks to Brian

-Sagar






Re: Namenode BlocksMap on Disk

2008-11-26 Thread Raghu Angadi

Dennis Kubes wrote:


 From time to time a message pops up on the mailing list about OOM 
errors for the namenode because of too many files.  Most recently there 
was a 1.7 million file installation that was failing.  I know the simple 
solution to this is to have a larger java heap for the namenode.  But 
the non-simple way would be to convert the BlocksMap for the NameNode to 
be stored on disk and then queried and updated for operations.  This 
would eliminate memory problems for large file installations but also 
might degrade performance slightly.  Questions:


1) Is there any current work to allow the namenode to store on disk 
versus is memory?  This could be a configurable option.


2) Besides possible slight degradation in performance, is there a reason 
why the BlocksMap shouldn't or couldn't be stored on disk?


As Doug mentioned, the main worry is that this will drastically reduce 
performance. Part of the reason is that a large chunk of the work on the 
NameNode happens under a single global lock. So if there is a seek under 
this lock, it affects everything else.


One good long-term fix for this is to make it easy to split the 
namespace between multiple namenodes. There was some work done on 
supporting volumes. Also the fact that HDFS now supports symbolic 
links might make it easier for someone adventurous to use that as a 
quick hack to get around this.


If you have a rough prototype implementation I am sure there will be a 
lot of interest in evaluating it. If Java has any disk-based or memory-mapped 
data structures, that might be the quickest way to try its effects.


Raghu.


NN JVM process takes a lot more memory than assigned

2008-11-21 Thread Raghu Angadi


There is one instance of NN where the JVM process takes 40GB of memory though 
the JVM is started with 24GB. The Java heap is still 24GB. Looks like it ends 
up taking a lot of memory outside the heap. There are a lot of entries in pmap 
similar to the ones below that account for the difference. Does anyone know 
what this might be?


From 'pmap -d'

2ab50800   64308 rwx-- 2ab50800 000:0   [ anon ]
2ab50becd0001228 - 2ab50becd000 000:0   [ anon ]
2ab50c00   64328 rwx-- 2ab50c00 000:0   [ anon ]
2ab50fed20001208 - 2ab50fed2000 000:0   [ anon ]
2ab51000   64720 rwx-- 2ab51000 000:0   [ anon ]
2ab513f34000 816 - 2ab513f34000 000:0   [ anon ]
2ab51400   64552 rwx-- 2ab51400 000:0   [ anon ]


Raghu.


Re: Hadoop 18.1 ls stalls

2008-11-20 Thread Raghu Angadi


Give an example stack trace of a thread that is blocked on the callQ and a 
handler thread that is not. A trace of all the threads would be even better.


If your handlers are waiting on the callQ, then they are waiting for work.

Raghu.

Sagar Naik wrote:

*Problem:*
The ls is taking noticeable time to respond.
*system:*
I have about 1.6 million files and namenode is pretty much full with 
heap(2400MB).


I have configured dfs.handler.count to 100, and all IPC server threads 
are locked onto a Queue (I think callQueue)


In the namenode logs, I occasionally see Out of memory Exception.

Any suggestions, to improve the response time of system ?
I can increase the dfs.handler.count to 256, but I guess I will see more 
OOM exceptions.






Re: Hadoop 18.1 ls stalls

2008-11-20 Thread Raghu Angadi

Sagar Naik wrote:

Thanks Raghu,
*datapoints:*
- So when I use FSShell client, it gets into retry mode for 
getFilesInfo() call and takes a long time.


What does retry mode mean?


 - Also, when do a ls operation, it takes secs(4/5) .

 - 1.6 million files and namenode is mostly full with heap(2400M)  (from 
ui)


When you say 'ls', how many does it return? (ie. ls of one file, or -lsr 
of thousands of files etc).


None of the IPC threads in your stack trace is doing any work.




Re: ipc problems afer upgrading to hadoop 0.18

2008-11-18 Thread Raghu Angadi

Johannes Zillmann wrote:

OK,

that there is an exception was right. (this happened in an ipc-server-not-
reachable test)
I just missed the fact that now a different exception is thrown than in 
previous versions.
Previous versions threw exceptions like ConnectException or 
SocketTimeoutException. Now java.io.IOException: Call failed on local 
exception is thrown.


It will soon be fixed : https://issues.apache.org/jira/browse/HADOOP-4659

Raghu.


Thats it!
thanks
Johannes


On Nov 18, 2008, at 7:58 PM, Johannes Zillmann wrote:


Hi there,

having a custom client-server based on hadoop's ipc.
Now after upgrading to hadoop 0.18.1 (from 0.17) i get following 
exception:


...
Caused by: java.io.IOException: Call failed on local exception
at org.apache.hadoop.ipc.Client.call(Client.java:718)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
...
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:358)
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:499)

at org.apache.hadoop.ipc.Client$Connection.run(Client.java:441)

Any Ideas ?

Johannes


~~~
101tec GmbH
Halle (Saale), Saxony-Anhalt, Germany
http://www.101tec.com




~~~
101tec GmbH
Halle (Saale), Saxony-Anhalt, Germany
http://www.101tec.com





Re: hadoop 0.18.2 Checksum ok was sent and should not be sent again

2008-11-17 Thread Raghu Angadi

Rong-en Fan wrote:

I believe it was for debug purpose and was removed after 0.18.2 released.


Yes. It is fixed in 0.18.3 (HADOOP-4499).

Raghu.


On Mon, Nov 17, 2008 at 8:57 PM, Alexander Aristov 
[EMAIL PROTECTED] wrote:


Hi all
I upgraded hadoop to the 0.18.2 version and tried to run a test job,
distcopy from S3 to HDFS


I got a lot of info-level errors although the job successfully finished.

Any ideas? Can I simply suppress INFOs in log4j and forget about the error?




Re: Large number of deletes takes out DataNodes

2008-11-17 Thread Raghu Angadi


This is a long known issue.. deleting files takes a lot of time and the 
datanode does not heartbeat during that time. Please file a jira so 
that the issue percolates up :)


There are more of these cases that result in datanode being marked dead.

As a workaround, you can double or triple heartbeat.recheck.interval 
(default 5000 milliseconds) in the config.
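
A sketch of the "delete less at any one time" approach from the client side, 
removing one child directory at a time with a pause in between; the directory 
name and pause length are illustrative assumptions.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class PacedDelete {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path root = new Path("/intermediate-output");
      FileStatus[] children = fs.listStatus(root);
      if (children == null) return;        // nothing there
      // Delete one child directory at a time instead of the whole tree at once,
      // so each datanode sees a smaller batch of block invalidations at a time.
      for (FileStatus child : children) {
        fs.delete(child.getPath(), true);  // recursive delete of this child only
        Thread.sleep(60 * 1000);           // give datanodes time to catch up
      }
    }
  }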


Raghu.

Jonathan Gray wrote:

In many of my jobs I create intermediate data in HDFS.  I keep this around for 
a number of days for inspection until I delete it in large batches.

If I attempt to delete all of it at once, the flood of delete messages to the 
datanodes seems to cause starvation as they do not seem to responding to the 
namenode heartbeat.  There is about 5% cpu utilization and the logs just show 
the deletion of blocks.

In the worst case (if I'm deleting a few terabytes, about 50% of total capacity 
across 10 nodes) this causes the master to expire the datanode leases.  Once 
the datanode finishes deletions, it reports back to master and is added back to 
the cluster.  At this point, its blocks have already started to be reassigned 
so it then starts as an empty node.  In one run, this happened to 8 out of 10 
nodes before getting back to a steady state.  There were a couple moments 
during that run that a number of the blocks had replication 1.

Obviously I can handle this by deleting less at any one time, but it seems like 
there might be something wrong.  With no CPU utilization, why does the datanode 
not respond to the namenode?

Thanks.

Jonathan Gray





Re: Datanode block scans

2008-11-13 Thread Raghu Angadi

Brian Bockelman wrote:

Hey all,

I noticed that the maximum throttle for the datanode block scanner is 
hardcoded at 8MB/s.


I think this is insufficient; on a fully loaded Sun Thumper, a full scan 
at 8MB/s would take something like 70 days.


Is it possible to make this throttle a bit smarter?  At the very least, 
would anyone object to a patch which exposed this throttle as a config 
option?  Alternately, a smarter idea would be to throttle the block 
scanner at (8MB/s) * (# of volumes), under the assumption that there is 
at least 1 disk per volume.


Making the max configurable seems useful. Either of the above options is 
fine, though the first one might be simpler for configuration.


8MB/s is calculated for around 4TB of data on a node. Given 80k seconds 
a day, that is around 6-7 days. 8-10 MB/s is not too bad a load on a 2-4 
disk machine.


Hm... on second thought, however trivial the resulting disk I/O would 
be, on the Thumper example, the maximum throttle would be 3Gbps: that's 
a nontrivial load on the bus.


How do other big sites handle this?  We're currently at 110TB raw, are 
considering converting ~240TB over from another file system, and are 
planning to grow to 800TB during 2009.  A quick calculation shows that 
to do a weekly scan at that size, we're talking ~10Gbps of sustained reads.


You have 110 TB on a single datanode and are moving to 800TB nodes? Note 
that this rate applies to the amount of data on a single datanode.


Raghu.

I still worry that the rate is too low; if we have a suspicious node, or 
users report a problematic file, waiting a week for a full scan is too 
long.  I've asked a student to implement a tool which can trigger a full 
block scan of a path (the idea would be able to do hadoop fsck 
/path/to/file -deep).  What would be the best approach for him to take 
to initiate a high-rate full volume or full datanode scan?





Re: Datanode block scans

2008-11-13 Thread Raghu Angadi


How often is safe depends on what probabilities you are willing to accept.

I just checked on one of our clusters with 4PB of data; the scanner fixes 
about 1 block a day. Assuming an avg size of 64MB per block (pretty high), 
the probability that all 3 replicas of one block go bad in 3 weeks is of 
the order of 1e-12. In reality it is mostly 2-3 orders less probable.
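
One way to arrive at a number of that order (rough assumptions, not 
necessarily the exact arithmetic behind the figure above): 4PB of raw data at 
64MB per replica is roughly 60-70 million replicas, or about 20 million 
distinct blocks at 3x replication. One repaired replica per day means about 21 
bad replicas per three-week scan period, i.e. a per-replica failure 
probability of roughly 21 / 6e7, about 3.5e-7. The chance that all three 
replicas of one particular block go bad in the same period is then about 
(3.5e-7)^3, roughly 4e-20, and multiplying by ~2e7 blocks gives something on 
the order of 1e-12 for losing any block at all.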


Raghu.

Brian Bockelman wrote:


On Nov 13, 2008, at 11:32 AM, Raghu Angadi wrote:


Brian Bockelman wrote:

Hey all,
I noticed that the maximum throttle for the datanode block scanner is 
hardcoded at 8MB/s.
I think this is insufficient; on a fully loaded Sun Thumper, a full 
scan at 8MB/s would take something like 70 days.
Is it possible to make this throttle a bit smarter?  At the very 
least, would anyone object to a patch which exposed this throttle as 
a config option?  Alternately, a smarter idea would be to throttle 
the block scanner at (8MB/s) * (# of volumes), under the assumption 
that there is at least 1 disk per volume.


Making the max configurable seems useful. Either of the above options 
is fine, though the first one might be simpler for configuration.


8MB/s is calculated for around 4TB of data on a node. Given 80k 
seconds a day, that is around 6-7 days. 8-10 MB/s is not too bad a load 
on a 2-4 disk machine.


Hm... on second thought, however trivial the resulting disk I/O would 
be, on the Thumper example, the maximum throttle would be 3Gbps: 
that's a nontrivial load on the bus.
How do other big sites handle this?  We're currently at 110TB raw, 
are considering converting ~240TB over from another file system, and 
are planning to grow to 800TB during 2009.  A quick calculation shows 
that to do a weekly scan at that size, we're talking ~10Gbps of 
sustained reads.


You have 110 TB on a single datanode and are moving to 800TB nodes? Note 
that this rate applies to the amount of data on a single datanode.




Nah -110TB total in the system (200 datanodes), and will move to 800TB 
total (probably 250 datanodes).


However, we do have some larger nodes (we range from 80GB to 48TB per 
node); recent and planned purchases are in the 4-8TB per node range, but 
I'd sure hate to throw away 48TB of disks :)


On the 48TB node, a scan at 8MB/s would take 70 days.  I'd have to run 
at a rate of 80MB/s to scan through in 7 days.  While 80MB/s over 48 
disks is not much, I was curious about how the rest of the system would 
perform (the node is in production on a different file system right now, 
so borrowing it is not easy...); 80MB/s sounds like an awful lot for 
background noise.


Do any other large sites run such large nodes?  How long of a period 
between block scans do sites use in order to feel safe ?


Brian


Raghu.

I still worry that the rate is too low; if we have a suspicious node, 
or users report a problematic file, waiting a week for a full scan is 
too long.  I've asked a student to implement a tool which can trigger 
a full block scan of a path (the idea would be able to do hadoop 
fsck /path/to/file -deep).  What would be the best approach for him 
to take to initiate a high-rate full volume or full datanode scan?








Re: File Descriptors not cleaned up

2008-11-11 Thread Raghu Angadi

Jason Venner wrote:
We have just realized one reason for the '/no live node contains block/' 
error from /DFSClient/ is an indication that the /DFSClient/ was unable 
to open a connection due to insufficient available file descriptors.


FsShell is particularly bad about consuming descriptors and leaving the 
containing objects for the Garbage Collector to reclaim the descriptors.


We will submit a patch in a few days.


please do.

We know more since I last replied on July 31st. Hadoop itself does not 
have any finalizers that depend on GC to close fds. If GC is affecting 
number of fds, you are likely a victim of HADOOP-4346.


thanks,
Raghu.


Re: LeaseExpiredException and too many xceiver

2008-10-31 Thread Raghu Angadi


Config on most Y! clusters sets dfs.datanode.max.xcievers to a large 
value .. something like 1k to 2k. You could try that.


Raghu.

Nathan Marz wrote:
Looks like the exception on the datanode got truncated a little bit. 
Here's the full exception:


2008-10-31 14:20:09,978 ERROR org.apache.hadoop.dfs.DataNode: 
DatanodeRegistration(10.100.11.115:50010,
storageID=DS-2129547091-10.100.11.115-50010-1225485937590, 
infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException:

xceiverCount 257 exceeds the limit of concurrent xcievers 256
at 
org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1030)

at java.lang.Thread.run(Thread.java:619)


On Oct 31, 2008, at 2:49 PM, Nathan Marz wrote:


Hello,

We are seeing some really bad errors on our hadoop cluster. After 
reformatting the whole cluster, the first job we run immediately fails 
with Could not find block locations... errrors. In the namenode 
logs, we see a ton of errors like:


2008-10-31 14:20:44,799 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 7276, call addBlock(/tmp/dustintmp/shredded_dataunits/_t$
org.apache.hadoop.dfs.LeaseExpiredException: No lease on 
/tmp/dustintmp/shredded_dataunits/_temporary/_attempt_200810311418_0002_m_23_0$ 

   at 
org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1166)
   at 
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1097) 


   at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
   at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 


   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)



In the datanode logs, we see a ton of errors like:

2008-10-31 14:20:09,978 ERROR org.apache.hadoop.dfs.DataNode: 
DatanodeRegistration(10.100.11.115:50010, 
storageID=DS-2129547091-10.100.11.1$

of concurrent xcievers 256
   at 
org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1030)

   at java.lang.Thread.run(Thread.java:619)



Anyone have any ideas on what may be wrong?

Thanks,
Nathan Marz
Rapleaf






Re: TaskTrackers disengaging from JobTracker

2008-10-30 Thread Raghu Angadi

Devaraj Das wrote:

I wrote a patch to address the NPE in JobTracker.killJob() and compiled
it against TRUNK. I've put this on the cluster and it's now been holding
steady for the last hour or so.. so that plus whatever other differences
there are between 18.1 and TRUNK may have fixed things. (I'll submit the
patch to the JIRA as soon as it finishes cranking against the JUnit tests)



Aaron, I don't think this is a solution to the problem you are seeing. The
IPC handlers are tolerant to exceptions. In particular, they must not die in
the event of an exception during RPC processing. Could you please get a
stack trace of the JobTracker threads (without your patch) when the TTs are
unable to talk to it. Access the url http://jt-host:jt-info-port/stacks
That will tell us what the handlers are up to.


Devaraj fwded the stacks that Aaron sent. As he suspected there is a 
deadlock in RPC server. I will file a blocker for 0.18 and above. This 
deadlock is more likely on a busy network.


Raghu.



Re: Datanode not detecting full disk

2008-10-30 Thread Raghu Angadi

Stefan Will wrote:

Hi Raghu,

Each DN machine has 3 partitions, e.g.:

FilesystemSize  Used Avail Use% Mounted on
/dev/sda1  20G  8.0G   11G  44% /
/dev/sda3 1.4T  756G  508G  60% /data
tmpfs 3.9G 0  3.9G   0% /dev/shm

All of the paths in hadoop-site.xml point to /data, which is the partition
that filled up to 100% (I deleted a bunch of files from HDFS since then). So
I guess the question is whether the DN looks at just the partition its data
directory is on, or all partitions when it determines disk usage.


Datanode checks df on /data alone. What is dfs.df.interval set to?

Also, if you set multiple paths for dfs.data.dir, the available space for 
each of these adds up, which would be wrong in your case since all of these 
are under one partition.


Raghu.



Re: TaskTrackers disengaging from JobTracker

2008-10-30 Thread Raghu Angadi

Raghu Angadi wrote:
Devaraj fwded the stacks that Aaron sent. As he suspected there is a 
deadlock in RPC server. I will file a blocker for 0.18 and above. This 
deadlock is more likely on a busy network.




Aaron,

Could you try the patch attached to 
https://issues.apache.org/jira/browse/HADOOP-4552 ?


Thanks,
Raghu.


Re: Could not obtain block error

2008-10-29 Thread Raghu Angadi


If you have only one copy of the block and it is mostly corrupted, the 
Namenode itself cannot correct it. Of course, DFSClient should not print 
errors in an infinite loop.

I think there was an old bug where the crc file got overwritten by a 
0-length file.


One workaround for you is to go to the datanode and remove the .crc 
file for this block (find /datanodedir -name blk_5994030096182059653\*). 
Be careful not to remove the block file itself.


Longer term fix: upgrade to a more recent version.

Raghu.

murali krishna wrote:

Hi,
When I try to read one of the file from dfs, I get the following error in an 
infinite loop (using 0.15.3)

“08/10/28 23:43:15 INFO fs.DFSClient: Could not obtain block 
blk_5994030096182059653 from any node:  java.io.IOException: No live nodes 
contain current block”

Fsck showed that the file is HEALTHY but under replicated (1 instead of 
configured 2). I checked the datanode log where the only replica exists for 
that block and I can see repeated errors while serving that bock.
2008-10-22 23:55:39,378 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_59940300961820596
53 to 68.142.212.228:50010 got java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at org.apache.hadoop.dfs.DataNode$BlockSender.init(DataNode.java:1061)
at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1446)
at java.lang.Thread.run(Thread.java:619)

Any idea what is going on and how can I fix this ?

Thanks,
Murali




