Re: HDFS files naming convention

2009-04-25 Thread Pankil Doshi
Hey, you can surely do that using MultipleOutputFormat. We have already implemented that. Pankil On Fri, Apr 24, 2009 at 8:58 PM, Aaron Kimball aa...@cloudera.com wrote: Alternatively, just use FileSystem.rename() on the normal output files after reducing is complete? On Sat, Apr 25,
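Aaron's suggestion can be sketched roughly as follows. This is a hedged illustration, not the poster's actual code: the output directory, the `part-` prefix convention, and the `renameParts` helper are assumptions, and the API shown is the 0.18-era `FileSystem` interface.

```java
// Sketch: after the job completes, rename the reducer's part-NNNNN files
// to a custom naming scheme using FileSystem.rename().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameOutputs {
    public static void renameParts(Configuration conf, Path outDir, String prefix)
            throws java.io.IOException {
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus stat : fs.listStatus(outDir)) {
            String name = stat.getPath().getName();
            if (name.startsWith("part-")) {
                // e.g. part-00000 -> myprefix-00000
                fs.rename(stat.getPath(), new Path(outDir, prefix + name.substring("part-".length())));
            }
        }
    }
}
```

The rename approach needs no custom OutputFormat, at the cost of a second pass over the output directory once the job finishes.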

Re: Processing High CPU Memory intensive tasks on Hadoop - Architecture question

2009-04-25 Thread Aaron Kimball
Amit, This can be made to work with Hadoop. Basically, in your mapper's configure stage it would do the heavy load-in process, then it would process your individual work items as records during the actual map stage. A map task can comprise many records, so you'll be fine here. If you use
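The pattern Aaron describes can be sketched with the old `mapred` API of that era. This is a hedged illustration: `model.path`, `loadExpensiveModel`, and `process` are hypothetical stand-ins for the poster's heavy resource.

```java
// Sketch (0.18-era mapred API): do the expensive load once per task in
// configure(), then reuse the loaded resource for every record in map().
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class HeavyInitMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private Object model;  // stand-in for the expensive-to-load resource

    @Override
    public void configure(JobConf job) {
        // Runs once per task, before any records are processed.
        model = loadExpensiveModel(job.get("model.path"));  // hypothetical helper
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        // Every record in the task's input split reuses the loaded resource.
        out.collect(new Text("result"), new Text(process(model, value)));
    }

    private Object loadExpensiveModel(String path) { return new Object(); }
    private String process(Object m, Text v) { return v.toString(); }
}
```

Since a single map task processes an entire input split's worth of records, the load cost in `configure()` is amortized over all of them.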

Re: Processing High CPU Memory intensive tasks on Hadoop - Architecture question

2009-04-25 Thread amit handa
Thanks Aaron, The processing libs that we use, which take time to load, are all C++-based .so libs. Can I invoke them from the JVM during the configure stage of the mapper and keep them running as you suggested? Can you point me to some documentation regarding this? Regards, Amit On Sat, Apr 25,

RuntimeException, could not obtain block causes trackers to get blacklisted

2009-04-25 Thread Saptarshi Guha
Hello, I have an intensive job running across 5 machines. During the map stage, each map emits 200 records, so effectively for 50,000,000 input records, the map creates 200*50e6 records. However, after a long time, I see two trackers are blacklisted Caused by: java.lang.RuntimeException: Could not

File name to which mapper's key,value belongs to - is it available?

2009-04-25 Thread Saptarshi Guha
Hello, Is there a conf variable for getting the filename to which the current mapper's key,value belongs? I have dir/dirA/part-X and dir/dirB/part-X. I will process dir, but need to know whether the key,value is from a dirA/part-* file or from a dirB/part-* file. I'd much rather not implement my

Re: Advice on restarting HDFS in a cron

2009-04-25 Thread Rakhi Khatwani
Hi, I have faced a somewhat similar issue... I have a couple of map reduce jobs running on EC2... after a week or so, I get a no space on device exception while performing any linux command... so I end up shutting down hadoop and hbase, clearing the logs and then restarting them. Is there a cleaner

Re: File name to which mapper's key,value belongs to - is it available?

2009-04-25 Thread Farhan Husain
For this purpose, I have written my own InputFormat class but I believe there is a better way of doing that. JobConf may provide information about the input file. On Sat, Apr 25, 2009 at 12:19 PM, Saptarshi Guha saptarshi.g...@gmail.comwrote: Hello, Is there a conf variable for getting the filename

Hadoop remains unaware of code changes in eclipse

2009-04-25 Thread Sid123
Hi I am working on the hadoop plugin on eclipse in Linux... everything was working fine when one day hadoop started to ignore any code changes I did in my project. Instead it just ran an old copy of the code from somewhere. Looking at the mapred.local folder where the temporary source files are

Re: Processing High CPU Memory intensive tasks on Hadoop - Architecture question

2009-04-25 Thread jason hadoop
Static, pinned items persist across JVM reuse. On Sat, Apr 25, 2009 at 6:44 AM, amit handa amha...@gmail.com wrote: Thanks Aaron, The processing libs that we use, which take time to load are all c++ based .so libs. Can i invoke it from JVM during the configure stage of the mapper and keep
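Jason's one-liner refers to task JVM reuse (the `mapred.job.reuse.jvm.num.tasks` setting introduced around 0.19): a static initializer runs at most once per JVM, so a native library loaded there survives into later tasks scheduled on the same child JVM. A self-contained sketch of the idiom, with a counter standing in for the expensive load (`NativeLibHolder` and the class name are illustrative, not from the thread):

```java
// Self-contained sketch: a static initializer runs once per JVM, so with
// JVM reuse, later tasks in the same child JVM find the resource already
// loaded instead of paying the load cost again.
public class NativeLibHolder {
    static int loadCount = 0;

    static {
        // In a real mapper this would be System.loadLibrary("mylib") plus
        // any expensive warm-up of the C++ .so; it runs at most once per JVM.
        loadCount++;
    }

    // Each task would call this from its configure() method; only the first
    // call in a given JVM triggers the static initializer above.
    public static int timesLoaded() {
        return loadCount;
    }
}
```

A second task calling `NativeLibHolder.timesLoaded()` in the same JVM sees the counter still at 1: the load happened exactly once.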

Can't start fully-distributed operation of Hadoop in Sun Grid Engine

2009-04-25 Thread Jasmine (Xuanjing) Huang
Hi, there, My hadoop system (version: 0.18.3) works well under standalone and pseudo-distributed operation. But if I try to run hadoop in fully-distributed mode in Sun Grid Engine, Hadoop always fails -- in fact, the JobTracker and TaskTracker can be started, but the namenode and secondary

Re: File name to which mapper's key,value belongs to - is it available?

2009-04-25 Thread Chuck Lam
Yes, with the JobConf object try job.get("map.input.file"); On Sat, Apr 25, 2009 at 12:06 PM, Farhan Husain russ...@gmail.com wrote: For this purpose, I have written my own InputFormat class but I believe there is a better way of doing that. JobConf may provide information about the input file.
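Chuck's answer can be sketched against the original question (distinguishing dirA from dirB input). A hedged illustration using the 0.18-era mapred API; the class name and the `/dirA/` check are assumptions matching the poster's layout:

```java
// Sketch: read "map.input.file" in configure() to learn which input file
// this map task's split comes from, and branch on dirA vs dirB.
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class PathAwareMapper extends MapReduceBase {
    private boolean fromDirA;

    @Override
    public void configure(JobConf job) {
        // Full path of the file backing the current split, set by the framework.
        String inputFile = job.get("map.input.file");
        fromDirA = inputFile != null && inputFile.contains("/dirA/");
    }
}
```

Because a map task processes a single split, checking the path once in `configure()` is enough; no per-record lookup or custom InputFormat is needed.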

Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

2009-04-25 Thread jason hadoop
The parameter you specify for fs.default.name should be of the form hdfs://host:port and the parameter you specify for the mapred.job.tracker MUST be host:port. I haven't looked at 18.3, but it appears that the :port is mandatory. In your case, the piece of code parsing the fs.default.name
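The two forms Jason describes would look like this in a 0.18-era hadoop-site.xml (host names and port numbers here are placeholders, not from the thread):

```xml
<!-- Sketch of a hadoop-site.xml fragment. Note the explicit ports, and that
     fs.default.name takes a hdfs:// URI while mapred.job.tracker takes a
     bare host:port pair. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:9001</value>
</property>
```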

Re: Advice on restarting HDFS in a cron

2009-04-25 Thread Aaron Kimball
If your logs were being written to the root partition (/dev/sda1), that's going to fill up fast. This partition is only 10 GB on EC2 and much of that space is consumed by the OS install. You should redirect your logs to some place under /mnt (/dev/sdb1); that's 160 GB. - Aaron On Sun, Apr
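Redirecting the logs as Aaron suggests is a one-line change in conf/hadoop-env.sh (the exact path under /mnt is an assumption; create it on each node before restarting the daemons):

```shell
# Sketch: in conf/hadoop-env.sh, point Hadoop's logs at the large /mnt
# ephemeral partition instead of the small root partition.
export HADOOP_LOG_DIR=/mnt/hadoop/logs
```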

Re: Advice on restarting HDFS in a cron

2009-04-25 Thread Rakhi Khatwani
Thanks Aaron. On Sun, Apr 26, 2009 at 10:37 AM, Aaron Kimball aa...@cloudera.com wrote: If your logs were being written to the root partition (/dev/sda1), that's going to fill up fast. This partition is only 10 GB on EC2 and much of that space is consumed by the OS install. You should

Re: Processing High CPU Memory intensive tasks on Hadoop - Architecture question

2009-04-25 Thread Aaron Kimball
I'm not aware of any documentation about this particular use case for Hadoop. I think your best bet is to look into the JNI documentation about loading native libraries, and go from there. - Aaron On Sat, Apr 25, 2009 at 10:44 PM, amit handa amha...@gmail.com wrote: Thanks Aaron, The