Re: Ynt: Re: Cannot access Jobtracker and namenode

2009-04-13 Thread Rasit OZDAS
It's normal that they are all empty. Look at the files with a .log extension. On Sunday, 12 April 2009 at 23:30, halilibrahimcakir halilibrahimca...@mynet.com wrote: I followed these steps: $ bin/stop-all.sh $ rm -ri /tmp/hadoop-root $ bin/hadoop namenode -format $ bin/start-all.sh and looked

Ynt: Re: Ynt: Re: Cannot access Jobtracker and namenode

2009-04-13 Thread halilibrahimcakir
Sorry, the log file (hadoop-root-namenode-debian.log) content: 2009-04-12 16:27:22,762 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = debian/127.0.1.1

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread Todd Lipcon
On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon t...@cloudera.com wrote: Hey Brian, This is really interesting stuff. I'm curious - have you tried these same experiments using the Java API? I'm wondering whether this is FUSE-specific or inherent to all HDFS reads. I'll try to reproduce this

Doubt regarding permissions

2009-04-13 Thread Amar Kamat
Hey, I tried the following: - created a dir temp for user A with permission 733 - created a dir temp/test for user B with permission 722 - created a file temp/test/test.txt for user B with permission 722. Now in HDFS, user A can list as well as read the contents of

Re: Reduce task attempt retry strategy

2009-04-13 Thread Jothi Padmanabhan
Currently, only failed tasks are attempted on a node other than the one where they failed. For killed tasks, there is no such retry policy. "failed to report status" usually indicates that the task did not report sufficient progress. However, it is possible that the task itself was not

Re: HDFS as a logfile ??

2009-04-13 Thread Ariel Rabkin
Chukwa is a Hadoop subproject aiming to do something similar, though particularly for the case of Hadoop logs. You may find it useful. Hadoop unfortunately does not support concurrent appends. As a result, the Chukwa project found itself creating a whole new daemon, the Chukwa collector,

Re: Modeling WordCount in a different way

2009-04-13 Thread Pankil Doshi
Hey, did you find any class or way to store the results of Job1's map/reduce in memory and use that as input to Job2's map/reduce? I am facing a situation where I need to do a similar thing. If anyone can help me out.. Pankil On Wed, Apr 8, 2009 at 12:51 AM, Sharad Agarwal

Re: Multithreaded Reducer

2009-04-13 Thread Owen O'Malley
On Apr 10, 2009, at 11:12 AM, Sagar Naik wrote: Hi, I would like to implement a multi-threaded reducer. As per my understanding, the system does not have one because we expect the output to be sorted. However, in my case I don't need the output sorted. You'd probably want to make a blocking
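
A minimal sketch of the kind of reducer being discussed, assuming the old org.apache.hadoop.mapred API of that era: per-key work is handed to a fixed thread pool and the pool is drained in close(). The class name, key/value types, and pool size are illustrative rather than from the thread, and output order is not preserved (acceptable here since sorted output is not needed).

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class ThreadedReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      private final ExecutorService pool = Executors.newFixedThreadPool(4);

      public void reduce(Text key, Iterator<IntWritable> values,
          final OutputCollector<Text, IntWritable> out, Reporter reporter)
          throws IOException {
        // Drain the iterator on the framework's thread (it is not thread-safe)
        // and copy the key, since the framework reuses both objects.
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        final Text keyCopy = new Text(key);
        final int total = sum;
        pool.submit(new Runnable() {
          public void run() {
            try {
              synchronized (out) {   // OutputCollector is not thread-safe
                out.collect(keyCopy, new IntWritable(total));
              }
            } catch (IOException e) {
              throw new RuntimeException(e);
            }
          }
        });
      }

      @Override
      public void close() throws IOException {
        // Wait for all queued work before the task reports completion.
        pool.shutdown();
        try {
          pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }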

RE: HDFS as a logfile ??

2009-04-13 Thread Ricky Ho
Ari, thanks for your note. I'd like to understand more about how Chukwa groups log entries ... If I have appA running on machines X, Y and appB running on machines Y, Z, each of them calling the Chukwa log API, do all entries go into the same HDFS file? Or 4 separate HDFS files based on the

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread Brian Bockelman
Hey Todd, I've been playing more this morning after thinking about it overnight -- I think the culprit is not the network, but actually the cache. Here's the output of your script adjusted to do the same calls as I was doing (you had left out the random I/O part). [br...@red tmp]$ java

Re: Extending ClusterMapReduceTestCase

2009-04-13 Thread czero
Hey all, I'm also extending ClusterMapReduceTestCase and having a bit of trouble. Currently I'm getting: Starting DataNode 0 with dfs.data.dir: build/test/data/dfs/data/data1,build/test/data/dfs/data/data2 Starting DataNode 1 with dfs.data.dir:

Re: Extending ClusterMapReduceTestCase

2009-04-13 Thread czero
Sorry, I forgot to include the non-IntelliJ console output :) 09/04/13 12:07:14 ERROR mapred.MiniMRCluster: Job tracker crashed java.lang.NullPointerException at java.io.File.<init>(File.java:222) at org.apache.hadoop.mapred.JobHistory.init(JobHistory.java:143) at
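
For what it's worth, this NullPointerException in JobHistory typically comes from an unset hadoop.log.dir system property when the mini cluster is started outside of ant. A hedged sketch of the usual workaround, with an arbitrary log path; the class and test names are illustrative:

    import org.apache.hadoop.mapred.ClusterMapReduceTestCase;

    public class MyClusterTest extends ClusterMapReduceTestCase {

      @Override
      protected void setUp() throws Exception {
        // JobHistory builds a File from this property; point it at a writable
        // directory before the mini DFS/MR clusters are started.
        System.setProperty("hadoop.log.dir", "build/test/logs");
        super.setUp();
      }

      public void testSomething() throws Exception {
        // submit jobs against createJobConf() / getFileSystem() here
      }
    }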

Re: DataXceiver Errors in 0.19.1

2009-04-13 Thread Raghu Angadi
It need not be anything to worry about. Do you see anything at the user level (a task, job, copy, or script) failing because of this? On a distributed system with many nodes, there would be some errors on some of the nodes for various reasons (load, hardware, reboots, etc.). HDFS usually should work

Map Rendering

2009-04-13 Thread Patterson, Josh
We're looking into power grid visualization and were wondering if anyone could recommend a good java native lib (that plays nice with hadoop) to render some layers of geospatial data. At this point we have the cluster crunching our test data, formats, and data structures, and we're now looking at

Re: Reduce task attempt retry strategy

2009-04-13 Thread Stefan Will
Jothi, thanks for the explanation. One question though: why shouldn't timed-out tasks be retried on a different machine? As you pointed out, it could very well have been due to the machine having problems. To me a timeout is just like any other kind of failure. -- Stefan From: Jothi

raw files become zero bytes when mapreduce job hit outofmemory error

2009-04-13 Thread javateck javateck
I'm running some MapReduce jobs, and some of them hit out-of-memory errors. I find that the raw data itself also got corrupted and became zero bytes, which is very strange to me. I did not look into it in much detail, but just wanted to check quickly whether someone has had such an experience. I'm running 0.18.3. Thanks

Re: Doubt regarding permissions

2009-04-13 Thread Tsz Wo (Nicholas), Sze
Hi Amar, I just tried it. Everything worked as expected. I guess user A in your experiment was a superuser, so that he could read anything. Nicholas Sze /// permission testing // drwx-wx-wx - nicholas supergroup 0 2009-04-13 10:55
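
A small sketch of how the same setup could be reproduced programmatically, assuming the FileSystem API of that era; the 0733/0722 modes and the temp/test paths come from the thread, while the class name and everything else is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class PermissionCheck {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Run this part as user B: a directory and a file, both mode 722.
        Path test = new Path("temp/test");
        fs.mkdirs(test, new FsPermission((short) 0722));
        Path file = new Path(test, "test.txt");
        fs.create(file).close();
        fs.setPermission(file, new FsPermission((short) 0722));

        // Run this part as user A: without read/execute permission on temp/test
        // the listing should be denied, unless A is the HDFS superuser.
        for (FileStatus s : fs.listStatus(test)) {
          System.out.println(s.getPath() + " " + s.getPermission());
        }
      }
    }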

Re: Map-Reduce Slow Down

2009-04-13 Thread Aaron Kimball
In hadoop-*-examples.jar, use randomwriter to generate the data and sort to sort it. - Aaron On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi forpan...@gmail.com wrote: Your data is too small, I guess, for 15 clusters .. So it might be the overhead time of these clusters making your total MR jobs more

Grouping Values for Reducer Input

2009-04-13 Thread Streckfus, William [USA]
Hi Everyone, I'm working on a relatively simple MapReduce job with a slight complication regarding the ordering of my key/values heading into the reducer. The output from the mapper might be something like cat - doc5, 1 cat - doc1, 1 cat - doc5, 3 ... Here, 'cat' is my key and the

Re: Map-Reduce Slow Down

2009-04-13 Thread Jim Twensky
Mithila, you said all the slaves were being utilized in the 3-node cluster. Which application did you run to test that, and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, then some of your nodes in the 15-node cluster may not be

RE: Grouping Values for Reducer Input

2009-04-13 Thread jeremy.huylebroeck
I'm not familiar with setOutputValueGroupingComparator. What about adding the doc# to the key and having your own hashing/Partitioner? So doing something like cat_doc5 - 1 cat_doc1 - 1 cat_doc5 - 3. The hashing method would take everything before '_' as the hash. The shuffling would still put the
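
A hedged sketch of the partitioner described above, assuming the old org.apache.hadoop.mapred API and a Text key of the form word_docid; the class name and value type are illustrative. Partitioning on the prefix before '_' sends every doc for a given word to the same reducer (grouping those values into one reduce() call would additionally need the output value grouping comparator mentioned in the original question).

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class WordPrefixPartitioner implements Partitioner<Text, IntWritable> {

      public int getPartition(Text key, IntWritable value, int numPartitions) {
        String composite = key.toString();
        int idx = composite.indexOf('_');
        // Only the part before '_' decides the partition.
        String word = (idx < 0) ? composite : composite.substring(0, idx);
        // Mask the sign bit so the modulo result is always non-negative.
        return (word.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }

      public void configure(JobConf job) {
        // nothing to configure
      }
    }

It would be wired in with conf.setPartitionerClass(WordPrefixPartitioner.class) on the JobConf.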

Re: Grouping Values for Reducer Input

2009-04-13 Thread Jim Twensky
I'm not sure if this is exactly what you want, but can you emit map records as: cat, doc5 - 3 cat, doc1 - 1 cat, doc5 - 1 and so on? This way, your reducers will get the intermediate key,value pairs as cat, doc5 - 3 cat, doc5 - 1 cat, doc1 - 1; then you can split the keys (cat, doc*)

Re: Grouping Values for Reducer Input

2009-04-13 Thread Jim Twensky
Oh, I forgot to mention that you should change your partitioner to send all keys of the form cat,* to the same reducer, but it seems like Jeremy has been much faster than me :) -Jim On Mon, Apr 13, 2009 at 5:24 PM, Jim Twensky jim.twen...@gmail.com wrote: I'm not sure if this is exactly

[ANNOUNCE] hamake-1.0

2009-04-13 Thread Vadim Zaliva
HAMAKE is a make-like utility for Hadoop. More information is on the project page: http://code.google.com/p/hamake/ The documentation is still quite poor, but the core functionality is working and I plan to improve it further. Sincerely, Vadim

Re: Map-Reduce Slow Down

2009-04-13 Thread Mithila Nagendra
Thanks Aaron. Jim: The three clusters I set up had Ubuntu running on them, and the DFS was accessed at port 54310. The new cluster which I've set up has Red Hat Linux release 7.2 (Enigma) running on it. Now when I try to access the DFS from one of the slaves I get the following response: dfs cannot be

Re: Map-Reduce Slow Down

2009-04-13 Thread Jim Twensky
Can you ssh between the nodes? -jim On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra mnage...@asu.edu wrote: Thanks Aaron. Jim: The three clusters I set up had Ubuntu running on them and the DFS was accessed at port 54310. The new cluster which I've set up has Red Hat Linux release 7.2

Using 3rd party Api in Map class

2009-04-13 Thread Farhan Husain
Hello, I am trying to use the Pellet library for some OWL inferencing in my mapper class, but I can't find a way to bundle the library jar files in my job jar file. I am exporting my project as a jar file from the Eclipse IDE. Will it work if I create the jar manually and include all the jar files Pellet

Re: Using 3rd party Api in Map class

2009-04-13 Thread Nick Cen
Create a directory called 'lib' in your project's root dir, then put all the 3rd-party jars in it. 2009/4/14 Farhan Husain russ...@gmail.com Hello, I am trying to use the Pellet library for some OWL inferencing in my mapper class, but I can't find a way to bundle the library jar files in my job jar

java compile time warning while using MultipleOutputs

2009-04-13 Thread Seunghwa Kang
Hello, the Java compiler generates the following warning when I use the MultipleOutputs class, because the mos.getCollector(String, Reporter) method (mos being of MultipleOutputs type) returns a raw OutputCollector instead of OutputCollector<K,V>. warning: [unchecked] unchecked call to collect(K,V) as a member of the raw type

Re: Map-Reduce Slow Down

2009-04-13 Thread Mithila Nagendra
Yes I can.. On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky jim.twen...@gmail.com wrote: Can you ssh between the nodes? -jim On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra mnage...@asu.edu wrote: Thanks Aaron. Jim: The three clusters I set up had Ubuntu running on them and the DFS was

bzip2 input format

2009-04-13 Thread Edward J. Yoon
Does anyone have an input formatter for bzip2? -- Best Regards, Edward J. Yoon edwardy...@apache.org http://blog.udanax.org

Re: Modeling WordCount in a different way

2009-04-13 Thread sharad agarwal
Pankil Doshi wrote: Hey, did you find any class or way to store the results of Job1's map/reduce in memory and use that as input to Job2's map/reduce? I am facing a situation where I need to do a similar thing. If anyone can help me out.. Normally you would write the job output to a file and
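
A minimal sketch of that pattern, assuming the old JobConf/JobClient API: the first job writes to an intermediate HDFS path and the second job reads it as input. The class name, job names, and the intermediate path are illustrative, and the mapper/reducer settings are elided.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChainedJobs {
      public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path intermediate = new Path("tmp/job1-out");   // scratch path
        Path output = new Path(args[1]);

        JobConf job1 = new JobConf(ChainedJobs.class);
        job1.setJobName("job1");
        FileInputFormat.setInputPaths(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        // ... mapper/reducer and key/value classes for job1 ...
        JobClient.runJob(job1);   // blocks until job1 completes

        JobConf job2 = new JobConf(ChainedJobs.class);
        job2.setJobName("job2");
        FileInputFormat.setInputPaths(job2, intermediate);   // job1's output
        FileOutputFormat.setOutputPath(job2, output);
        // ... mapper/reducer and key/value classes for job2 ...
        JobClient.runJob(job2);
      }
    }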

Re: java compile time warning while using MultipleOutputs

2009-04-13 Thread sharad agarwal
warning: [unchecked] unchecked call to collect(K,V) as a member of the raw type org.apache.hadoop.mapred.OutputCollector Yes, I can live with this warning, but it really makes me uneasy. Any suggestions to remove this warning? You can suppress the warning using an annotation in your code:
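
Something along these lines, as a hedged sketch completing the suggestion above: MultipleOutputs.getCollector() returns a raw OutputCollector, so the unchecked warning can be suppressed with an annotation on the method that makes the call. The reducer shown, its types, and the "stats" named output are illustrative.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class StatsReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      private MultipleOutputs mos;

      @Override
      public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
      }

      @SuppressWarnings("unchecked")   // getCollector() returns a raw OutputCollector
      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        // This is the call the compiler flags as unchecked; "stats" must have
        // been registered with MultipleOutputs.addNamedOutput() on the JobConf.
        mos.getCollector("stats", reporter).collect(key, new IntWritable(sum));
      }

      @Override
      public void close() throws IOException {
        mos.close();
      }
    }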

Re: Reduce task attempt retry strategy

2009-04-13 Thread Jothi Padmanabhan
Usually, a task is killed when 1. the user explicitly kills the task himself, 2. the framework kills the task because it did not progress enough, or 3. the task was one of several speculatively executed attempts. Hence the reason for killing has, more often than not, nothing to do with the health of the node where it was running,

Is there anyone use hadoop for other applications rather than MapReduce?

2009-04-13 Thread Lei Xu
Hi all, I am wondering whether HDFS is suitable for other applications, which may be more general-purpose applications or simply a huge amount of storage. Any feedback on that? Thanks. Lei

Re: Is there anyone use hadoop for other applications rather than MapReduce?

2009-04-13 Thread Hadooper
Hi Lei, is there any particular problem at hand? Thanks On Mon, Apr 13, 2009 at 9:01 PM, Lei Xu xule...@gmail.com wrote: Hi all, I am wondering whether HDFS is suitable for other applications, which may be more general-purpose applications or simply a huge amount of storage? Any

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread jason hadoop
The following very simple program will tell the VM to drop the pages being cached for a file. I tend to spin this in a for loop when making large tar files, or otherwise working with large files, and the system performance really smooths out. Since it uses open(path) it will churn through the inode

Re: Hadoop and Image analysis question

2009-04-13 Thread jason hadoop
If you pack your images into sequence files, as the value items, the cluster will automatically do a decent job of ensuring that the input splits made from the sequence files are local to the map task. We did this in production at a previous job and it worked very well for us. Might as well turn
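
A hedged sketch of the packing step described above: each image becomes one record in a SequenceFile, with the filename as the key and the raw bytes as a BytesWritable value, so split locality comes from normal HDFS block placement. The paths, class name, and use of block compression are assumptions, not from the thread.

    import java.io.File;
    import java.io.FileInputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class ImagePacker {
      // args[0] = local directory of images, args[1] = output SequenceFile on HDFS
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path(args[1]), Text.class, BytesWritable.class,
            SequenceFile.CompressionType.BLOCK);
        try {
          for (File img : new File(args[0]).listFiles()) {
            byte[] bytes = new byte[(int) img.length()];
            FileInputStream in = new FileInputStream(img);
            try {
              IOUtils.readFully(in, bytes, 0, bytes.length);
            } finally {
              in.close();
            }
            // Filename as key, image bytes as value.
            writer.append(new Text(img.getName()), new BytesWritable(bytes));
          }
        } finally {
          writer.close();
        }
      }
    }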

Re: Extending ClusterMapReduceTestCase

2009-04-13 Thread jason hadoop
I have a nice variant of this in the ch7 examples section of my book, including a standalone wrapper around the virtual cluster that allows multiple test instances to share the virtual cluster and makes it easier to poke around with the input and output datasets. It even works decently