RE: Hadoop 0.15.0 - Reporter issue w/ timing out

2007-11-10 Thread Devaraj Das
There has been a change with respect to the way progress reporting is done since 0.14. The application has to explicitly send the status (incrCounter doesn't send any status). Even if the application hasn't made any progress, it is okay to call setStatus with the earlier status.
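
A minimal sketch of the pattern described above, assuming a long-running task loop: the status is sent explicitly at intervals, and the earlier status is simply resent when nothing new is worth saying. The Reporter interface here is a hypothetical stand-in that only mirrors the two method names mentioned in the thread (setStatus, incrCounter); it is not Hadoop's actual interface, whose exact signatures should be checked against the release in use. RecordingReporter, processRecords and statusEvery are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StatusReportingSketch {

    /** Hypothetical stand-in mirroring the two calls named in the thread; NOT Hadoop's real Reporter. */
    interface Reporter {
        void setStatus(String status);
        void incrCounter(String counter, long amount);
    }

    /** Records calls in memory so the sketch runs without a cluster. */
    static class RecordingReporter implements Reporter {
        final List<String> statuses = new ArrayList<String>();
        long counted = 0;

        public void setStatus(String status) {
            statuses.add(status);
        }

        public void incrCounter(String counter, long amount) {
            counted += amount;
        }
    }

    /**
     * Processes records, bumping a counter for each one and explicitly
     * resending the status every statusEvery records.
     */
    static void processRecords(List<String> records, Reporter reporter, int statusEvery) {
        String status = "processing";
        reporter.setStatus(status);
        int seen = 0;
        for (String record : records) {
            seen++;
            // Per the thread, a counter update alone does not send any status.
            reporter.incrCounter("RECORDS", 1);
            if (seen % statusEvery == 0) {
                // Resending the earlier, unchanged status is acceptable.
                reporter.setStatus(status);
            }
        }
    }

    public static void main(String[] args) {
        RecordingReporter reporter = new RecordingReporter();
        processRecords(Arrays.asList("a", "b", "c", "d", "e"), reporter, 2);
        System.out.println("status calls: " + reporter.statuses.size());
        System.out.println("records counted: " + reporter.counted);
    }
}
```

In a real job the interval would be chosen so that a status call always happens well inside the task timeout; the value 2 above is only for illustration.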

cluster startup problem

2007-11-10 Thread Sebastien Rainville
Hi, I have a cluster made of only 2 PCs. The master also acts as a slave. The cluster seems to start properly. It is functional (I can access the dfs, monitor it with the web interfaces, no errors in the log files...) but it reports that only 1 node is up. For some reason the datanode on the

Splitting output of MapReduce according to file size

2007-11-10 Thread Holger Stenzhorn
Hello, For testing purposes I am running Hadoop in local mode. Is there a possibility to split the output (TextOutputFormat) of a MapReduce job into several output files (e.g. part-, part-0001, etc.) according to some maximal file size per file? I.e. is there a setting such a file size

Re: Splitting output of MapReduce according to file size

2007-11-10 Thread Arun C Murthy
On Sat, Nov 10, 2007 at 07:56:22PM +, Holger Stenzhorn wrote: Hello, For testing purposes I am running Hadoop in local mode. Is there a possibility to split the output (TextOutputFormat) of a MapReduce job into several output files (e.g. part-, part-0001, etc.) according to some maximal

map/reduce with hbase

2007-11-10 Thread Billy
I am using map/reduce with hadoop-0.15.0-streaming.jar to process the data with php scripts. I have code to process the data; below is an example of word counts from the input. bin/hadoop jar contrib/hadoop-0.15.0-streaming.jar -mapper /var/www/search/hadoop/wc-mapper.php -reducer

Re: map/reduce with hbase

2007-11-10 Thread Michael Stack
Billy wrote: .. What I am looking to do is get and store the input and output from/in hbase. I haven't tried it but it looks like you can specify input and output classes for streaming with -inputformat and -outputformat options. Try setting these to TableInputFormat [1] and

Re: Hadoop 0.15.0 - Reporter issue w/ timing out

2007-11-10 Thread Doug Cutting
Devaraj Das wrote: There has been a change with respect to the way progress reporting is done since 0.14. The application has to explicitly send the status (incrCounter doesn't send any status). Even if the application hasn't made any progress, it is okay to call setStatus with the earlier

RE: cluster startup problem

2007-11-10 Thread Sebastien Rainville
This bug is driving me crazy! What tools could I use to find out why slaves are not reported being part of the cluster? I can't find anything wrong in the log files. Using Wireshark, I confirmed that the heartbeat in between the slaves and the master is working. The ssh communication in between

RE: Hadoop 0.15.0 - Reporter issue w/ timing out

2007-11-10 Thread Joydeep Sen Sarma
Did anyone consider the impact of making such a change on existing applications? Curious how it didn't fail any regression test? (The pattern that is reported to be broken is so common.) (I suffer from upgradephobia and this doesn't help.) -----Original Message----- From: Doug Cutting

java.io.IOException: Unknown format version:-3

2007-11-10 Thread paradise
Hi, I built nutch from the svn source: svn co http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch and got the nutch-0.9.war from http://apache.mirror.phpchina.com/lucene/nutch/nutch-0.9.tar.gz After I configured nutch following http://wiki.apache.org/nutch/NutchHadoopTutorial when

Re: Hadoop 0.15.0 - Reporter issue w/ timing out

2007-11-10 Thread Derek Gottfrid
I favor considering this a bug. It is easy enough to rework my code but it seems like odd behaviour. On Nov 10, 2007 6:41 PM, Doug Cutting [EMAIL PROTECTED] wrote: Devaraj Das wrote: There has been a change with respect to the way progress reporting is done since 0.14. The application

Re: map/reduce with hbase

2007-11-10 Thread Billy
Basically what I am trying to do is access hbase from php, since I do not know java and have not found it fun to learn. I was looking around and found this https://issues.apache.org/jira/browse/HADOOP-2171 but am unsure if it is what I think it is; looks like a way to access hbase from a socket

RE: Hadoop 0.15.0 - Reporter issue w/ timing out

2007-11-10 Thread Devaraj Das
Actually in the previous approach, progress reporting used to happen from a separate thread in tasks. The issues https://issues.apache.org/jira/browse/HADOOP-1431, https://issues.apache.org/jira/browse/HADOOP-1462 changed this behavior. But, yes, I agree that incrCounter should be indicative of