Re: Using Map/Reduce without HDFS?

2007-09-01 Thread mfc
… on your schedule. Cheers, Doug

RE: Using Map/Reduce without HDFS?

2007-08-31 Thread mfc

Re: Using Map/Reduce without HDFS?

2007-08-31 Thread Ted Dunning
The priority is already listed as major. Nobody is assigned to the bug, however. Presumably this won't be the kind of thing an outsider could do easily.

On 8/31/07 8:48 AM, mfc [EMAIL PROTECTED] wrote: Not having the ability to append to a file in Hadoop seems to be rather limiting. How …

RE: Using Map/Reduce without HDFS?

2007-08-31 Thread Ted Dunning
mfc wrote: How can this get higher on the priority list? Even just a single appender.

Fundamentally, priorities are set by those that do the work. As a volunteer organization, we can't assign tasks. Folks …

RE: Using Map/Reduce without HDFS?

2007-08-31 Thread Ted Dunning
Sorry. I should have said newcomers. :-)

Doug Cutting wrote: Ted Dunning wrote: Presumably this won't be the kind of thing …

RE: Using Map/Reduce without HDFS?

2007-08-31 Thread Dhruba Borthakur
… Thanks, dhruba

Ted Dunning wrote: The priority is already listed as major. Nobody is assigned to the bug, however. Presumably …

Re: Using Map/Reduce without HDFS?

2007-08-31 Thread Doug Cutting
mfc wrote: How can this get higher on the priority list? Even just a single appender.

Fundamentally, priorities are set by those that do the work. As a volunteer organization, we can't assign tasks. Folks must volunteer to do the work. Y! has volunteered more than others on Hadoop, but …

Re: Using Map/Reduce without HDFS?

2007-08-31 Thread Doug Cutting
Ted Dunning wrote: Presumably this won't be the kind of thing an outsider could do easily.

There are no outsiders here, I hope! We try to conduct everything in the open, from design through implementation and testing. If you feel that you're missing discussions, please ask questions. Some …

RE: Using Map/Reduce without HDFS?

2007-08-29 Thread Ted Dunning
… haven't heard much on this subject actually: http://issues.apache.org/jira/browse/HADOOP-1700

On 8/29/07, Ted Dunning [EMAIL PROTECTED] wrote: You can't append in Hadoop, AFAIK. The appending would be done outside of Hadoop with a periodic copy into HDFS. I …
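A minimal sketch of the pattern described above (append to a file locally, then periodically copy the rolled file, whole, into HDFS), using Hadoop's FileSystem API. The paths and naming scheme are illustrative assumptions, not from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Append locally, then periodically copy the rolled file, whole,
    // into HDFS as a new immutable file (HDFS had no append at the time).
    public class PeriodicCopy {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up the cluster config
        FileSystem hdfs = FileSystem.get(conf);    // the configured filesystem

        Path local = new Path("/var/log/app/rolled.log");  // closed local log (illustrative)
        Path remote = new Path("/logs/app-" + System.currentTimeMillis() + ".log");

        hdfs.copyFromLocalFile(local, remote);     // one-shot import, no append needed
      }
    }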

Re: Using Map/Reduce without HDFS?

2007-08-29 Thread mfc
Hi, Can you elaborate on how this is done in Hadoop? Thanks.

Ted Dunning wrote: It is often also possible to merge the receiving of the new data with the appending to a large file. The append nature of the writing makes …

RE: Using Map/Reduce without HDFS?

2007-08-29 Thread Ted Dunning
Or lots and lots of little files. None of these is very attractive.

mfc wrote: Hi, So two alternatives to no append: 1) The big …

Re: Using Map/Reduce without HDFS?

2007-08-27 Thread mfc

Using Map/Reduce without HDFS?

2007-08-26 Thread mfc
… and there are thousands of them, would it make sense to process the files directly from the local file system via Map/Reduce? Is there a mode in Hadoop to do this? Does Hadoop make sense to use in this case? Thanks
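For reference, Hadoop can run a job directly against the local file system by using file:/// paths and the in-process local job runner. A minimal sketch with the old (2007-era) mapred API; the job class and paths are illustrative assumptions:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Run a map/reduce job entirely on the local file system, no HDFS.
    // Mapper/reducer classes are omitted; the old API defaults to
    // identity classes, so this copies input records through unchanged.
    public class LocalFsJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LocalFsJob.class);
        conf.set("fs.default.name", "file:///");  // local FS instead of HDFS
        conf.set("mapred.job.tracker", "local");  // in-process LocalJobRunner

        conf.setInputPath(new Path("/data/small-files"));  // directory of inputs
        conf.setOutputPath(new Path("/data/mr-output"));   // must not exist yet

        JobClient.runJob(conf);  // blocks until the job finishes
      }
    }

(setInputPath/setOutputPath were later deprecated in favor of FileInputFormat/FileOutputFormat, but match the API of the era.)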

Re: Using Map/Reduce without HDFS?

2007-08-26 Thread Ted Dunning
If you follow your situation to the logical extreme, you end up somewhere similar to where we are: huge numbers of files to process. We get more than 100,000 per hour. It quickly becomes apparent that it is impossible to process this many …

Re: Using Map/Reduce without HDFS?

2007-08-26 Thread mfc
… are you talking about? Something as simple as cat? Thanks

Re: Using Map/Reduce without HDFS?

2007-08-26 Thread Ted Dunning
Cat would work if you don't care about total storage. Often the input to map-reduce programs is line- or record-oriented data that exhibits lots of redundancy and thus could be compressed significantly. Log files are a concrete example. Thus, you might consider cat | gzip. That might not be …
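A sketch of the cat | gzip idea in Java: concatenate a directory of small record-oriented files into one compressed file before handing it to Hadoop. The directory and file names are illustrative assumptions:

    import java.io.*;
    import java.util.zip.GZIPOutputStream;

    // Concatenate many small log files into a single .gz file,
    // trading many-small-file overhead for one compressed stream.
    public class CatGzip {
      public static void main(String[] args) throws IOException {
        File dir = new File("/data/small-files");  // illustrative input directory
        File[] files = dir.listFiles();
        if (files == null) throw new IOException("not a directory: " + dir);

        byte[] buf = new byte[64 * 1024];
        try (OutputStream out = new GZIPOutputStream(
                 new FileOutputStream("/data/combined.log.gz"))) {
          for (File f : files) {
            try (InputStream in = new FileInputStream(f)) {
              int n;
              while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);  // append this file's bytes to the stream
              }
            }
          }
        }
      }
    }

One caveat worth noting: a plain .gz file is not splittable, so each such archive is read by a single map task; several medium-sized archives parallelize better than one giant one.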

Re: Using Map/Reduce without HDFS?

2007-08-26 Thread mfc
… on the local file system on the small files. I'd be interested in knowing if this is an appropriate use of Hadoop; I've got limited knowledge about Hadoop and am just trying to learn where/how it can be used. Thanks

Re: Using Map/Reduce without HDFS?

2007-08-26 Thread kate rhodes
Hadoop works fine on the local file system. The example apps don't even bother copying things into HDFS first. But, as Ted mentioned, the problem with working with huge numbers of small files on the filesystem is I/O speed. Hard drives just aren't that fast, no matter how much you spend. I would …

Re: Using Map/Reduce without HDFS?

2007-08-26 Thread Ted Dunning
Yes. I am recommending a pre-processing step before the map-reduce program. And yes, they do get split up again. They also get copied to multiple nodes so that the reads can proceed in parallel. The most important effects of concatenation and importing into HDFS are the parallelism and the …
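To make the parallelism point concrete: a file written into HDFS is chunked into blocks and each block is replicated, so later map tasks can read different blocks from different machines at once. A hedged sketch using FileSystem.create with an explicit replication factor and block size; the values and paths are illustrative (64 MB was the default block size at the time):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Import a pre-concatenated file into HDFS; HDFS splits it into
    // blocks and replicates each one, enabling parallel reads later.
    public class ImportConcatenated {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);

        FSDataOutputStream out = hdfs.create(
            new Path("/data/combined.log"),  // destination in HDFS (illustrative)
            true,                 // overwrite if present
            64 * 1024,            // I/O buffer size
            (short) 3,            // replication: three copies of each block
            64L * 1024 * 1024);   // block size: 64 MB

        InputStream in = new FileInputStream("/data/combined.log");  // local source
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) > 0) {
          out.write(buf, 0, n);
        }
        in.close();
        out.close();
      }
    }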