Re: Reduce Performance

2007-08-21 Thread Enis Soztutar
See below... Eric Baldeschwieler wrote: Actually... I think it is greatly in the project's interest to have a really elegant one-node solution. It should certainly support multithreading, the web UI, etc. AFAIK, local setup has never been the interest of hadoop, however, a good

newbie problem: Using DFS from Java program

2007-08-21 Thread Jani Arvonen
Hello everybody! I've been trying to use the Hadoop distributed file system from my Java Spring web application but without any good results :). We have one server where the Hadoop namenode and datanode are successfully running (so they are all running on a single node). I managed to configure it with the
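
For reference, a minimal sketch of what the client side can look like, assuming a namenode reachable at hdfs://namenode-host:9000 (host, port, and paths are placeholders; depending on the release, fs.default.name may also be given as plain host:port):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class DfsClientExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Must match fs.default.name in the cluster's hadoop-site.xml.
      conf.set("fs.default.name", "hdfs://namenode-host:9000");

      FileSystem fs = FileSystem.get(conf);

      // Write a small file into DFS.
      Path file = new Path("/tmp/hello.txt");
      FSDataOutputStream out = fs.create(file);
      out.writeBytes("hello from a Java client\n");
      out.close();

      // Read it back.
      BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)));
      System.out.println(in.readLine());
      in.close();
      fs.close();
    }
  }

If the web application cannot reach the namenode, the usual suspects are a mismatched fs.default.name or missing Hadoop configuration files on the client classpath.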

Re: newbie problem: Using DFS from Java program

2007-08-21 Thread Sagar Naik
Hello. Can you provide some more information, like a stack trace from the log files? - Sagar Jani Arvonen wrote: Hello everybody! I've been trying to use the Hadoop distributed file system from my Java Spring web application but without any good results :). We have one server where the Hadoop namenode are

Hadoop release 0.14.0 available

2007-08-21 Thread Doug Cutting
New features in release 0.14.0 include: - Better checksums in HDFS. Checksums are no longer stored in parallel HDFS files, but are stored directly by datanodes alongside blocks. This is more efficient for the namenode and also improves data integrity. - Pipes: A C++ API for MapReduce -

Re: Reduce Performance

2007-08-21 Thread Owen O'Malley
On Aug 21, 2007, at 12:30 AM, Enis Soztutar wrote: I think it is greatly in the project's interest to have a really elegant one-node solution. It should certainly support multithreading, the web UI, etc. AFAIK, local setup has never been the interest of hadoop, however, a good
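
For reference, the single-node ("local") mode under discussion is selected through configuration; a minimal sketch using the 0.14-era JobConf API (paths and job name are placeholders, the identity mapper/reducer defaults are relied on, and later releases replace the old path setters with FileInputFormat/FileOutputFormat helpers):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class LocalModeExample {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(LocalModeExample.class);
      conf.setJobName("local-smoke-test");

      // "local" selects the LocalJobRunner, so the whole job runs
      // in-process on one node instead of going to a JobTracker.
      conf.set("mapred.job.tracker", "local");

      // Old-style JobConf path setters; input/output are placeholders.
      conf.setInputPath(new Path("input"));
      conf.setOutputPath(new Path("output"));

      JobClient.runJob(conf);
    }
  }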

RE: how to deal with large amount of key value pair outputs in one run of map task

2007-08-21 Thread Eric Zhang
Thanks, Owen. By configuring mapred.child.java.opts to a larger value (it took a little while to figure out the right way to configure it: -Xmx300m), the out-of-memory problem went away. It's good to know that the default value of io.sort.mb is set to 100M and that my map task required about 300M of heap to run.
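
For the record, a short sketch of the two knobs involved, set programmatically on a JobConf (the values are illustrative and could equally go into hadoop-site.xml):

  import org.apache.hadoop.mapred.JobConf;

  public class MemoryTuningExample {
    public static void configure(JobConf conf) {
      // Heap given to each child task JVM (map and reduce tasks).
      conf.set("mapred.child.java.opts", "-Xmx300m");
      // In-memory buffer (in MB) used when sorting map output;
      // it has to fit comfortably inside the child heap above.
      conf.setInt("io.sort.mb", 100);
    }
  }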

missing combiner output

2007-08-21 Thread Joydeep Sen Sarma
Hi folks, I am a little puzzled by what looks to me like records that I am emitting from my combiner but that are not showing up under 'combine output records' (they seem to be disappearing). Here's some evidence: Mapred says: Combine input records 230,803,567 Combine output

RE: missing combiner output

2007-08-21 Thread Joydeep Sen Sarma
Ah - never mind - the 'combine output records' metric reported by mapred is lying. The reduce job does see all the records. (I guess this is a bug)
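
For context, a combiner is just a Reducer wired in between the map output and the shuffle, and everything it emits should reappear as reduce input, which is why the counter looked suspicious. A minimal wiring sketch with the old mapred API, using the stock LongSumReducer as both combiner and reducer (key/value classes are illustrative):

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.LongSumReducer;

  public class CombinerWiringExample {
    public static void configure(JobConf conf) {
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(LongWritable.class);
      // The combiner runs on map output before it is shipped to reducers;
      // "Combine input/output records" count what goes in and out of it.
      conf.setCombinerClass(LongSumReducer.class);
      conf.setReducerClass(LongSumReducer.class);
    }
  }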