I created HADOOP-2497 to describe this bug.
Was your sequence file stored on HDFS? Because HDFS does provide
checksums.
On Dec 28, 2007, at 7:20 AM, Jason Venner wrote:
Our OOM was caused by a damaged sequence data file. We had
assumed that the sequence files had checksums, which
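HDFS does checksum stored data and surfaces corruption to the reader as a ChecksumException; below is a small illustrative sketch of scanning a sequence file defensively (the structure is an assumption for illustration, not code from this thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ChecksumException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    // Sketch: scan a sequence file, reporting a corrupt region instead of dying.
    public class SequenceFileScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path(args[0]);
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            try {
                while (reader.next(key, value)) {
                    // process the record here
                }
            } catch (ChecksumException e) {
                // HDFS detected a CRC mismatch in the underlying data
                System.err.println("Corruption in " + file + ": " + e.getMessage());
            } finally {
                reader.close();
            }
        }
    }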
Hi Folks,
Please ignore the last email I sent with this subject. I was just
planning to pass some information to the guy in the next cube.
Instead I shared it with the world.
Whoops,
E14
yes please!
On Sep 24, 2007, at 11:36 AM, Owen O'Malley wrote:
On Sep 24, 2007, at 12:00 AM, Enis Soztutar wrote:
Arun, could you please port the discussion to the wiki? That would be
very helpful. Thanks.
Actually, I think it is better to put this kind of documentation
either in the
Sounds interesting. Let us know of any success you have!
Cross posting to -dev so more folks will notice.
On Sep 5, 2007, at 2:47 PM, David Savage wrote:
Thx very much, and sorry for the spam in the previous mail in that case.
Yep, agreed - most of the changes were minor really - I'll do as
Hadoop still has a lot of inefficiencies in it.
Most of them are not related to the language choice.
If you look at what the per-node tasks are doing (as opposed to the
name node and job tracker), you will see that very little real work is
being done by Hadoop Java code.
Plumbing bytes / IO
We are very interested in ideas and patches to improve the system's
stability.
This is very young software, but we are using it at very large scale
and intend to keep enhancing it. We currently have a 2000-node file
system with 3 TB of raw storage per node and are supporting millions of
Responses to the list welcome. I know of several companies not on
that list that are using it.
It would be great to hear from you guys.
E14
On Sep 4, 2007, at 6:59 AM, C G wrote:
All:
I am interested in hearing any success stories around deploying
Hadoop in a commercial/non-academic
Keeping all the data structures simple and in RAM lets us keep the
transaction rate pretty high.
Going to a DB while keeping the transaction rate up would require a
lot of engineering, and it would add complexity to administering the
system.
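As a toy illustration of the tradeoff (the names here are hypothetical, not Hadoop's actual namenode code): a lookup against an in-RAM hash table is a single probe, with no disk seek or database round trip.

    import java.util.concurrent.ConcurrentHashMap;

    // Toy in-memory metadata table, illustrative only.
    public class InMemoryNamespace {
        private final ConcurrentHashMap<String, long[]> blocksByPath =
            new ConcurrentHashMap<String, long[]>();

        // One hash probe in RAM per lookup -- no disk seek, no DB round trip.
        public long[] getBlocks(String path) {
            return blocksByPath.get(path);
        }

        public void addFile(String path, long[] blockIds) {
            blocksByPath.put(path, blockIds);
        }
    }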
I'm not a fan of this approach, at least not
especially at scale!
And we are testing on 1000 node clusters with long jobs. We see
lots of failures per job.
On Aug 24, 2007, at 4:20 PM, Ted Dunning wrote:
On 8/24/07 12:11 PM, Doug Cutting [EMAIL PROTECTED] wrote:
Using the same logic, streaming reduce outputs to
the next map
+1
On Aug 22, 2007, at 11:23 AM, Doug Cutting wrote:
Thorsten Schuett wrote:
In my case, it looks as if the loopback device is the bottleneck. So
increasing the number of tasks won't help.
Hmm. I have trouble believing that the loopback device is actually
the bottleneck. What makes you
Actually...
I think it is greatly in the project's interest to have a really
elegant one-node solution. It should certainly support
multithreading, the web UI, etc.
If it is trivial to write and use single-node jobs, then we can write
an application once in map-reduce and use it either
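A minimal sketch of that single-node mode as it already works, using the old mapred API (input and output paths come from the command line; the identity map/reduce defaults apply):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class LocalRunExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(LocalRunExample.class);
            conf.set("mapred.job.tracker", "local");  // in-process LocalJobRunner, no cluster
            conf.set("fs.default.name", "file:///");  // local disk instead of HDFS
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);  // the same call a cluster job would use
        }
    }

Flipping those two config values back to a jobtracker address and an hdfs: URI runs the identical job distributed.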
I'll have our operations folks comment on our current techniques.
We use map-reduce jobs in which all the nodes in the cluster copy from
the source, generally using either the HTTP(S) or HDFS protocol.
We've seen write rates as high as 8.3 GBytes/sec on 900 nodes. This
is network-limited. We
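For reference, this is the pattern behind the stock DistCp tool, which runs the copy as a map-only job so every node pulls its share in parallel. A hedged usage sketch (hosts and paths are placeholders):

    bin/hadoop distcp hdfs://src-namenode/data hdfs://dst-namenode/data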
Hi Folks,
I'd love to hear more about how Hadoop is being used in the wild. If
you are using Hadoop, please add your project to our PoweredBy page,
and/or respond to this email.
http://wiki.apache.org/lucene-hadoop/PoweredBy
Thanks!
E14
There may need to be some streaming-specific follow-on work,
but I believe 489 will capture stderr.
Anyone?
On Oct 20, 2006, at 9:33 PM, Andrew McNabb wrote:
On Fri, Oct 20, 2006 at 03:45:17PM -0700, Eric Baldeschwieler wrote:
I filed:
http://issues.apache.org/jira/browse/HADOOP-619
Hi Andrew,
I filed:
http://issues.apache.org/jira/browse/HADOOP-619
to address the -input issues. There is work in progress to address
getting job debugging info. I think this will be coming out in the
next release (8?).
http://issues.apache.org/jira/browse/HADOOP-489
I'll let others
The limit is RAM in the namenode. Every file uses some of this
non-scalable resource currently. So what is critical is that your total
number of files remains small, where "small" is probably safely
defined as hundreds of thousands today.
Bigger files let you use much more storage with the same
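A back-of-the-envelope sketch of that arithmetic; the 150 bytes per in-memory namespace object below is an assumed figure for illustration, not a number from this thread:

    // Rough namenode heap estimate: one in-RAM record per file plus one per block.
    public class NamenodeHeapEstimate {
        static final long BYTES_PER_OBJECT = 150; // assumed average, illustration only

        static long estimateHeapBytes(long files, long avgBlocksPerFile) {
            long objects = files + files * avgBlocksPerFile;
            return objects * BYTES_PER_OBJECT;
        }

        public static void main(String[] args) {
            // 500,000 files averaging 2 blocks each -> about 214 MB of heap
            System.out.println(estimateHeapBytes(500000L, 2) / (1024 * 1024) + " MB");
        }
    }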
Interesting thread.
This relates to HADOOP-288, and also to the thread I started last week
on using URLs in general for input arguments. It seems like we should
just take a URL for the jar, which could be a file: or hdfs: URL.
Thoughts?
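A minimal sketch of what taking a URL for the jar could look like against the FileSystem API (illustrative only, not an actual patch; error handling elided):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class JarUrlResolver {
        // Open a job jar named by either a file: or hdfs: URL.
        public static FSDataInputStream openJar(String jarUrl, Configuration conf)
                throws java.io.IOException {
            URI uri = URI.create(jarUrl);
            // FileSystem.get picks the implementation from the scheme:
            // the local filesystem for file:, DFS for hdfs:.
            FileSystem fs = FileSystem.get(uri, conf);
            return fs.open(new Path(uri));
        }
    }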
On Aug 31, 2006, at 10:54 AM, Doug Cutting wrote:
Frédéric Bertin
I think it is interesting. I think you'd want a way to specify that
the target file is itself a list of additional URIs as well. That
would support scenarios such as a .jsp on a master server that simply
listed its slaves and then the slaves could list their local content.
Might also