Apache's board this morning voted to make Hadoop a top-level project
(TLP). The initial project management committee (PMC) for Hadoop will
be composed of the following Hadoop committers:
* Andrzej Bialecki [EMAIL PROTECTED]
* Doug Cutting [EMAIL PROTECTED]
Aaron Kimball wrote:
Multiple students should be able to submit jobs and if one student's
poorly-written task is grinding up a lot of cycles on a shared cluster,
other students still need to be able to test their code in the meantime;
I think a simple approach to address this is to limit the
Joydeep Sen Sarma wrote:
- what if current reduce tasks were broken into separate copy, sort and reduce
tasks?
we would get much smaller units of recovery and scheduling.
thoughts?
If copy, sort and reduce are not scheduled together then it would be
very hard to ensure they run on the same
Joydeep Sen Sarma wrote:
if the cluster is unused - why restrict parallelism? if someone's willing to
wake up at 4am to beat the crowd - they would just absolutely hate this.
[It would be better to make your comments in Jira. ]
But if someone starts a long-running job at night that uses the
Runping Qi wrote:
An improvement over Doug's proposal is to make the limit soft in the
following sense:
1. A job is entitled to run up to the limit number of tasks.
2. If there are free slots and no other job waits for their entitled
slots, a job can run more tasks than the limit.
3. When a job
Joydeep Sen Sarma wrote:
can we suspend jobs (just unix suspend) instead of killing them?
We could, but they'd still consume RAM and disk. The RAM might
eventually get paged out, but relying on that is probably a bad idea.
So, this could work for tasks that don't use much memory and whose
Jason Venner wrote:
On investigating, we discovered that the entirety of the next(key,value)
and the entirety of the write(key, value) are synchronized on the file
object.
This causes all threads to back up on the serialization/deserialization.
I'm not sure what you want to happen here.
Ted Dunning wrote:
It seems reasonable that (de)serialization could be done in a threaded
fashion and then just block on the (read) write itself.
That would require a buffer per thread, e.g., replacing Writer#buffer
with a ThreadLocal of DataOutputBuffers. The deflater-related objects
would
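For illustration, a minimal sketch of the per-thread buffer idea; the
wrapper class and stream field here are hypothetical, and the real change
would live inside SequenceFile.Writer itself:

  import java.io.IOException;
  import java.io.OutputStream;
  import org.apache.hadoop.io.DataOutputBuffer;
  import org.apache.hadoop.io.Writable;

  class ThreadSafeAppender {             // hypothetical wrapper class
    private final OutputStream out;

    // One serialization buffer per thread: serialization proceeds in
    // parallel, and only the raw write is synchronized on the stream.
    private final ThreadLocal<DataOutputBuffer> buffers =
        new ThreadLocal<DataOutputBuffer>() {
          protected DataOutputBuffer initialValue() {
            return new DataOutputBuffer();
          }
        };

    ThreadSafeAppender(OutputStream out) { this.out = out; }

    void append(Writable key, Writable value) throws IOException {
      DataOutputBuffer buffer = buffers.get();
      buffer.reset();
      key.write(buffer);                 // serialize outside the lock
      value.write(buffer);
      synchronized (out) {               // block only on the write itself
        out.write(buffer.getData(), 0, buffer.getLength());
      }
    }
  }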
Rui Shi wrote:
It is hard to believe that you need to enlarge heap size given the input size
is only 10MB. In particular, you don't load all input at the same time. As for
the program logic, there's not much fancy stuff, mostly cutting and sorting. So GC should be
able to handle...
Out-of-memory
Nate Carlson wrote:
I'm testing out a Hadoop cluster on EC2.. we've currently got 20 nodes,
and for some silly reason, I started the dfs daemon on all of the nodes.
I'd like to drop back down to 3 nodes after we've finished testing the
apps; is there any way to pull the other nodes from dfs
Owen O'Malley wrote:
Is anyone at ApacheCon this week?
I'll be there tomorrow and Friday and will attend the BOF.
See you soon,
Doug
John Wang wrote:
What is the exact time and location for the thursday night roundtable
http://wiki.apache.org/apachecon/BirdsOfaFeatherUs07
Doug
Devaraj Das wrote:
There has been a change in the way progress reporting is done since
0.14. The application has to explicitly send the status
(incrCounter doesn't send any status). Even if the application hasn't made
any progress, it is okay to call setStatus with the earlier
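For illustration, a sketch against the 0.14-era mapred API (the mapper
and counter names are hypothetical):

  import java.io.IOException;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.mapred.*;

  public class ReportingMapper extends MapReduceBase implements Mapper {
    enum Counters { RECORDS }                      // hypothetical counter

    public void map(WritableComparable key, Writable value,
                    OutputCollector output, Reporter reporter)
        throws IOException {
      output.collect(key, value);                  // identity map, for illustration
      reporter.incrCounter(Counters.RECORDS, 1);   // counts, but sends no status
      reporter.setStatus("processing");            // explicit call that reports progress
    }
  }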
Stu Hood wrote:
The slide comparing the time taken to spill to disk between vertices vs
operating purely in memory (around minute 26) is definitely something to think
about.
I have not had a chance to watch the video yet, but, in MapReduce, if
the intermediate dataset is larger than the RAM
Vuk Ercegovac wrote:
If there is a (reasonably) simple solution that addresses
failures (correctness and cost), would there be interest?
Sure, if it provides some significant benefits too. A good benchmark
might be swapping randomly-generated keys and values at each stage, so
it becomes a
Chris Dyer wrote:
For one computation I've been working on lately, over 25% of the time is
spent in the last 10% of each map/reduce operation (this has to do with the
natural distribution of my input data and would be unavoidable even given an
optimal partitioning). During this time, I have
André Martin wrote:
I was thinking of a similar solution/optimization but I have the
following problem:
We have a large distributed system that consists of several
spider/crawler nodes - pretty much like a web crawler system - every
node writes its gathered data directly to the DFS. So there
Joydeep Sen Sarma wrote:
One of the controversies is whether in the presence of failures, this
makes performance worse rather than better (kind of like udp vs. tcp -
what's better depends on error rate). The probability of a failure per
job will increase non-linearly as the number of nodes
Hadoop release 0.15.0 is now available. This release contains many
improvements, new features, bug fixes and optimizations. For more
release details and downloads, visit:
http://lucene.apache.org/hadoop/releases.html
Notably, this release contains the first working version of HBase:
The problem is that these template files are in subversion but are not
included in the released sources. Most folks who build are using
sources checked out from subversion and hence do not have this issue.
The sources included with releases should be buildable, but I don't
think we should
Holger Stenzhorn wrote:
I am using Hadoop under Cygwin with the default settings.
Hence hadoop.tmp.dir is set to /tmp/hadoop-${user.name} via
the hadoop-default.xml.
Now when I start using Hadoop it creates a directory
c:\tmp\hadoop-holste (as holste is my user name obviously).
But
Holger Stenzhorn wrote:
This fix is exactly the same as done for hadoop-daemon.sh (and introduced
into the Subversion repository already).
Which begs the question: could HBase use hadoop-daemon.sh directly? If
not, could hadoop-daemon.sh be modified to support HBase? Maintaining
two
Johnson, Jorgen wrote:
Create a QueueInputFormat, which provides a RecordReader implementation that
pops values off a globally accessible queue*. This would require filling the
queue with values prior to loading the map/red job. This would allow the
mappers to cram values back into the
Lance Amundsen wrote:
I am starting to wonder if it might indeed be impossible to get map jobs
running w/o writing to the file system, at least not w/o some major
changes to the job and task tracker code.
I was thinking about creating an InputFormat that does no file I/O, instead
is queue
Lance Amundsen wrote:
There are lots of references on decreasing the DFS block size to increase
the maps-to-records ratio. What is the easiest way to do this? Is it
possible with
the standard SequenceFile class?
You could specify the block size in the Configuration parameter to
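For example, a sketch (the 1 MB figure and paths are arbitrary):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SmallBlockWriter {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Files written with this conf get 1 MB blocks instead of the
      // 64 MB default, so the same data yields many more splits and
      // hence more map tasks.
      conf.setLong("dfs.block.size", 1024 * 1024);
      FileSystem fs = FileSystem.get(conf);
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, new Path("small-block.seq"), Text.class, Text.class);
      writer.append(new Text("key"), new Text("value"));
      writer.close();
    }
  }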
Lance Amundsen wrote:
Example: let's say I have 10K one-second jobs and I want the whole thing
to run in 2 seconds. I currently see no way for Hadoop to achieve this,
That's right. That has not been a design goal to date. Tasks are
typically expected to last at least several seconds. To fix
Lance Amundsen wrote:
Thx, I'll give that a try. Seems to me a method to tell hadoop to split a
file every n key/value pairs would be logical. Or maybe a
createSplitBoundary when appending key/value records?
Splits should not require examining the data: that's not scalable. So
they're
Michael Bieniosek wrote:
Does anybody know if there is a jdk6 available for Mac? I checked the apple
developer site, and there doesn't seem to be one available, despite blogs from
last year claiming apple was distributing it.
Since I do my development work on a Mac, switching to jdk6 would
Colin Evans wrote:
I'm a bit confused by this discussion though. How would compiling the
jars with Java 1.5 and running on 1.6 degrade performance (assuming that
the jars don't use any new 1.6 APIs)?
It won't. The claim is just that running with Java 1.5 degrades
performance significantly.
Jonathan Hendler wrote:
Since Vertica is also a distributed database, I think it may be
interesting to the newbies like myself on the list. To keep the
conversation topical - while it's true there's a major campaign of PR
around Vertica, I'd be interested in hearing more about how HBase
Nick Lothian wrote:
That turns out to be a Unix vs DOS line endings thing (!).
Running the following commands fixed that:
dos2unix.exe /cygdrive/c/dev/prog/hadoop-0.14.1/conf/masters
dos2unix.exe /cygdrive/c/dev/prog/hadoop-0.14.1/conf/slaves
That should not be required. When you install
Stu Hood wrote:
Is it necessary to run the -upgrade operation to take a cluster from 0.14.1 to
0.14.2? None of the release pages say...
No. Bugfix releases should be compatible.
Doug
Erich Nachbar wrote:
Could we use the Hadoop Wiki for this or do we need to setup a separate
Wiki (which I would not prefer)?
Hadoop wiki is fine for this.
Doug
Toby DiPasquale wrote:
In short, yes. Hadoop's code takes advantage of multiple native
threads and you can tune the level of concurrency in the system by
setting mapred.map.tasks and mapred.reduce.tasks to take advantage of
multiple cores on the nodes which have them.
More importantly, you
C G wrote:
Are there any other east coast developers interested in a Boston-area get
together?
FYI, I'll be at ApacheCon in Atlanta this November 14th and 15th, which
might be a good place for a Hadoop BOF.
http://www.us.apachecon.com/
Doug
kate rhodes wrote:
It retries as fast as it can.
Yes, I can see that. It seems we should either insert a call to
'sleep(1000)' at JobTracker.java line 696, or remove that while loop
altogether, since JobTracker#startTracker() will already retry on a
one-second interval. In the latter
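The shape of that fix, as a generic sketch rather than the actual
JobTracker code (startService() is a hypothetical stand-in for the
failing call):

  import java.io.IOException;

  public class RetryLoop {
    static void startService() throws IOException { /* bind, start, ... */ }

    static void startWithBackoff() throws InterruptedException {
      while (true) {
        try {
          startService();
          return;
        } catch (IOException e) {
          Thread.sleep(1000);   // back off for a second instead of spinning
        }
      }
    }
  }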
Ross Boucher wrote:
My cluster has 4 machines on it, so based on the recommendations on the
wiki, I set my reduce count to 8. Unfortunately, the performance was
less than ideal. Specifically, when the map functions had finished, I
had to wait an additional 40% of the total job time just for
Toby DiPasquale wrote:
Why does Hadoop use the Client JVM? I've been told that you should
almost never use the Client JVM and instead use the Server JVM for
anything even remotely long-running. Is the Server JVM less stable?
It doesn't specify the client JVM, rather it just doesn't specify the
Ted Dunning wrote:
Is there any way to add our support to your proposal? Would that even help?
Yes, please. Join the incubator-general mailing list and participate in
the discussion. Your opinion is welcome there. Only votes from folks
on the Incubator's PMC are binding, but votes from
Jeff Hammerbacher wrote:
has anyone leveraged the ability of datanodes to specify which datacenter
and rack they live in? if so, any evidence of performance improvements? it
seems that rack-awareness is only leveraged in block replication, not in
task execution.
It often doesn't make a big
Release 0.14.1 fixes bugs in 0.14.0.
For release details and downloads, visit:
http://lucene.apache.org/hadoop/releases.html
Thanks to all who contributed to this release!
Doug
Ted Dunning wrote:
I have to say, btw, that the source tree structure of this project is pretty
ornate and not very parallel. I needed to add 10 source roots in IntelliJ to
get a clean compile. In this process, I noticed some circular dependencies.
Would the committers be open to some small
mfc wrote:
How can this get higher on the priority list? Even just a single appender.
Fundamentally, priorities are set by those that do the work. As a
volunteer organization, we can't assign tasks. Folks must volunteer to
do the work. Y! has volunteered more than others on Hadoop, but
Ted Dunning wrote:
Presumably this won't be the kind of thing an outsider could do easily.
There are no outsiders here, I hope! We try to conduct everything in
the open, from design through implementation and testing. If you feel
that you're missing discussions, please ask questions. Some
Arun C Murthy wrote:
One way to reap benefits of both compression and better parallelism is to use
compressed SequenceFiles: http://wiki.apache.org/lucene-hadoop/SequenceFile
Of course this means you will have to do a conversion from .gzip to .seq file
and load it onto hdfs for your job,
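A sketch of such a conversion using the old SequenceFile API (class and
path names are illustrative):

  import java.io.*;
  import java.util.zip.GZIPInputStream;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  // Converts a local .gz text file into a block-compressed SequenceFile
  // on HDFS, so the data is both compressed and splittable.
  public class GzipToSeq {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      BufferedReader in = new BufferedReader(new InputStreamReader(
          new GZIPInputStream(new FileInputStream(args[0]))));
      SequenceFile.Writer out = SequenceFile.createWriter(fs, conf,
          new Path(args[1]), LongWritable.class, Text.class,
          SequenceFile.CompressionType.BLOCK);
      String line;
      long lineNo = 0;
      while ((line = in.readLine()) != null) {
        out.append(new LongWritable(lineNo++), new Text(line));
      }
      out.close();
      in.close();
    }
  }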
I think this is related to HADOOP-1558:
https://issues.apache.org/jira/browse/HADOOP-1558
Per-job cleanups that are not run clientside must be run in a separate
JVM, since we, as a rule, don't run user code in long-lived daemons.
Doug
Stu Hood wrote:
Does anyone have any ideas on this
Matt Kent wrote:
I would find it useful to have some sort of listener mechanism, where
you could register an object to be notified of a job completion event
and then respond to it accordingly.
There is a job completion notification feature.
<property>
  <name>job.end.notification.url</name>
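A complete entry might look like this ($jobId and $jobStatus are
substituted by Hadoop; the host and path are illustrative):

  <property>
    <name>job.end.notification.url</name>
    <value>http://myhost:8080/notify?id=$jobId&amp;status=$jobStatus</value>
  </property>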
Thomas Friol wrote:
Another question: why is 'hadoop.tmp.dir' user.name-dependent?
We need a directory that a user can write to and that does not interfere
with other users. If we didn't include the username, then different users
would share the same tmp directory. This can cause
Ted Dunning wrote:
It isn't hard to implement these programs as multiple fully fledged
map-reduces, but it appears to me that many of them would be better
expressed as something more like a map-reduce-reduce program.
[ ... ]
Expressed conventionally, this would have to write all of the user
Michael Stack wrote:
You might try backing out the HADOOP-1708 patch. It changed the test
guarding the log message you report below.
HADOOP-1708 isn't in 0.14.0.
Doug
Thorsten Schuett wrote:
During the copy phase of reduce, the cpu load was very low and vmstat showed
constant reads from the disk at ~15MB/s and bursty writes. At the same time,
data was sent over the loopback device at ~15MB/s. I don't see what else
could limit the performance here. The disk
New features in release 0.14.0 include:
- Better checksums in HDFS. Checksums are no longer stored in parallel
HDFS files, but are stored directly by datanodes alongside blocks. This
is more efficient for the namenode and also improves data integrity.
- Pipes: A C++ API for MapReduce
-
Yes, that sounds correct. However it will probably change in 0.15,
since so many folks have found it confusing. Exactly how it will change
is still a matter of open debate.
https://issues.apache.org/jira/browse/HADOOP-785
Doug
Michael Bieniosek wrote:
The wiki page
Sebastien Rainville wrote:
I am new to Hadoop. Looking at the documentation, I figured out how to
write map and reduce functions but now I'm stuck... How do we work with
the output file produced by the reducer? For example, the word count
example produces a file with words as keys and the number
Daeseong Kim wrote:
To solve the checksum errors on the non-ECC memory machines, I
modified some code in DFSClient.java and DataNode.java.
The idea is very simple.
The original CHUNK structure is
{chunk size}{chunk data}{chunk size}{chunk data}...
The modified CHUNK structure is
{chunk
Eyal Oren wrote:
As far as I understand (that's what we do anyway), you have to submit
one jar that contains all your dependencies (except for dependencies on
hadoop libs), including external jars. The easiest is probably to use
maven/ant to build such a big jar externally with all its
[EMAIL PROTECTED] wrote:
I've written a map task that will on occasion not compute the correct
result. This can easily be detected, at which point I'd like the map
task to report the error and terminate the entire map/reduce job. Does
anyone know of a way I can do this?
You can easily kill
Andrzej Bialecki wrote:
So far I learned that the secondary namenode keeps refreshing
periodically its backup copies of fsimage and editlog files, and if the
primary namenode disappears, it's the responsibility of the cluster
admin to notice this, shut down the cluster, switch the configs
Since Hadoop 0.12, if you configure fs.trash.interval to a non-zero
value then 'bin/hadoop dfs -rm' will move things to a trash directory
instead of immediately removing them. The Trash is periodically emptied
of older items. Perhaps we should change the default value for this to
60 (one
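For example, in hadoop-site.xml (a sketch, assuming the value is in
minutes):

  <property>
    <name>fs.trash.interval</name>
    <value>60</value>
    <!-- Non-zero enables the trash; 0, the current default, disables it. -->
  </property>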
Phantom wrote:
Here is the scenario I was concerned about. Consider three nodes in the
system A, B and C which are placed say in different racks. Let us say that
the disk on A fries up today. Now the blocks that were stored on A are not
going to be re-replicated (this is my understanding but I
Phantom wrote:
I am sure re-replication is not done on every heartbeat miss since that
would be very expensive and inefficient. At the same time you cannot really
tell if a node is partitioned away, crashed or just slow. Is it
threshold-based, i.e. I missed N heartbeats so re-replicate?
Yes,
In the slaves file, 'localhost' should only be used alone, not with
other hosts, since 'localhost' is not a name that other hosts can use to
refer to a host. It's equivalent to 127.0.0.1, the loopback address.
So, if you're specifying more than one host, it's best to use real
hostnames or IP
You could define an InputFormat whose InputSplits are not files, but
rather simply have a field that is a complex number. The complex field
would be written and read by Writable#write() and Writable#readFields.
This InputFormat would ignore the input directory, since it is not a
file-based
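A sketch of such a split under the old mapred API (the class name is
hypothetical):

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.mapred.InputSplit;

  // An InputSplit that carries a complex number rather than a file range.
  // The framework ships it to a task via write()/readFields().
  public class ComplexSplit implements InputSplit {
    private double re, im;

    public ComplexSplit() {}                       // needed for deserialization
    public ComplexSplit(double re, double im) { this.re = re; this.im = im; }

    public void write(DataOutput out) throws IOException {
      out.writeDouble(re);
      out.writeDouble(im);
    }

    public void readFields(DataInput in) throws IOException {
      re = in.readDouble();
      im = in.readDouble();
    }

    public long getLength() { return 0; }          // no bytes to read
    public String[] getLocations() { return new String[0]; }  // no locality
  }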
KrzyCube wrote:
I found the File[] editFiles in FSEditLog.java, then traced the call
stack and found that it can be configured with multiple dfs.name.dir
entries. Does this mean the NameNode data can be split into pieces, or
is replication just set to the number of directories that
James Kennedy wrote:
So far I've had trouble finding examples of MapReduce jobs that are
kicked off by some one-time process that in turn kick off other
MapReduce jobs long after the initial driver process is dead. This
would be more distributed and fault tolerant since it removes
Jun Rao wrote:
I am wondering if anyone has experienced this problem. Sometimes when I
ran a job, a few map tasks (often just one) hang in the initializing phase
for more than 3 minutes (it normally finishes in a couple seconds). They
will eventually finish, but the whole job is slowed down
Raghu Angadi wrote:
Doug Cutting wrote:
Owen wrote:
One side note is that all of the servers have a servlet such that if
you do http://node:port/stacks you'll get a stack trace of all
the threads in the server. I find that useful for remote debugging.
*smile* Although if it is a task jvm
Mathijs Homminga wrote:
Is there a way to easily determine the efficiency of my cluster?
Example:
- there are 5 slaves which can handle 1 task at the time each
- there is one job, split into 5 sub tasks (5 maps and 5 reduces)
- 4 slaves finish their tasks in 1 minute
- 1 slave finishes its tasks
Every 128th key is held in memory. So if you've got 1M keys in a MapFile,
then opening a MapFile.Reader would read roughly 8k keys into memory.
Binary search is used on these in-memory keys, so that a maximum of
127 entries must be scanned per random access.
Doug
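The interval is set on the writer side; for illustration, a sketch
(names are arbitrary, and the interval is typically set before the
first append):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.MapFile;
  import org.apache.hadoop.io.Text;

  public class TightIndex {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      MapFile.Writer writer = new MapFile.Writer(conf, fs, "my.map",
          Text.class, Text.class);
      // Index every 32nd key instead of every 128th: about 4x the index
      // memory, but at most 31 entries scanned per random access.
      writer.setIndexInterval(32);
      writer.append(new Text("key"), new Text("value"));
      writer.close();
    }
  }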
Phantom wrote:
Hi All
I know
Phantom wrote:
Which would mean that if I want to have my logs to reside in HDFS I will
have to move them using copyFromLocal or some version thereof and then run
Map/Reduce process against them ? Am I right ?
Yes. HDFS is probably not currently suitable for directly storing log
output as it
Neeraj Mahajan wrote:
I read from Hadoop docs that the task scheduler tries to execute the task
closer to the data. Can this functionality be applied without using HDFS?
How?
You can subclass LocalFileSystem and override getFileCacheHints() to
return the host where the file is known to be
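A sketch of that override, assuming the 0.1x-era signature
String[][] getFileCacheHints(Path, long, long) (the host lookup is
hypothetical):

  import java.io.IOException;
  import org.apache.hadoop.fs.LocalFileSystem;
  import org.apache.hadoop.fs.Path;

  public class HintedLocalFileSystem extends LocalFileSystem {
    // Report the host that holds each file, so the scheduler can place
    // tasks near the data.
    public String[][] getFileCacheHints(Path f, long start, long len)
        throws IOException {
      return new String[][] { { hostFor(f) } };
    }

    private String hostFor(Path f) {
      return "node1.example.com";   // placeholder for a real file-to-host mapping
    }
  }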
Calvin Yu wrote:
The problem seems to be with the MapTask's (MapTask.java) sort
progress thread (line #196) not stopping after the sort is completed,
and hence the call to join() (line# 190) never returns. This is
because that thread is only catching the InterruptedException, and not
checking
a thread dump of the hang up.
Calvin
Mark Meissonnier wrote:
Sweet. It works. Thanks
Someone should put it on this wiki page
http://wiki.apache.org/lucene-hadoop/hadoop-0.1-dev/bin/hadoop_dfs
I don't have editing privileges.
Anyone can create themselves a wiki account and edit pages. Just use
the Login button at the top of
Phantom wrote:
(1) Set my fs.default.name to hdfs://host:port and also specify it
in the JobConf configuration. Copy my sample input file into HDFS using
bin/hadoop dfs -put from my local file system. I then need to specify this
file to my WordCount sample as input. Should I specify this file
What version of Hadoop are you using? On what sort of a cluster? How
big is your dataset?
Doug
moonwatcher wrote:
hey guys,
i've setup hadoop in distributed mode (jobtracker, tasktracker, and hdfs daemons), and observing that the map phase executes really quickly but the reduce phase
Dennis Kubes wrote:
Do we know if this is a hardware issue? If it is possibly a software
issue, I can dedicate some resources to tracking down bugs. I would just
need a little guidance on where to start looking.
We don't know. The checksum mechanism is designed to catch hardware
problems.
Pedro Guedes wrote:
For this I need to be able to register new steps in my chain and pass
them to hadoop to execute as a mapreduce job. I see two choices here:
1 - build a .job archive (main-class: mycrawler, submits jobs thru
JobClient) with my new steps and dependencies in the 'lib/'
Eelco Lempsink wrote:
I'm not trying to run it on a cluster though, only on one host with
multiple CPU's. So I guess the local filesystem is shared and therefore
it should be fine.
Yes, that should be fine.
However, if I try with fs.default.name set to file:///tmp/hadoop-test/
still
Eelco Lempsink wrote:
Inspired by
http://www.mail-archive.com/[EMAIL PROTECTED]/msg02394.html
I'm trying to run Hadoop on multiple CPU's, but without using HDFS.
To be clear: you need some sort of shared filesystem, if not HDFS, then
NFS, S3, or something else. For example, the job client
Please use a new subject when starting a new topic.
jafarim wrote:
Sorry if being off topic, but we experienced a very low bandwidth with
hadoop while copying files to/from the cluster (roughly 1/100 compared to
a plain Samba share). The bandwidth did not improve at all by adding nodes to
the
Ken Krugler wrote:
Has anybody been using Hadoop with ZFS? Would ZFS count as a readily
available shared file system that scales appropriately?
Sun's ZFS? I don't think that's distributed, is it? Does it provide a
single namespace across an arbitrarily large cluster? From the
jafarim wrote:
On linux and jvm6 with normal IDE disks and a giga ethernet switch with
corresponding NIC and with hadoop 0.9.11's HDFS. We wrote a C program by
using the native libs provided in the package but then we tested again with
distcp. The scenario was as follows:
We ran the test on a
Andy Liu wrote:
I'm exploring the possibility of using the Hadoop records framework to
store
these document records on disk. Here are my questions:
1. Is this a good application of the Hadoop records framework, keeping in
mind that my goals are speed and scalability? I'm assuming the answer
Konstantin Shvachko wrote:
200 bytes per file is theoretically correct, but rather optimistic :-(
From a real system memory utilization I can see that HDFS uses 1.5-2K
per file.
And since each real file is internally represented by two files (1 real
+ 1 crc) the real
estimate per file should
Johan Oskarsson wrote:
Any advice on how to solve this problem?
I think your current solutions sound reasonable.
Would it be possible to somehow share a hashmap between tasks?
Not without running multiple tasks in the same JVM. We could implement
a mode where child tasks are run directly
Gu wrote:
How can I use in some case MultithreadedMapRunner, and in some case
MapRunner for different jobs?
Use JobConf#setMapRunnerClass() on jobs that you want to override the
default MapRunner, with, e.g. MultithreadedMapRunner.
Do I have to use one hadoop-site.xml for one job? But I
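Concretely, something like this per job (MyJob is a placeholder class):

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

  JobConf job = new JobConf(MyJob.class);
  // Only this job uses the multithreaded runner; other jobs keep the
  // default MapRunner.
  job.setMapRunnerClass(MultithreadedMapRunner.class);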
Andrzej Bialecki wrote:
It's possible to use Hadoop DFS to host a read-only Lucene index and use
it for searching (Nutch has an implementation of FSDirectory for this
purpose), but the performance is not stellar ...
Right, the best practice is to copy Lucene indexes to local drives in
order
Tom White wrote:
And what do people think of the following? We already have a bunch of
stuff up in S3 that we'd like to use as input to a hadoop mapreduce job
only it wasn't put there by hadoop so it doesn't have the hadoop format
where file-is-actually-a-list-of-blocks. [ ... ]
The best
Tom White wrote:
This sounds like a good plan. I wonder whether the existing
block-based s3 scheme should be renamed (as s3block or similar) so s3
is the scheme that stores raw files as you describe?
Perhaps s3fs would be best for the full FileSystem implementation, and
simply s3 for direct
Shannon -jj Behrens wrote:
The default JAVA_HOME in hadoop-env.sh is /usr/bin/java. This is
confusing because /usr/bin/java is a binary, not a directory. On my
system, this resulted in:
$ hadoop namenode -format
/usr/local/hadoop-install/hadoop/bin/hadoop: 122:
/usr/bin/java/bin/java: not
Can you please file a bug in Jira for this?
https://issues.apache.org/jira/browse/HADOOP
Select CREATE NEW ISSUE. Create yourself a Jira account if you don't
already have one.
Thanks,
Doug
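For anyone else hitting this: JAVA_HOME must point at the JDK install
directory, not the java binary, e.g. in conf/hadoop-env.sh (the path is
illustrative):

  # The JDK directory, so that $JAVA_HOME/bin/java exists:
  export JAVA_HOME=/usr/lib/jvm/java-6-sun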
Shannon -jj Behrens wrote:
I'm using Hadoop on Ubuntu 6.10. I ran into:
$ start-all.sh
starting
Shannon -jj Behrens wrote:
There's no link to http://wiki.apache.org/lucene-hadoop/HadoopStreaming on
http://wiki.apache.org/lucene-hadoop/. It would be really nice if
there were one.
Please add one. Anyone can help maintain the wiki. Simply create
yourself an account and edit the page.
Jagadeesh wrote:
Over the past day we have managed to migrate our clusters from 0.7.2 to
0.9.0.
Thanks for sharing your experiences.
Please note that there is now a 0.9.2 release. There should be no
compatibility issues upgrading from 0.9.0 to 0.9.2, and a number of bugs
are fixed, so I
Owen O'Malley wrote:
I think Hadoop is pronounced as /hɑːˈduːp/, with the emphasis on the
second syllable.
(key: http://en.wikipedia.org/wiki/IPA_chart_for_English)
I believe the first vowel there is properly /æ/ (as in cat), but in
rapid speech this unstressed vowel turns to a schwa, so
Albert Chern wrote:
Every time the size of the map file hits a multiple of the index
interval, an index entry is written. Therefore, it is possible that
an index entry is not added for the first occurrence of a key, but one
of the later ones. The reader will then seek to one of those instead
Brendan Melville wrote:
in hadoop-site.xml I had mapred.map.tasks and mapred.reduce.tasks set.
Right, these parameters should be specified in mapred-default.xml, so
that they do not override application code. This is a common confusion.
Someday we should perhaps alter the configuration
howard chen wrote:
2006-11-07 21:53:35,492 ERROR org.apache.hadoop.mapred.TaskTracker:
Can not start task tracker because java.lang.RuntimeException: Bad
mapred.job.tracker: local
To run distributed, you must configure mapred.job.tracker and
fs.default.name to be host:port pairs on all
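That is, something like the following in hadoop-site.xml on every node
(hostnames and ports illustrative):

  <property>
    <name>fs.default.name</name>
    <value>namenode.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>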
Feng Jiang wrote:
look at the code:
job.setNumReduceTasks(1); // force a single reduce task
why? Is there any difficulty there to allow multiple reduce tasks?
There is not a strong reason why a single reduce task is required. This
code attempts to implement things as simply
howard chen wrote:
but when I stop-all --config... it shows...
no jobtracker to stop
serverA: Login Success!
serverB: Login Success!
serverB: no tasktracker to stop
It looks like the tasktracker crashed on startup. Login to ServerB and
look in its logs to see what happened.
Doug