Hi,
Can we specify which subset of machines to use for different jobs? E.g. we
set machine A as the namenode and B, C, D as datanodes. Then for job 1, the
MapReduce runs on B and C, and for job 2, the MapReduce runs on C and D.
Regards,
Raakhi
Hey,
If you don't want to wait for the release, you could try using the latest
version of Cloudera's Distribution for Hadoop (see
http://www.cloudera.com/hadoop), which is based on the 0.18.3 release of
Apache Hadoop but has the HADOOP-1722 patch backported (see
Oh, I see. Thanks.
- Jianmin
From: Sharad Agarwal shara...@yahoo-inc.com
To: core-user@hadoop.apache.org
Sent: Thursday, June 4, 2009 12:59:12 PM
Subject: Re: question about when shuffle/sort start working
Jianmin Woo wrote:
Do you have some sample on the
On Wed, Jun 3, 2009 at 10:59 AM, Tarandeep Singh tarand...@gmail.comwrote:
I want to share an object (a Lucene IndexWriter instance) between mappers
running on the same node within one job (not across multiple jobs). Please correct
me if I am wrong -
If I set -1 for the property:
Can you give an example of the exact arguments you're sending on the command
line?
- Aaron
On Wed, Jun 3, 2009 at 5:46 PM, Ian Soboroff ian.sobor...@nist.gov wrote:
If after I call getConf to get the conf object, I manually add the
key/value pair, it's there when I need it. So it feels like
e.g. for readFields(),
myItems = new ArrayList<String>();
int numItems = dataInput.readInt();
for (int i = 0; i < numItems; i++) {
  myItems.add(Text.readString(dataInput));
}
then on the serialization (write) side, send:
dataOutput.writeInt(myItems.size());
for (int i = 0; i < myItems.size(); i++)
  Text.writeString(dataOutput, myItems.get(i));
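For completeness, here is a minimal self-contained sketch of a custom Writable along these lines (the class name and field are made up for illustration; only the read/write pattern comes from this thread):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Hypothetical value type that holds a list of strings.
public class StringListWritable implements Writable {
  private List<String> myItems = new ArrayList<String>();

  public void write(DataOutput out) throws IOException {
    out.writeInt(myItems.size());          // write the count first
    for (String item : myItems) {
      Text.writeString(out, item);         // then each string
    }
  }

  public void readFields(DataInput in) throws IOException {
    myItems = new ArrayList<String>();     // reset; Writable objects get reused
    int numItems = in.readInt();
    for (int i = 0; i < numItems; i++) {
      myItems.add(Text.readString(in));
    }
  }

  public List<String> getItems() {
    return myItems;
  }
}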
If you don't add any member fields, then no, I don't think you need to
change anything.
- Aaron
On Wed, Jun 3, 2009 at 4:11 PM, dealmaker vin...@gmail.com wrote:
I have the following as the type of my value object. Do I need to implement
the readFields() and write() functions?
private static
We're using Lzo still, works great for those big log files:
http://code.google.com/p/hadoop-gpl-compression/
/Johan
Kris Jirapinyo wrote:
Hi all,
In the remove lzo JIRA ticket
https://issues.apache.org/jira/browse/HADOOP-4874 Tatu mentioned he was
going to port fastlz from C to Java and
bin/hadoop jar -files collopts -D prise.collopts=collopts p3l-3.5.jar
gov.nist.nlpir.prise.mapred.MapReduceIndexer input output
The 'prise.collopts' option doesn't appear in the JobConf.
Ian
Aaron Kimball aa...@cloudera.com writes:
Can you give an example of the exact arguments you're
Here's how I solved the problem using a custom InputFormat... the key
part is in listStatus(), where we traverse the directory tree. Since
HDFS doesn't have links this code is probably safe, but if you have a
filesystem with cycles you will get trapped.
Ian
import java.io.IOException;
import
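As a rough illustration of that approach (a sketch only, assuming the new org.apache.hadoop.mapreduce API; this is not Ian's actual class), an InputFormat can override listStatus() to expand directories recursively:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical InputFormat that descends into subdirectories instead of
// choking on them. Beware of cycles on filesystems that allow links.
public class RecursiveTextInputFormat extends TextInputFormat {

  @Override
  protected List<FileStatus> listStatus(JobContext job) throws IOException {
    List<FileStatus> result = new ArrayList<FileStatus>();
    for (FileStatus status : super.listStatus(job)) {
      FileSystem fs = status.getPath().getFileSystem(job.getConfiguration());
      addRecursively(fs, status, result);
    }
    return result;
  }

  private void addRecursively(FileSystem fs, FileStatus status,
                              List<FileStatus> result) throws IOException {
    if (status.isDir()) {
      for (FileStatus child : fs.listStatus(status.getPath())) {
        addRecursively(fs, child, result);
      }
    } else {
      result.add(status);
    }
  }
}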
On 6/2/09, Rares Vernica rvern...@gmail.com wrote:
I have a problem getting the map input file name. Here is what I tried:
public class Map extends Mapper<Object, Text, LongWritable, Text> {
public void map(Object key, Text value, Context context)
throws IOException,
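One common way to get the input file name with the new API (a hedged sketch of the usual approach, not necessarily what was eventually used here) is to cast the input split to a FileSplit; this assumes the job uses a FileInputFormat:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class Map extends Mapper<Object, Text, LongWritable, Text> {
  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // The split handed to this task is a FileSplit when a FileInputFormat is used.
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // ... use fileName as needed when emitting output ...
  }
}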
If your case is like mine, where you have lots of .gz files and you
don't want splits in the middle of those files, you can use the code I
just sent in the thread about traversing subdirectories. In brief, your
RecordReader could do something like:
public static class MyRecordReader
Are your tasks failing or completing successfully? Failed tasks have their
output directory wiped; only successfully completed tasks have their files
moved up.
I don't recall if the FileOutputCommitter class appeared in 0.18
On Wed, Jun 3, 2009 at 6:43 PM, Ian Soboroff ian.sobor...@nist.gov wrote:
Hi.
I'm quite new to Hadoop programming, so to get a good start I began by
writing my own program that summarizes a column in a large tab-separated
file (~100 000 000 lines). My first naive implementation was quite simple, a
small rework of the WordCount example that comes with Hadoop. This
Is there any documentation on that site on how we can use lzo? I don't see
any entries on the wiki page of the project. I see an entry on the Hadoop
wiki (http://wiki.apache.org/hadoop/UsingLzoCompression) but seems like
that's more oriented towards HBase. I am on hadoop 0.19.1.
Thanks,
Kris
Kris-
You might take a look at some of the previous lzo threads on this list
for help.
See: http://www.mail-archive.com/search?q=lzo&l=core-user%40hadoop.apache.org
-Matt
On Jun 4, 2009, at 10:29 AM, Kris Jirapinyo wrote:
Is there any documentation on that site on how we can use lzo? I
Hi Raakhi,
Unfortunately there is no built-in way of doing this. You'd have to
instantiate two entirely separate Hadoop clusters to accomplish what you're
trying to do, which isn't an uncommon thing to do.
I'm not sure why you're hoping to have this behavior, but the fair share
scheduler might
Thanks, Kevin, for the clarification. I ran a couple of tests as well and the
system behaved exactly as you had said.
So now the question is, how can I achieve what I want to do - share an
object (a Lucene IndexWriter instance) between mappers running on the same node? I
thought of running the
I have read your code, and I think you should add
job.setInputFormatClass(MultiLineInputFormat.class);
When you don't set that, it uses TextInputFormat and the value defaults to
Text. You may have thought that MultiLineInputFormat.addInputPath() would set
the InputFormat class automatically, but it doesn't.
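To make that concrete, here is a minimal driver sketch (MultiLineInputFormat is the poster's own class; the driver name and everything else here is illustrative only):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "example");
    job.setJarByClass(MyDriver.class);

    // addInputPath() only registers the path; it does not choose the InputFormat.
    MultiLineInputFormat.addInputPath(job, new Path(args[0]));
    // Without this line the job falls back to TextInputFormat.
    job.setInputFormatClass(MultiLineInputFormat.class);

    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}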
Thanks Matt. Hopefully we can have a new page on the hadoop wiki on how to
use custom compression so that people won't have to go search through the
threads to find the answer in the future.
On Thu, Jun 4, 2009 at 10:33 AM, Matt Massie m...@cloudera.com wrote:
Kris-
You might take a look at
Hi!
I am working on applying the WordCount example to the entire Wikipedia dump. The
entire English Wikipedia is around 200GB, which I have stored in HDFS in a
cluster to which I have access.
The problem: the Wikipedia dump contains many directories (it has a very big
directory structure) containing
Please, add me to the hadoop-core user mailing list.
email address: *akhilan...@gmail.com*
Thank You!
Akhil
Perhaps there should not be a space between -D and your option?
-Dprise.collopts=
Vasyl
2009/6/4 Ian Soboroff ian.sobor...@nist.gov:
bin/hadoop jar -files collopts -D prise.collopts=collopts p3l-3.5.jar
gov.nist.nlpir.prise.mapred.MapReduceIndexer input output
The
Hello all,
I'm trying to set up a two-node cluster remotely using the following
tutorials
{ NOTE: I'm ignoring the tmp directory property in hadoop-site.xml
suggested by Michael }
Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G.
You need to send a message to core-user-subscr...@hadoop.apache.org from the
address you want registered.
See http://hadoop.apache.org/core/mailing_lists.html
- Aaron
On Thu, Jun 4, 2009 at 12:10 PM, Akhil langer akhilan...@gmail.com wrote:
Please, add me to the hadoop-core user mailing list.
Actually, the space is needed, to be interpreted as a Hadoop option by
ToolRunner. Without the space it sets a Java system property, which
Hadoop will not automatically pick up.
Ian, try putting the options after the classname and see if that
helps. Otherwise, it would be useful to see a snippet
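For reference, here is a minimal sketch of a driver that lets ToolRunner apply generic options such as -D and -files (an illustration only, not Ian's actual MapReduceIndexer; the file and property names are just the ones from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Invoked as, e.g.:
//   bin/hadoop jar myjob.jar MyDriver -files collopts -D prise.collopts=collopts input output
public class MyDriver extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    // Generic options (-D, -files, ...) have already been stripped from args
    // and applied to this configuration by ToolRunner.
    Configuration conf = getConf();
    System.out.println("prise.collopts = " + conf.get("prise.collopts"));
    // ... build and submit the job from conf, using the remaining args ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}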
On Jun 4, 2009, at 11:19 AM, Kris Jirapinyo wrote:
Hopefully we can have a new page on the hadoop wiki on how to
use custom compression so that people won't have to go search
through the
threads to find the answer in the future.
Yes, it would be extremely useful if you could start a wiki
From the logs it looks like your Hadoop cluster is facing two different issues.
At the slave:
1. Exception: java.net.NoRouteToHostException: No route to host in your logs
Diagnosis - One of your nodes cannot be reached correctly. Make sure you can
ssh to your master and slave and passwordless ssh keys
I did indeed think that addInputPath() set the InputFormat class, so
this has probably been my problem. I'll try this when I regain
access to my cluster on Monday, but I'm fairly confident that this
will fix my program.
Thank you very much for a good answer.
Take care, I will post
Tom White wrote:
Actually, the space is needed, to be interpreted as a Hadoop option by
ToolRunner. Without the space it sets a Java system property, which
Hadoop will not automatically pick up.
I don't think space is required. Something like
-Dfs.default.name=host:port works. I don't see
I can SSH both ways, i.e. from master to slave and slave to master.
The datanode is getting initialized at the master, but the log at the slave looks like
this
/
2009-06-04 15:20:06,066 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
Did you try 'telnet 198.55.35.229 54310' from this datanode? The log
shows that it is not able to connect to master:54310. SSH from the datanode
does not matter.
Raghu.
asif md wrote:
I can SSH both ways, i.e. from master to slave and slave to master.
the datanode is getting intialized at
@ Raghu.
Not able to do that.
On Thu, Jun 4, 2009 at 5:38 PM, Raghu Angadi rang...@yahoo-inc.com wrote:
Did you try 'telnet 198.55.35.229 54310' from this datanode? The log show
that it is not able to connect to master:54310. ssh from datanode does not
matter.
Raghu.
asif md wrote:
I
How do I remove a datanode? Do I simply destroy my datanode and the namenode
will automatically detect it? Is there a more elegant way to do it?
Also, when I remove a datanode, does Hadoop automatically re-replicate the data
right away?
Thanks,
Harold
It's in the FAQ:
http://wiki.apache.org/hadoop/FAQ#17
Brian
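The gist of that FAQ entry is to decommission the node rather than just kill it. A rough sketch of the usual steps (property and command names as of the 0.18-0.20 era; check your version's docs):

1. In hadoop-site.xml on the namenode, point dfs.hosts.exclude at an exclude file:

   <property>
     <name>dfs.hosts.exclude</name>
     <value>/path/to/excludes</value>
   </property>

2. Add the hostname of the datanode you want to retire to that exclude file.
3. Run: bin/hadoop dfsadmin -refreshNodes
4. Wait until the node is reported as decommissioned (e.g. in 'bin/hadoop dfsadmin -report'); its blocks are re-replicated before it is marked done, so it can then be shut down safely.

If you simply destroy a datanode instead, the namenode only notices after the heartbeat timeout and re-replicates the blocks then, so it works but is less graceful.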
On Jun 4, 2009, at 6:26 PM, Harold Lim wrote:
How do I remove a datanode? Do I simply destroy my datanode and
the namenode will automatically detect it? Is there a more elegant
way to do it?
Also, when I remove a datanode,
I have a couple of questions:
1. Is Hbase 0.19.3 release stable for a production cluster?
2. Can it be deployed over Hadoop v0.19.1?
..amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Using com.hadoop.compression.lzo.LzoCodec is not much different from
using other codecs: add the hadoop-gpl-compression-0.1.0-dev.jar to
your classpath, and add the path to the native library
libgplcompression.so to the system property java.library.path.
Hope this helps, Hong
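As a rough configuration sketch (property names are the 0.19/0.20-era ones and may differ in other versions), registering the codec and compressing job output could look like this in hadoop-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>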
On Jun 4,
Hi,
I'm a Hadoop 17 user who is doing research with Prof. Magda Balazinska
at the University of Washington on an improved progress indicator for
Pig Latin. We have a question regarding how Hadoop schedules Pig Latin
queries with JOIN operators. Does Hadoop schedule all MapReduce jobs in
a
Hello Kristi,
I am a Research Assistant at the University of Texas at Dallas. We are working
on RDF data and we come across many joins in our queries. But we are not able
to carry out all joins in a single job. We also tried our Hadoop code using
Pig scripts and found that for each join in the Pig script