What is the preferred way of managing multiple configurations, i.e.
development, production, etc.?
Is there some way I can tell Hadoop to use a separate conf directory
other than ${hadoop_home}/conf? I think I've read somewhere that one
should just create multiple conf directories and then
On 8/18/10 9:45 PM, Hemanth Yamijala wrote:
Mark,
On Wed, Aug 18, 2010 at 10:59 PM, Mark static.void@gmail.com wrote:
What is the preferred way of managing multiple configurations, i.e.
development, production, etc.?
Is there some way I can tell Hadoop to use a separate conf directory
Where can I find information on the latest Hadoop API? Are the
examples here using the latest?
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
I am trying to go through all the example classes bundled w/ Hadoop but
I keep seeing the API change, so I am unsure of which to
On 8/19/10 7:46 AM, Mark wrote:
Where can I find information on the latest Hadoop API? Are the
examples here using the latest?
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
I am trying to go through all the example classes bundled w/ Hadoop
but I keep seeing the API
From what I understand, an InputSplit is a byte slice of a particular
file which is then handed off to an individual mapper for processing. Is
the size of the InputSplit equal to the Hadoop block size, i.e. 64/128MB?
If not, what is the size?
Now the RecordReader takes in bytes from the InputSplit
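For reference, the split size usually equals the block size, but it is not guaranteed to: FileInputFormat clamps it between the configured min and max split sizes. A paraphrased sketch of the rule (not the exact source):

// paraphrase of FileInputFormat's split sizing: a split defaults to one
// HDFS block, unless the configured min/max split sizes override it
long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
}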
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
Does this mean the input to the reducer should be Text/IntWritable, or
that the output of the reducer is Text/IntWritable?
What is the inverse of this, i.e. setInputKeyClass/setInputValueClass? Is
this inferred by the
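For what it's worth: setOutputKeyClass/setOutputValueClass declare the job's final (reducer) output types, and there is no setInputKeyClass - the input types come from the InputFormat. A minimal sketch, assuming the new mapreduce.Job API, of declaring map output types separately when they differ:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class OutputTypes {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "output-types");
    // final (reducer) output types
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // map output types default to the two above; declare them only
    // when the mapper emits something different
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
  }
}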
Exception in thread "main"
org.apache.hadoop.fs.FileAlreadyExistsException: Output directory
playground/output already exists
Is there any way to force writing to an existing directory? It's quite
annoying to keep specifying a separate output directory on each run..
especially when my task
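A common workaround, sketched below: delete the output directory from the driver before submitting. Hadoop refuses to overwrite by design, so this is deliberate data loss; the path name here is just an example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanOutputDir {
  public static void deleteIfExists(Configuration conf, String dir) throws Exception {
    Path out = new Path(dir);
    FileSystem fs = out.getFileSystem(conf);
    if (fs.exists(out)) {
      fs.delete(out, true); // true = recursive
    }
  }
}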
When I configure my job to use a KeyValueTextInputFormat, doesn't that
imply that the key and value to my mapper will both be Text?
I have it set up like this and I am using the default Mapper.class, i.e.
IdentityMapper
- KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
but I
On 8/26/10 7:47 PM, newpant wrote:
Hi, do you use JobConf.setInputFormat(KeyValueTextInputFormat.class) to set
the input format class? The default input format class is TextInputFormat,
and the key type is LongWritable, which stores the offset of each line in
the file (in bytes)
if your reducer accepts a
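A minimal sketch of that setup with the old mapred API (class and path names are placeholders): with KeyValueTextInputFormat each line is split at the first tab, and both the key and the value arrive as Text, which is what an identity mapper needs.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class KVSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf(KVSetup.class);
    // without this, TextInputFormat hands the mapper LongWritable
    // byte-offset keys instead of Text
    conf.setInputFormat(KeyValueTextInputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(conf, new Path("input"));
  }
}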
Is there a public Ivy repo that has the latest Hadoop? Thanks
On 8/27/10 9:25 AM, Owen O'Malley wrote:
On Aug 27, 2010, at 8:04 AM, Mark wrote:
Is there a public Ivy repo that has the latest Hadoop? Thanks
The hadoop jars and poms should be pushed into the central Maven
repositories, which Ivy uses.
-- Owen
I am looking for the latest version
How can I add jars to Hadoop's classpath when running MapReduce jobs
for the following situations?
1) Assuming that the jars are local to the nodes running the job.
2) The jars are only local to the client submitting the job.
I'm assuming I can just jar up all required jars into the main job
How can I access command line arguments from a Map or Reduce job? Thanks
How should I be creating a new Job instance in 0.21? It looks like
Job(Configuration conf, String jobName) has been deprecated. It looks
like Job(Cluster cluster) is the new way, but I'm unsure of how to get a
handle to the current cluster. Can someone advise? Thanks!
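A sketch of the factory-method route that later replaced both constructors. Job.getInstance is definitely present from 0.23/2.x on; whether 0.21 already has it is uncertain, so treat this as an assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobCreation {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // no Cluster handle needed; the factory wires one up internally
    Job job = Job.getInstance(conf, "my-job");
    job.setJarByClass(JobCreation.class);
  }
}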
On 8/29/10 4:26 PM, Owen O'Malley wrote:
You would need to save the arguments into the Configuration (aka
JobConf) that you create your job with.
-- Owen
Thanks I just realized that.
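A minimal sketch of Owen's suggestion (the property name my.custom.arg is made up): set the value on the Configuration in the driver, read it back in setup(). Note the value must be set before the Job is created, since the Job copies the conf.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ArgPassing {
  public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String arg;
    @Override
    protected void setup(Context context) {
      // every task sees the value the driver stored
      arg = context.getConfiguration().get("my.custom.arg");
    }
  }
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("my.custom.arg", args[0]); // before the Job is created
    Job job = new Job(conf, "arg-passing");
    job.setMapperClass(MyMapper.class);
  }
}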
On 8/29/10 10:38 PM, Amareshwari Sri Ramadasu wrote:
You can use -libjars option.
On 8/29/10 10:59 AM, Mark static.void@gmail.com wrote:
How can I add jars to Hadoop's classpath when running MapReduce jobs
for the following situations?
1) Assuming that the jars are local to the nodes that
On 8/30/10 7:38 AM, Mark wrote:
On 8/29/10 10:38 PM, Amareshwari Sri Ramadasu wrote:
You can use -libjars option.
On 8/29/10 10:59 AM, Mark static.void@gmail.com wrote:
How can I add jars to Hadoop's classpath when running MapReduce jobs
for the following situations?
1) Assuming
I have a question regarding outputting Writable objects. I thought all
Writables know how to serialize themselves to output.
For example I have an ArrayWritable of strings (or Texts) but when I
output it to a file it shows up as
'org.apache.hadoop.io.arraywrita...@21f7186f'
Am I missing
On 8/31/10 10:04 AM, Steve Hoffman wrote:
That is the default 'toString()' output of any Java object. If you
want your custom Writable to print something different you have to
override the toString() method.
Steve
On Tue, Aug 31, 2010 at 11:58 AM, Mark static.void@gmail.com wrote:
I
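A sketch of Steve's fix (the subclass name is made up): give the Writable a toString() of your own, since TextOutputFormat prints objects via toString() and the inherited Object version is just "classname@hash".

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

public class TextArrayWritable extends ArrayWritable {
  public TextArrayWritable() {
    super(Text.class);
  }
  @Override
  public String toString() {
    // print the contents instead of Object's "classname@hash"
    return String.join(",", toStrings());
  }
}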
On 8/31/10 10:07 AM, David Rosenstrauch wrote:
On 08/31/2010 12:58 PM, Mark wrote:
I have a question regarding outputting Writable objects. I thought all
Writables know how to serialize themselves to output.
For example I have an ArrayWritable of strings (or Texts) but when I
output
On 9/1/10 11:28 PM, Lance Norskog wrote:
Wait- you want a print-to-user method or a 'serialize/deserialize' method?
On Tue, Aug 31, 2010 at 2:42 PM, David Rosenstrauch dar...@darose.net wrote:
On 08/31/2010 02:09 PM, Mark wrote:
On 8/31/10 10:07 AM, David Rosenstrauch wrote:
On 08/31/2010
I am trying to configure our distributed Hadoop setup, but for some
reason my datanodes cannot connect to the namenode.
2010-09-06 04:06:05,040 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /10.101.102.11:9000. Already tried 0 time(s).
2010-09-06 04:06:06,041 INFO
Any idea on what conf setting that may be? I am able to ssh into the
namenode from the datanodes, though.
Looks like the name node is starting the web-server on 0.0.0.0. Is this
something I should change?
2010-09-06 03:13:12,599 INFO org.mortbay.log: Started
pinging the master-hadoop from the datanode resolves fine however if I
try to curl master-hadoop:50070 from the datanodes I get curl: (7)
couldn't connect to host
On 9/6/10 11:26 AM, Ranjib Dey wrote:
Ya, firewall is the most common issue. Apart from that, take a look at the
namenode log
FYI I am running fedora 9. Anything I need to configure to allow access?
On 9/6/10 11:33 AM, Mark wrote:
pinging the master-hadoop from the datanode resolves fine however if
I try to curl master-hadoop:50070 from the datanodes I get curl: (7)
couldn't connect to host
On 9/6/10 11:26 AM
Nevermind. It was the default Fedora firewall that was configured.
Disabled it and everything worked as it should.
Thanks for the help
On 9/6/10 11:36 AM, Mark wrote:
FYI I am running fedora 9. Anything I need to configure to allow access?
On 9/6/10 11:33 AM, Mark wrote:
pinging
How do I go about uploading content from a remote machine to the
Hadoop cluster? Do I have to first move the data to one of the nodes and
then do a fs -put or is there some client I can use to just access an
existing cluster?
Thanks
Thanks.. ill give that a try
On 9/6/10 2:02 PM, Harsh J wrote:
Java: You can use a DFSClient instance with a proper config object
(Configuration) from right about anywhere - basically all that matters is
the right fs.default.name value, which is your namenode's communication
point.
Can even
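A minimal sketch of that approach (the namenode host and paths are placeholders): point a Configuration at the namenode and use the FileSystem API from any machine that has the Hadoop jars and network access to the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemotePut {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode-host:9000");
    FileSystem fs = FileSystem.get(conf);
    // the equivalent of `hadoop fs -put`, run from a non-cluster machine
    fs.copyFromLocalFile(new Path("/local/data.txt"),
                         new Path("/user/mark/data.txt"));
    fs.close();
  }
}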
In a small cluster (4 machines) with not many jobs, can a Namenode be
configured as a Datanode so it can assist in our MR tasks?
If so, how is this configured? Is it just simply marking it as such in
the slaves file? Thanks
Thanks
On 9/7/10 9:16 AM, abhishek sharma wrote:
On Tue, Sep 7, 2010 at 12:09 PM, Mark static.void@gmail.com wrote:
In a small cluster (4 machines) with not many jobs, can a Namenode be
configured as a Datanode so it can assist in our MR tasks?
If so, how is this configured? Is it just
I am getting the following errors from my datanodes when I start the
namenode.
2010-09-08 14:17:40,690 INFO org.apache.hadoop.ipc.RPC: Server at
hadoop1/10.XXX.XXX.XX:9000 not available yet, Z...
2010-09-08 14:17:42,690 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server:
Fedora actually
On 9/9/10 2:20 AM, Steve Loughran wrote:
On 09/09/10 06:28, Mark wrote:
I am getting the following errors from my datanodes when I start the
namenode.
2010-09-08 14:17:40,690 INFO org.apache.hadoop.ipc.RPC: Server at
hadoop1/10.XXX.XXX.XX:9000 not available yet, Z...
2010
Wait what?
On 9/8/10 10:34 PM, Adarsh Sharma wrote:
Mark, just write down the following lines in the /etc/hosts file:
ip-address hostname
e.g. 192.168.0.111 ws-test
192.168.0.165 rahul
same for all nodes
Mark wrote:
I am getting the following errors from my datanodes when I start
nm.. got it
On 9/9/10 9:55 AM, Mark wrote:
Wait what?
On 9/8/10 10:34 PM, Adarsh Sharma wrote:
Mark, just write down the following lines in the /etc/hosts file:
ip-address hostname
e.g. 192.168.0.111 ws-test
192.168.0.165 rahul
same for all nodes
Mark wrote:
I am getting
How would I connect remotely to HDFS as a different user? I.e., on my
machine I'm logged in as mark but I want to log in as root.
Thanks
Thats it. Thanks
On 9/9/10 12:59 PM, Mithila Nagendra wrote:
Try sudo instead.
On Thu, Sep 9, 2010 at 12:25 PM, Mark static.void@gmail.com wrote:
How would I connect remotely to HDFS as a different user? I.e., on my
machine I'm logged in as mark but I want to log in as root.
Thanks
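For releases with the reworked security API (and simple, non-Kerberos auth), a hedged sketch: wrap the filesystem calls in a doAs() for the target user.

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class AsRoot {
  public static void main(String[] args) throws Exception {
    // only honored under simple authentication; Kerberos needs real credentials
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.mkdirs(new Path("/user/root/test"));
        return null;
      }
    });
  }
}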
How would I go about gracefully stopping/aborting a job?
I am trying to run a mahout example which consists of multiple jobs
(4). If I put the process in the background and log off the machine then
Hadoop will only finish the job it is currently running. Is there a way
around this?
Thanks
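For the first half of the question, a hedged sketch with the old mapred API (the job ID is a placeholder): the programmatic equivalent of `hadoop job -kill`. The log-off half is usually solved by launching the driver with nohup so it survives the session.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class KillJob {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName("job_201009100000_0001"));
    if (job != null) {
      job.killJob(); // asks the JobTracker to abort the running job
    }
  }
}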
If I submit a jar that has a lib directory that contains a bunch of
jars, shouldn't those jars be in the classpath and available to all nodes?
The reason I ask this is because I am trying to submit a jar myjar.jar
that has the following structure
--src
\ (My source classes)
-- lib
\
I don't know? I'm running in a fully distributed environment, i.e. not
local or pseudo.
On 9/10/10 12:03 PM, Allen Wittenauer wrote:
On Sep 10, 2010, at 11:53 AM, Mark wrote:
If I submit a jar that has a lib directory that contains a bunch of jars,
shouldn't those jars be in the classpath
If I deploy 1 jar (that contains a lib directory with all the required
dependencies), shouldn't that jar inherently be distributed to all the
nodes?
On 9/10/10 2:49 PM, Mark wrote:
I don't know? I'm running in a fully distributed environment, i.e. not
local or pseudo.
On 9/10/10 12:03 PM
As the subject implies I am trying to dump Cassandra rows into Hadoop.
What is the easiest way for me to accomplish this? Thanks.
Should I be looking into pig for something like this?
I am trying to run PFPGrowth but I keep receiving this Java heap space
error at the end of the first step/beginning of second step.
I am using the following parameters: -method mapreduce -regex [\\t]
-s 5 -g 55000
Output:
..
10/11/11 08:12:56 INFO mapred.JobClient: map 100% reduce
Does anyone have any thoughts/experiences on running Hadoop in AWS? What
are some pros/cons?
Are there any good AMI's out there for this?
Thanks for any advice.
When I load a file from HDFS into Hive I notice that the original file
has been removed. Is there any way to prevent this? If not, how can I go
back and dump it as a file again? Thanks
Exactly what I was looking for. Thanks
On 12/14/10 8:53 PM, 김영우 wrote:
Hi Mark,
You can use 'External table' in Hive.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL
A Hive external table does not move or delete files.
- Youngwoo
2010
Can someone explain what partitioning is and why it would be used, with
an example? Thanks
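Assuming the MapReduce sense of the word, a minimal sketch: a Partitioner decides which reducer each map output key goes to. The default HashPartitioner hashes the whole key; a custom one (here, by first character - a made-up example) keeps related keys on the same reducer.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    // all keys sharing a first character land on the same reducer
    return key.charAt(0) % numPartitions;
  }
}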
I'm running the Mahout Frequent Pattern Mining Job
(org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver) and I keep receiving
the following:
Caused by: java.io.IOException: Exceeded max jobconf size: 94278797
limit: 524288
Can someone explain the cause of this and more importantly the
How should I be configuring the heap and memory settings for my cluster?
Currently the only settings we use are:
HADOOP_HEAPSIZE=8192 (in hadoop-env.sh)
mapred.child.java.opts=8192 (in mapred-site.xml)
I have a feeling these settings are completely off. The only reason we
increased it this
I know the master node is responsible for namenode and job tracker, but
other than that is there any data stored on that machine? Basically what
I am asking is should there be a generous amount of free space on that
machine?
So for example I have a large drive I want to swap out of my master
Ok thanks for the clarification.
Just to be sure though..
- The master will have the ${dfs.name.dir} but not ${dfs.data.dir}
- The nodes will have ${dfs.data.dir} but not ${dfs.name.dir}
Is that correct?
On 3/16/11 10:43 AM, Harsh J wrote:
NameNode and JobTracker do not require a lot of
Sorry if this is not the correct list to post this on, it was the
closest I could find.
We are using a taildir('/var/log/foo/') source on all of our agents. If
this agent goes down and data can not be sent to the collector for some
time, what happens when this agent becomes available again?
Sorry, but it doesn't look like the Chukwa mailing list exists anymore?
Is there an easy way to set up lightweight agents on a cluster of machines
instead of downloading the full Chukwa source (+50mb)?
Has anyone built separate RPMs for the agents/collectors?
Thanks
What's the deal with Chukwa? The mailing list doesn't look like it's alive,
nor do any of the download options???
http://www.apache.org/dyn/closer.cgi/incubator/chukwa/
Is this project dead?
is the primary mission
of chukwa.
If you are looking for data import, what about Flume?
On Sun, Mar 20, 2011 at 9:59 AM, Mark static.void@gmail.com wrote:
Thanks but we need Chukwa to aggregate and store files from across
our app servers into Hadoop. Doesn't
Is Chukwa primarily used for analytics or log aggregation? I thought it
was the latter but it seems more and more it's like the former.
On 3/21/11 8:27 AM, Eric Yang wrote:
Chukwa is waiting on an official release of Hadoop and HBase that
work together. In Chukwa trunk, Chukwa is using HBase
How can I tell my job to include all the subdirectories and their
content of a certain path?
My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY} and I
tried setting my input path to 'logs/' using
FileInputFormat.addInputPath however I keep receiving the following error:
Ok so the behavior is a little different when using
FileInputFormat.addInputPath
as opposed to using Pig. I'll try the glob.
Thanks
On 4/6/11 8:41 AM, Robert Evans wrote:
I believe that opening a directory as a file will result in a file not found.
You probably need to set it to a glob,
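A sketch of the glob route, with one wildcard per directory level of logs/{YEAR}/{MONTH}/{DAY}; input paths are glob-expanded, so this pulls in every day's files without listing directories by hand.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class GlobInput {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "glob-input");
    // one glob per directory level: logs/{YEAR}/{MONTH}/{DAY}
    FileInputFormat.addInputPath(job, new Path("logs/*/*/*"));
  }
}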
If I have some jars that I would like to include in ALL of my jobs, is
there some shared lib directory on HDFS that will be available to all of
my nodes? Something similar to how Oozie uses a shared lib directory
when submitting workflows.
As of right now I've been cheating and copying these
How would I go about backing up our namenode data? I set up the
secondarynamenode on a separate physical machine but just as a backup I
would like to save the namenode data.
Thanks
?
Thanks
On 6/5/11 12:44 AM, sulabh choudhury wrote:
Hey Mark,
If you add more than one directory (comma separated) in the variable
dfs.name.dir it would automatically be copied to all those locations.
On Sat, Jun 4, 2011 at 10:14 AM, Mark static.void@gmail.com wrote:
How would I go about backing up
We have a small 4-node cluster; the machines have 12GB of RAM and the
CPUs are Quad Core Xeons.
I'm assuming the defaults aren't that generous, so what are some
configuration changes I should make to take advantage of this hardware?
Max map tasks? Max reduce tasks? Anything else?
Thanks
Is there any way I can write out the results of my mapreduce job into 1
local file... i.e. the opposite of getmerge?
Thanks
I recently added the following to my core-site.xml
<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.BZip2Codec
  </value>
</property>
However when I try and test a simple MR job I am
That did it. Thanks
On 10/31/11 12:52 PM, Joey Echeverria wrote:
Try getting rid of the extra spaces and new lines.
-Joey
On Mon, Oct 31, 2011 at 1:49 PM, Mark static.void@gmail.com wrote:
I recently added the following to my core-site.xml
<property>
  <name>io.compression.codecs</name>
  <value>
We will be adding more memory to our master node in the near future.
We generally don't mind if our map/reduce jobs are unable to run for a
short period, but we are more concerned about the impact this may have on
our HBase cluster. Will HBase continue to work while Hadoop's name-node
and/or
Maybe it was slow for me because I was writing from the file system to HDFS, but
now that I am using Amazon's MR, it will be OK.
Thank you,
Mark
On Fri, Jul 24, 2009 at 3:19 PM, Owen O'Malley omal...@apache.org wrote:
On Jul 24, 2009, at 1:15 PM, Mark Kerzner wrote:
SequenceFileOutputFormat
MultipleTextOutputFormat to accomplish this?
Thank you,
Mark
Now I am trying to do this:
Open a ZipOutputStream in the static part of the Reducer, such as in
configure(), then keep writing to this stream. I see two potential problems:
cleanup in case of failure - I saw this discussed - and I don't know when to
close the stream.
Thank you,
Mark
On Sun, Jul
Thank you, worked like a charm
On Tue, Jul 28, 2009 at 12:33 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Not safely. But you can use the close() method to do the same thing.
On Mon, Jul 27, 2009 at 8:21 AM, Mark Kerzner markkerz...@gmail.com
wrote:
Can I use this side effect to
close
Worked great. Thank you
On Tue, Jul 28, 2009 at 7:44 AM, Jason Venner jason.had...@gmail.com wrote:
You close the zip stream in the close method of the reducer.
You will get an error if no data has been written to the stream.
On Sun, Jul 26, 2009 at 9:35 PM, Mark Kerzner markkerz...@gmail.com
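Pulling the thread together, a hedged sketch with the old mapred API. The output path and entry naming are made up; a real job should derive the path from the task attempt ID so concurrent reducers don't collide.

import java.io.IOException;
import java.util.Iterator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ZipReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  private ZipOutputStream zip;

  @Override
  public void configure(JobConf job) {
    try {
      // open the stream once per task; use a per-attempt path in practice
      FileSystem fs = FileSystem.get(job);
      zip = new ZipOutputStream(fs.create(new Path("/tmp/out.zip")));
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    zip.putNextEntry(new ZipEntry(key.toString()));
    while (values.hasNext()) {
      zip.write(values.next().toString().getBytes("UTF-8"));
    }
    zip.closeEntry();
  }

  @Override
  public void close() throws IOException {
    zip.close(); // Jason's point: this errors if nothing was written
  }
}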
if this is stable.
What am I missing in my understanding?
Thank you,
Mark
I tried it in the config file and in the code, and I still get the Reduce
code executed multiple times. Maybe it is just a hint to the system?
Thank you,
Mark
On Wed, Jul 29, 2009 at 9:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
On Wed, Jul 29, 2009 at 12:58 AM, Mark Kerzner markkerz
Hi,
what is the preferred way of passing parameters to my job? I would want to
put them all in a properties file and pass that file as an argument, after
input and output. But I am not sure if this is recommended.
Thank you,
Mark
Hi,
when I run either on EC2 or inside the NetBeans IDE, I get a lot of logs.
But when I execute the same job from the command line on my machine, the
_logs in HDFS is empty. Do I need to set some switch?
Thank you,
Mark
Perfect, that's where it was!
Mark
On Mon, Aug 24, 2009 at 9:59 PM, Arvind Sharma arvind...@yahoo.com wrote:
most of the user level log files go under $HADOOP_HOME/logs/userlog... try
there
Arvind
From: Mark Kerzner markkerz...@gmail.com
To: core
/attempt
folders.
Regards,Sanjay
Mark Kerzner-2 wrote:
Hi,
when I run Hadoop in pseudo-distributed mode, I can't find the log to which
System.out.println() goes.
When I run in the IDE, I see it. When I run on EC2, it's part of the
output
logs. But here - do I need to set
Hi, guys,
Pregel was revealed on 8/11. What is your opinion of it? Does anybody know
how to get the presentation, and is anyone interested in implementing it?
Thank you,
Mark
Then we should think of a name and create the project somewhere. Does not
have to be the same place as Hadoop, can be Google code to start with...
How about
Madoop
Mississippi (221 bridges)
Danube (lotsa bridges)
Mark
On Thu, Sep 3, 2009 at 2:53 PM, Amandeep Khurana ama...@gmail.com wrote
Brenta
http://en.wikipedia.org/wiki/Brenta_(river)
On Thu, Sep 3, 2009 at 3:07 PM, Mark Kerzner markkerz...@gmail.com wrote:
Then we should think of a name and create the project somewhere. Does not
have to be the same place as Hadoop, can be Google code to start with...
How about
Madoop
, Sep 3, 2009 at 1:08 PM, Mark Kerzner markkerz...@gmail.com
wrote:
Brenta
http://en.wikipedia.org/wiki/Brenta_(river)
On Thu, Sep 3, 2009 at 3:07 PM, Mark Kerzner markkerz...@gmail.com
wrote:
Then we should think of a name and create
like the name Hamburg, but I could live with that.
Mark
On Thu, Sep 3, 2009 at 6:36 PM, Ted Dunning ted.dunn...@gmail.com wrote:
Hamburg has been excessively stable for some time. If you want to do
something, I would recommend contributing to Mahout.
On Thu, Sep 3, 2009 at 3:51 PM, Ashutosh
Hi,
I have some code that's common between the main class, mapper, and reducer.
Can I put it only in the main class and use it from the mapper and reducer?
A similar question about static variables in the main class - are they
available from the mapper and reducer?
Thank you,
Mark
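A hedged sketch of the distinction: shared code in the main class is fine because the class ships in the job jar, but static state assigned in the driver JVM is not visible to the task JVMs - values have to travel through the Configuration instead.

public class Shared {
  // fine: callable from mapper/reducer, the bytecode travels in the jar
  public static String normalize(String s) {
    return s.trim().toLowerCase();
  }
  // NOT fine as cross-JVM state: set in main(), it will still be null
  // (or the default) inside tasks running on other machines
  public static String settingSetInMain;
}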
Ah, but it says about PowerSet that it runs on Amazon, so why would they pay
extra for their own Windows and moreover, why would they use Windows given a
choice of Linux? My personal opinion is that Microsoft has a pact with Yahoo
only so that they would have an official excuse to use
I didn't see anything about this in the archive, so perhaps I'm doing
something wrong, but I have run into a problem creating a job with the
.20 release without using the deprecated JobConf class.
The mapreduce.JobContext class is the replacement for the deprecated
mapred.JobContext, but it
() and
getCurrentValue()?
Thank you so much!
Mark Vigeant
RiskMetrics Group, Inc.
Hi,
the strings I am writing in my reducer have characters that may present a
problem, such as the char represented by decimal 254, which is hex FE. It
seems that instead I see hex C3, or something else is messed up. Or my
understanding is messed up :)
Any advice?
Thank you,
Mark
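A likely explanation, sketched below: Text serializes strings as UTF-8, and char 0xFE (254) encodes to the two bytes C3 BE - which matches the C3 being observed.

import org.apache.hadoop.io.Text;

public class EncodingDemo {
  public static void main(String[] args) {
    Text t = new Text("\u00FE");
    byte[] bytes = t.getBytes();
    // only the first getLength() bytes of the backing array are valid
    for (int i = 0; i < t.getLength(); i++) {
      System.out.printf("%02X ", bytes[i] & 0xff); // prints: C3 BE
    }
  }
}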
Thanks, that is a great answer.
My problem is that the application that reads my output accepts a
comma-separated file with extended ASCII delimiters. Following your answer,
however, I will try to use low-value ASCII, like 9 or 11, unless someone has
a better suggestion.
Thank you,
Mark
On Fri
Thanks again, Todd. I need two delimiters, one for comma and one for quote.
But I guess I can use ^A for quote, and keep the comma as is, and I will be
good.
Sincerely,
Mark
On Mon, Oct 12, 2009 at 10:15 PM, Todd Lipcon t...@cloudera.com wrote:
Hey Mark,
The most commonly used delimiter
do - but does it work with Hadoop?
Thank you,
Mark
Thank you, all. It looks like SimpleDB may be good enough for my needs. The
forums claim that you can write to it from all reducers at once, being that
it is highly optimized for concurrent access.
On Tue, Oct 13, 2009 at 5:30 PM, Jeff Hammerbacher ham...@cloudera.com wrote:
Hey Mark,
You
different AMI (OpenSolaris 2009.06 just to be very
different). terminate-cluster only listed the 6 instances that were
part of the cluster if I remember correctly.
I have 4 security groups: default, default-master, default-slave, and
mark-default. mark-default wasn't even added until after I started
Hi,
I need to number all output records consecutively, like, 1,2,3...
This is no problem with one reducer, making recordId an instance variable in
the Reducer class, and setting conf.setNumReduceTasks(1)
However, it is an architectural decision forced by processing need, where
the reducer
Aaron, although your notes are not a ready solution, they are a great
help.
Thank you,
Mark
On Tue, Oct 27, 2009 at 11:27 PM, Aaron Kimball aa...@cloudera.com wrote:
There is no in-MapReduce mechanism for cross-task synchronization. You'll
need to use something like Zookeeper
about the idea of
JavaSpaces - I am going to check out this one.
Mark
On Wed, Oct 28, 2009 at 12:57 PM, Michael Klatt michael.kl...@gmail.com wrote:
Hi Mark,
Each mapper (or reducer) has an environment variable mapred_map_tasks (or
mapred_reduce_tasks) which will describe how many tasks
assume that reducers are sorted. For example, if my
records are sorted 1,2,...6, then one reducer would get maps 1,2,3, and the
other one - maps 4,5,6. If that's the case, I need to know how the reducers
are sorted. Then I could simply run the second stage.
Thank you,
Mark
On Wed, Oct 28
changes or new software - for, as you say, it might become a pain. Follow
the rule (which I got from Scott Meyers' Effective C++) - avoid premature
optimization.
mark
On Wed, Oct 28, 2009 at 2:22 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
On Wed, Oct 28, 2009 at 2:20 PM, Mark Kerzner markkerz
Ok, thank you very much Amogh, I will redesign my program.
-Original Message-
From: Amogh Vasekar [mailto:am...@yahoo-inc.com]
Sent: Monday, November 02, 2009 11:45 AM
To: common-user@hadoop.apache.org
Subject: Re: Multiple Input Paths
Mark,
Set-up for a mapred job consumes
@hadoop.apache.org
Subject: Re: Multiple Input Paths
Hi Mark,
A future release of Hadoop will have a MultipleInputs class, akin to
MultipleOutputs. This would allow you to have a different input format and
mapper depending on the path you are getting the split from. It uses special
Delegating[mapper
of
doing this? Other areas of concern are:
- Will Amazon EMR work with the latest Hadoop?
- What about Cloudera distribution or Yahoo distribution?
Thank you,
Mark