hey
you can surely do that using MultipleOutputFormat. We have already
implemented that.
Pankil
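Roughly, the idea looks like this (a minimal sketch against the old mapred API; the key-to-filename mapping is just illustrative):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Illustrative subclass: route each record to an output file named after its key.
public class KeyBasedOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  @Override
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    // "name" is the default leaf name (e.g. part-00000); prefix it with the key
    // so records for different keys end up in different files.
    return key.toString() + "/" + name;
  }
}

In the driver you would then call job.setOutputFormat(KeyBasedOutputFormat.class).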
On Fri, Apr 24, 2009 at 8:58 PM, Aaron Kimball aa...@cloudera.com wrote:
Alternatively, just use FileSystem.rename() on the normal output files
after
reducing is complete?
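Something along these lines (the output directory and target names are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// After the job finishes, rename the standard part-NNNNN files to whatever
// names you actually want. The output directory below is a placeholder.
public class RenameOutputs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path outDir = new Path("/user/me/job-output");
    for (FileStatus part : fs.listStatus(outDir)) {
      String leaf = part.getPath().getName();
      if (leaf.startsWith("part-")) {
        fs.rename(part.getPath(), new Path(outDir, "renamed-" + leaf));
      }
    }
  }
}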
On Sat, Apr 25,
Amit,
This can be made to work with Hadoop. Basically, in your mapper's
configure stage it would do the heavy load-in process, then it would
process your individual work items as records during the actual map stage.
A map task can be comprised of many records, so you'll be fine here.
If you use
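A bare-bones sketch of that pattern with the old mapred API (ExpensiveModel is a made-up stand-in for whatever is costly to load):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class HeavyInitMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // ExpensiveModel is hypothetical, standing in for the resource that is slow to load.
  private ExpensiveModel model;

  @Override
  public void configure(JobConf job) {
    // Runs once per task, before any records are processed: do the heavy load-in here.
    model = ExpensiveModel.load(job.get("model.path"));
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // Runs once per record; the already-loaded model is reused for every record in the task.
    output.collect(value, new Text(model.process(value.toString())));
  }
}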
Thanks Aaron,
The processing libs that we use, which take time to load, are all C++ based
.so libs.
Can I invoke it from the JVM during the configure stage of the mapper and keep
it running as you suggested?
Can you point me to some documentation regarding the same?
Regards,
Amit
On Sat, Apr 25,
Hello,
I have an intensive job running across 5 machines. During the map
stage, each map emits 200 records per input record, so for 50,000,000 input
records the map stage creates 200 * 50e6 records in total.
However, after a long time, I see two trackers are blacklisted
Caused by: java.lang.RuntimeException: Could not
Hello,
Is there a conf variable for getting the filename to which the current
mapper's key/value pair belongs?
I have dir/dirA/part-X and dir/dirB/part-X.
I will process dir, but need to know whether the key/value pair comes from
a dirA/part-* file or from a dirB/part-* file.
I'd much rather not implement my
Hi,
I have faced a somewhat similar issue...
I have a couple of MapReduce jobs running on EC2... after a week or so,
I get a "no space on device" exception while performing any Linux command,
so I end up shutting down Hadoop and HBase, clearing the logs, and then
restarting them.
Is there a cleaner
For this purpose, I have written my own InputFormat class, but I believe
there is a better way of doing that. JobConf may provide information about the
input file.
On Sat, Apr 25, 2009 at 12:19 PM, Saptarshi Guha
saptarshi.g...@gmail.comwrote:
Hello,
Is there a conf variable for getting the filename
Hi, I am working with the Hadoop plugin for Eclipse on Linux... everything was
working fine until one day Hadoop started to ignore any code changes I made in
my project. Instead it just ran an old copy of the code from somewhere.
Looking at the mapred.local folder where the temporary source files are
Static, pinned items persist across JVM reuse.
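For example, something like this, assuming task JVM reuse is turned on (mapred.job.reuse.jvm.num.tasks, available from 0.19) and with NativeEngine standing in for the real library wrapper:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public abstract class SharedEngineMapperBase extends MapReduceBase {

  // Held in a static field, so when the framework reuses the same task JVM
  // for several tasks the engine is only initialized once.
  // NativeEngine is a hypothetical wrapper around the C++ library.
  private static NativeEngine engine;

  @Override
  public void configure(JobConf job) {
    synchronized (SharedEngineMapperBase.class) {
      if (engine == null) {
        engine = NativeEngine.initialize(job.get("engine.conf"));
      }
    }
  }

  protected NativeEngine engine() {
    return engine;
  }
}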
On Sat, Apr 25, 2009 at 6:44 AM, amit handa amha...@gmail.com wrote:
Thanks Aaron,
The processing libs that we use, which take time to load, are all C++ based
.so libs.
Can I invoke it from the JVM during the configure stage of the mapper and keep
Hi there,
My Hadoop system (version 0.18.3) works well under standalone and
pseudo-distributed operation. But when I try to run Hadoop in
fully-distributed mode under Sun Grid Engine, it always fails -- in fact,
the JobTracker and TaskTracker can be started, but the namenode and
secondary
Yes, with the JobConf object try
job.get("map.input.file");
On Sat, Apr 25, 2009 at 12:06 PM, Farhan Husain russ...@gmail.com wrote:
For this purpose, I have written my own InputFormat class, but I believe
there is a better way of doing that. JobConf may provide information about the
input file.
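Put together, a small sketch with the old mapred API (the dirA/dirB test just mirrors the question above):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SourceAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private boolean fromDirA;

  @Override
  public void configure(JobConf job) {
    // map.input.file holds the path of the split this task is reading.
    String inputFile = job.get("map.input.file");
    fromDirA = inputFile != null && inputFile.contains("/dirA/");
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // Tag each record with its source directory (illustrative only).
    output.collect(new Text(fromDirA ? "dirA" : "dirB"), value);
  }
}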
The parameter you specify for fs.default.name should be of the form
hdfs://host:port and the parameter you specify for mapred.job.tracker
MUST be host:port. I haven't looked at 0.18.3, but it appears that the :port
is mandatory.
In your case, the piece of code parsing the fs.default.name
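For example, set programmatically (host names and ports below are placeholders):

import org.apache.hadoop.mapred.JobConf;

public class ClusterConf {
  public static JobConf create() {
    // Placeholder host names and ports -- substitute your cluster's values.
    JobConf conf = new JobConf();
    conf.set("fs.default.name", "hdfs://namenode-host:9000");  // needs the hdfs:// scheme
    conf.set("mapred.job.tracker", "jobtracker-host:9001");    // plain host:port, no scheme
    return conf;
  }
}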
If your logs were being written to the root partition (/dev/sda1), that's
going to fill up fast. This partition is always <= 10 GB on EC2 and much of
that space is consumed by the OS install. You should redirect your logs to
some place under /mnt (/dev/sdb1); that's 160 GB.
- Aaron
On Sun, Apr
Thanks Aaron.
On Sun, Apr 26, 2009 at 10:37 AM, Aaron Kimball aa...@cloudera.com wrote:
If your logs were being written to the root partition (/dev/sda1), that's
going to fill up fast. This partition is always <= 10 GB on EC2 and much of
that space is consumed by the OS install. You should
I'm not aware of any documentation about this particular use case for
Hadoop. I think your best bet is to look into the JNI documentation about
loading native libraries, and go from there.
- Aaron
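A bare-bones sketch of the JNI side (the library name and native method are invented for the example):

public class NativeProcessor {

  static {
    // Loads libprocessor.so from java.library.path; ship the .so to each node
    // (e.g. via the DistributedCache) and make sure java.library.path points at it.
    System.loadLibrary("processor");
  }

  // Implemented in the C++ .so; declared here so the mapper can call it
  // after configure() has constructed a NativeProcessor.
  public native String process(String record);
}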
On Sat, Apr 25, 2009 at 10:44 PM, amit handa amha...@gmail.com wrote:
Thanks Aaron,
The