Re: Can I share datas for several map tasks?

jason hadoop Tue, 16 Jun 2009 21:41:06 -0700

I do not believe it is possible to reuse a JVM for a task from a different job
with the stock installation.
There are a number of reasons, the classpath being one, that would make this
difficult to support in the general case.
In fact, as a general rule, I go out of my way to ensure I get a fresh JVM
per task to avoid complications from prior tasks. I used to have jobs
with a buggy JNI component that would tie up memory and prevent the JVM
from exiting when main returned. This would fairly quickly use up all of the
RAM on my TaskTracker nodes and take them out of the cluster.


You might try memory mapping a file that holds your data in a form that is
directly usable, or, if your input data items are small, feeding them to a
separate JVM that you start early and doing your processing in that JVM.
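
As a rough sketch of the memory-mapping idea (not code from the book or from
Hadoop itself): the table is written once as raw ints, and each task maps the
file read-only instead of rebuilding the array. The class name MappedIntTable,
the file layout, and the sample values are made up for illustration, and the
code assumes Java 8 or later. Because the mapping is backed by the file, the
operating system can share the same pages between several JVMs on one node.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Hadoop: stores an int[] as raw
// big-endian ints and maps it back read-only.
final class MappedIntTable {
    private MappedIntTable() {}

    /** Writes the table once, e.g. in a preparation step before the job runs. */
    static void write(Path file, int[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        buf.asIntBuffer().put(values);   // fills buf's backing array
        Files.write(file, buf.array());
    }

    /** Maps the whole file read-only and exposes it as ints. */
    static IntBuffer open(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return mapped.asIntBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared-table", ".bin");
        file.toFile().deleteOnExit();
        write(file, new int[] {12345, 7, 42});
        IntBuffer table = open(file);
        System.out.println("table[0] = " + table.get(0));
    }
}
```

In a real job the file would have to be present on each node's local disk
before the tasks start; how it gets there is left out of this sketch.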

On Tue, Jun 16, 2009 at 9:06 PM, Iman E <hadoop_...@yahoo.com> wrote:

> Thank you, Jason. I found the example. So, is there a way to share the same
> JVM between different jobs?
>
>
>
>
> ________________________________
> From: jason hadoop <jason.had...@gmail.com>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, June 16, 2009 7:22:16 PM
> Subject: Re: Can I share datas for several map tasks?
>
> In the example code download bundle, the package
> com.apress.hadoopbook.examples.advancedtechniques contains the class
> JVMReuseAndStaticInitializers.java,
> which demonstrates sharing data between instances using JVM reuse.
>
> I built this to prove to myself that it was possible.
> It never got an actual write up in the book itself.
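
For readers without the bundle, here is a minimal sketch of the general idea.
It is not the book's JVMReuseAndStaticInitializers class, and the name
SharedDataHolder is made up. A static field lives as long as the JVM does, so
a guarded initializer runs only for the first task in a reused JVM, and later
tasks of the same job that land in that JVM see the same array. It does not
share anything between separate JVMs running at the same time. Plain Java is
used so the sketch runs without Hadoop; in a mapper, setup() would call get().

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder class, not part of Hadoop or the book's examples.
final class SharedDataHolder {
    // The thread's example uses 1024 * 1024 * 16 ints; smaller here.
    static final int SIZE = 1024 * 1024;

    // Counts how often the expensive initialization actually ran.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private static int[] data;   // survives for the lifetime of the JVM

    private SharedDataHolder() {}

    /** Returns the shared array, building it on the first call only. */
    static synchronized int[] get() {
        if (data == null) {
            int[] fresh = new int[SIZE];
            fresh[0] = 12345;            // stand-in for real initialization
            INIT_COUNT.incrementAndGet();
            data = fresh;
        }
        return data;
    }

    public static void main(String[] args) {
        int[] first = get();    // what the first task's setup() would do
        int[] second = get();   // a later task in the same, reused JVM
        System.out.println("same array: " + (first == second)
            + ", initialized " + INIT_COUNT.get() + " time(s)");
    }
}
```

Assigning to the static array unconditionally in setup() would redo the work
for every task; the null check is what makes it happen once per JVM.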
>
> On Tue, Jun 16, 2009 at 6:55 PM, Hello World <snowlo...@gmail.com> wrote:
>
> > I can't get your book, so could you describe the solution in a few more
> > words? I would really appreciate it.
> >
> > -snowloong
> >
> > On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop <jason.had...@gmail.com> wrote:
> >
> > > The examples for my book include one on JVM reuse, with static data
> > > shared between tasks that run in the same JVM.
> > >
> > > On Tue, Jun 16, 2009 at 1:08 AM, Hello World <snowlo...@gmail.com> wrote:
> > >
> > > > Thanks for your reply. Could you do me a favor and check my setup?
> > > > I modified mapred-default.xml as follows:
> > > > <property>
> > > >   <name>mapred.job.reuse.jvm.num.tasks</name>
> > > >   <value>-1</value>
> > > >   <description>How many tasks to run per jvm. If set to -1, there is
> > > >   no limit.
> > > >   </description>
> > > > </property>
> > > > Then I ran bin/stop-all.sh; bin/start-all.sh to restart Hadoop.
> > > >
> > > > This is my program:
> > > >
> > > > public class WordCount {
> > > >
> > > >   public static class TokenizerMapper
> > > >       extends Mapper<Object, Text, Text, IntWritable> {
> > > >
> > > >     private final static IntWritable one = new IntWritable(1);
> > > >     private Text word = new Text();
> > > >     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
> > > >
> > > >     protected void setup(Context context)
> > > >         throws IOException, InterruptedException {
> > > >       // Init shared data
> > > >       ToBeSharedData[0] = 12345;
> > > >       System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
> > > >     }
> > > >
> > > >     public void map(Object key, Text value, Context context)
> > > >         throws IOException, InterruptedException {
> > > >       StringTokenizer itr = new StringTokenizer(value.toString());
> > > >       while (itr.hasMoreTokens()) {
> > > >         word.set(itr.nextToken());
> > > >         context.write(word, one);
> > > >       }
> > > >       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
> > > >     }
> > > >   }
> > > >
> > > > First, can you tell me how to make sure that JVM reuse is taking effect?
> > > > I didn't see anything different from before: the "top" command under
> > > > Linux shows the same number of java processes and the same memory usage.
> > > >
> > > > Second, can you tell me how to make "ToBeSharedData" be initialized only
> > > > once and be readable from other map tasks on the same node? Or is this
> > > > not a suitable programming style for map-reduce?
> > > >
> > > > By the way, I'm using hadoop-0.20.0 in pseudo-distributed mode on a
> > > > single node.
> > > > Thanks in advance.
> > > >
> > > > On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:
> > > >
> > > > >
> > > > > > snowloong wrote:
> > > > > > > Hi,
> > > > > > > I want to share some data structures among the map tasks on the
> > > > > > > same node (not through files). I mean, if one map task has already
> > > > > > > initialized some data structures (e.g. an array or a list), can
> > > > > > > other map tasks share that memory and access it directly? I don't
> > > > > > > want to reinitialize the data, and I want to save some memory.
> > > > > > > Can Hadoop help me do this?
> > > > >
> > > > > > You can enable JVM reuse across tasks. See
> > > > > > mapred.job.reuse.jvm.num.tasks in mapred-default.xml for usage.
> > > > > > Then you can cache the data in a static variable in your mapper.
> > > > >
> > > > > - Sharad
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > > http://www.apress.com/book/view/9781430219422
> > > www.prohadoopbook.com a community for Hadoop Professionals
> > >
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
