The easy way is to do this initialization in the constructor of the map or reduce object. Each map object would then have its own private copy, separate from every other copy, but since each object's map function gets called many, many times, the construction cost is, on average, small.
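A minimal sketch of that idea, in plain Java rather than against the Hadoop interfaces. The class name LookupMapper, the constructor argument, and the map signature are all hypothetical and only there to show where the one-time cost lands:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a map class; this is NOT the Hadoop Mapper interface.
class LookupMapper {
    // Private, read-only copy owned by this one object.
    private final List<String> lookup;

    // Runs once per object: the file-reading cost is paid here, not in map().
    LookupMapper(Path lookupFile) {
        try {
            lookup = Collections.unmodifiableList(
                    new ArrayList<>(Files.readAllLines(lookupFile)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs once per record: only reads the already-populated list.
    boolean map(String record) {
        return lookup.contains(record);
    }
}
```

Every object built this way reads the file for itself, so nothing is shared between machines. In a real job the file location would presumably have to reach each task through the job configuration, and the file would have to be readable from wherever the object is constructed.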

On 10/13/07 12:32 AM, "James Yu" <[EMAIL PROTECTED]> wrote:

> Ted,
>
> Thanks for your explanation.
> Actually I ran into a coding situation where my map function (or all map
> functions in distributed machines) to use (read only in my case) an
> ArrayList which I populate according to the content of a file at the
> launching of the whole program. I needed to make sure all map functions
> (and even reduce functions) can see the same copy of that ArrayList.
> What is the proper way to do this?
>
> --James
>
> On 10/12/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>> If you can do with read only constants, then you can define static finals
>> somewhere or other. They won't really be global, but since you never
>> change them, that won't matter.
>>
>> If you just want global status indicators, then look at what the reporter
>> provides.
>>
>> If you really want read/write global variables, then you have a real
>> problem. In fact, that is the shared memory emulation problem all over
>> again and that is what map-reduce is intended to side step. Such programs
>> can often be re-written so that you have an extra map reduce step or you
>> have additional input that gets sorted out to the mapper or reducer that
>> needs the values.
>>
>> If you really, really can't restate your program in this fashion, then you
>> probably don't have a problem that is suitable for map-reduce. You might
>> be able to make use of something like hbase to give you database like
>> operations, but you may just have different kind of problem. You might be
>> surprised at what a wide variety of problems are amenable to map-reduce
>> formulation.
>>
>> What is it that makes you want these global variables?
>>
>>
>> On 10/12/07 5:09 PM, "James Yu" <[EMAIL PROTECTED]> wrote:
>>
>>> What is the best practice if I DO need to have some global variables
>>> accessible to ALL mappers and ALL reducers which are distributed? Is
>>> there recommendations?
>>>
>>> -- James
>>>
>>> On 10/12/07, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>>>>
>>>> On Oct 11, 2007, at 9:54 PM, James Yu wrote:
>>>>
>>>>> I put all user global variables in a class I called MyGlobals.
>>>>
>>>> Since map/reduce is distributed in general, you should be careful of
>>>> using global variables. I find it to be better practice to keep all
>>>> of the state variables in either the Mapper or Reducer itself to
>>>> remind myself that it is _not_ shared between Mappers, Reducers, and
>>>> the launching program.
>>>>
>>>> -- Owen
