Hi Harsh,

I am trying to find all the rowkeys that are present in two tables. If userid is the rowkey of two different tables, I want to find all the rowkeys that appear in both tables. For that I need to read from two tables in one MapReduce job, i.e. I want to take multiple tables as input to a single MapReduce job so that I can compute the intersection. How can I do that? (I have put a rough sketch of what I was imagining below.)
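Here is that sketch. It assumes the multi-scan overload of TableMapReduceUtil.initTableMapperJob and Scan.SCAN_ATTRIBUTES_TABLE_NAME are available in my HBase version (I am not sure they are); the table names T1 and T2, the output path, and the class names are just placeholders:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RowKeyIntersection {

    // Emits each scanned rowkey once per table it was read from.
    static class RowKeyMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(Bytes.toString(row.get(), row.getOffset(), row.getLength())), ONE);
        }
    }

    // A rowkey is unique within one table scan, so a total count of 2
    // means the key was present in both tables.
    static class IntersectionReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable v : values) {
                count += v.get();
            }
            if (count == 2) {
                context.write(key, new IntWritable(count));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "rowkey-intersection");
        job.setJarByClass(RowKeyIntersection.class);

        // One Scan per input table; the table name travels as a scan attribute.
        List<Scan> scans = new ArrayList<Scan>();
        for (String table : new String[] { "T1", "T2" }) {
            Scan scan = new Scan();
            scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(table));
            scans.add(scan);
        }

        TableMapReduceUtil.initTableMapperJob(scans, RowKeyMapper.class,
                Text.class, IntWritable.class, job);
        job.setReducerClass(IntersectionReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/rowkey-intersection"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The idea is simply that the reducer sees each userid with the number of tables it was scanned from, and keeps only those seen in both. Is something like this the intended way, or is there a better approach?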
One more doubt I have: if two jobs each have Htable = new HTable(config, "HT"); (HT is the HBase table I have created) in their respective mappers, and these two jobs read from other tables T1 and T2 and put into the HT table, will there be any problem? Can I do that? It is simply a scenario where the data of two tables is being put into a single table by two different jobs.

I am getting the following errors and the jobs are killed automatically:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
        ... 3 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: HsetSIintermediate
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:725)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:173)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
        at Setintersection.SetIntersectionMRFINAL$setIntersectionMapper1.<init>*(SetIntersectionMRFINAL.java:49)*
        ... 8 more

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
        ... 3 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: HsetSIintermediate
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:725)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:173)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
        at Setintersection.SetIntersectionMRFINAL$setIntersectionMapper2.<init>*(SetIntersectionMRFINAL.java:83)*
        ... 8 more

The frames I have bolded correspond to the line Htable = new HTable(config, "HT"); in the two jobs. Please help.
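In case it is relevant to the TableNotFoundException: this is roughly the kind of check I could add to the driver before submitting the two jobs, so that the table the mappers try to open definitely exists (only a sketch; the table name is copied from the exception message above and the column family "d" is just a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class EnsureOutputTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Name taken from the exception above; whatever name the mappers
        // actually pass to new HTable(config, ...) must exist beforehand.
        String outputTable = "HsetSIintermediate";
        if (!admin.tableExists(outputTable)) {
            HTableDescriptor desc = new HTableDescriptor(outputTable);
            desc.addFamily(new HColumnDescriptor("d")); // placeholder column family
            admin.createTable(desc);
        }
    }
}

I understand that two jobs writing Puts into the same existing table should in principle be fine, so I suspect my problem is only that the table name opened in the mappers was never created.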
On Thu, Feb 9, 2012 at 12:45 PM, Harsh J <ha...@cloudera.com> wrote:
> Vamshi,
>
> What problem are you exactly trying to solve by trying to attempt
> this? If you are only interested in records being streamed from one
> mapper into another, why can't it be chained together? Remember that
> map-only jobs do not sort their data output -- so I still see no
> benefit here in consuming record-by-record from a whole new task when
> it could be done from the very same.
>
> Btw, ChainMapper is an API abstraction to run several mapper
> implementations in sequence (chain) for each record input and
> transform them all along (helpful if you have several utility mappers
> and want to build composites). It does not touch disk.
>
> On Thu, Feb 9, 2012 at 12:15 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:
> > Thank you Harsh for your reply. What ChainMapper does is: once the first
> > mapper finishes, only then does the second map start, using the file
> > written by the first mapper. It is just like a chain. But what I want is
> > pipelining, i.e. after the first map starts and before it finishes, the
> > second map has to start and keep on reading from the same file that is
> > being written by the first map. It is almost like a producer-consumer
> > scenario, where the first map writes into the file and the second map
> > keeps on reading the same file, so that a pipelining effect is seen
> > between the two maps. Hope you got what I am trying to tell.
> >
> > Please help.
> >
> > On Wed, Feb 8, 2012 at 12:47 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Vamsi,
> >>
> >> Is it not possible to express your M-M-R phase chain as a simple, single
> >> M-R?
> >>
> >> Perhaps look at the ChainMapper class @
> >> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
> >>
> >> On Wed, Feb 8, 2012 at 12:28 PM, Vamshi Krishna <vamshi2...@gmail.com>
> >> wrote:
> >> > Hi all,
> >> > I have an important question about MapReduce. I have 2 Hadoop
> >> > MapReduce jobs. Job1 has only a mapper but no reducer. Job1 started,
> >> > and in its map() it is writing to a "file1" using context(Arg1, Arg2).
> >> > If I wanted to start job2 (immediately after job1), which should take
> >> > "file1" (output still being written by the above job's map phase) as
> >> > input and do processing in its own map/reduce phases, and job2 should
> >> > keep on taking the newly written data in "file1" until job1 finishes,
> >> > what should I do?
> >> >
> >> > How can I do that? Please can anybody help?
> >> >
> >> > --
> >> > Regards
> >> >
> >> > Vamshi Krishna
> >>
> >> --
> >> Harsh J
> >> Customer Ops. Engineer
> >> Cloudera | http://tiny.cloudera.com/about
> >
> > --
> > Regards
> >
> > Vamshi Krishna
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about

--
Regards

Vamshi Krishna