Hi Harsh,

I am trying to find all the rowkeys that are present in two tables. If userid is the rowkey of two different tables, I want to find all the rowkeys that appear in both tables. For that I need to read from two tables in one MapReduce job, i.e. I want to take multiple tables as input to a single MapReduce job so that I can compute the intersection. How can I do that? (I have put a rough sketch of what I was imagining below.)
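Here is that sketch. It assumes the multi-scan overload of TableMapReduceUtil.initTableMapperJob and Scan.SCAN_ATTRIBUTES_TABLE_NAME are available in my HBase version (I am not sure they are); the table names T1 and T2, the output path, and the class names are just placeholders:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RowKeyIntersection {

    // Emits each scanned rowkey once per table it was read from.
    static class RowKeyMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(Bytes.toString(row.get(), row.getOffset(), row.getLength())), ONE);
        }
    }

    // A rowkey is unique within one table scan, so a total count of 2
    // means the key was present in both tables.
    static class IntersectionReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable v : values) {
                count += v.get();
            }
            if (count == 2) {
                context.write(key, new IntWritable(count));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "rowkey-intersection");
        job.setJarByClass(RowKeyIntersection.class);

        // One Scan per input table; the table name travels as a scan attribute.
        List<Scan> scans = new ArrayList<Scan>();
        for (String table : new String[] { "T1", "T2" }) {
            Scan scan = new Scan();
            scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(table));
            scans.add(scan);
        }

        TableMapReduceUtil.initTableMapperJob(scans, RowKeyMapper.class,
                Text.class, IntWritable.class, job);
        job.setReducerClass(IntersectionReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/rowkey-intersection"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The idea is simply that the reducer sees each userid with the number of tables it was scanned from, and keeps only those seen in both. Is something like this the intended way, or is there a better approach?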
One more doubt I have: if two jobs each have Htable = new HTable(config, "HT"); (HT is the HBase table I have created) in their respective mappers, and these two jobs read from other tables T1 and T2 and put into the HT table, will there be any problem? Can I do that? It is simply a scenario where the data of two tables is being put into a single table by two different jobs.

I am getting the following errors and the jobs are killed automatically:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
        ... 3 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: HsetSIintermediate
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:725)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:173)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
        at Setintersection.SetIntersectionMRFINAL$setIntersectionMapper1.<init>*(SetIntersectionMRFINAL.java:49)*
        ... 8 more

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
        ... 3 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: HsetSIintermediate
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:725)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:173)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
        at Setintersection.SetIntersectionMRFINAL$setIntersectionMapper2.<init>*(SetIntersectionMRFINAL.java:83)*
        ... 8 more

The frames I have bolded correspond to the line Htable = new HTable(config, "HT"); in the two jobs. Please help.
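In case it is relevant to the TableNotFoundException: this is roughly the kind of check I could add to the driver before submitting the two jobs, so that the table the mappers try to open definitely exists (only a sketch; the table name is copied from the exception message above and the column family "d" is just a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class EnsureOutputTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Name taken from the exception above; whatever name the mappers
        // actually pass to new HTable(config, ...) must exist beforehand.
        String outputTable = "HsetSIintermediate";
        if (!admin.tableExists(outputTable)) {
            HTableDescriptor desc = new HTableDescriptor(outputTable);
            desc.addFamily(new HColumnDescriptor("d")); // placeholder column family
            admin.createTable(desc);
        }
    }
}

I understand that two jobs writing Puts into the same existing table should in principle be fine, so I suspect my problem is only that the table name opened in the mappers was never created.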
On Thu, Feb 9, 2012 at 12:45 PM, Harsh J <ha...@cloudera.com> wrote:
> Vamshi,
>
> What problem are you exactly trying to solve by trying to attempt
> this? If you are only interested in records being streamed from one
> mapper into another, why can't it be chained together? Remember that
> map-only jobs do not sort their data output -- so I still see no
> benefit here in consuming record-by-record from a whole new task when
> it could be done from the very same.
>
> Btw, ChainMapper is an API abstraction to run several mapper
> implementations in sequence (chain) for each record input and
> transform them all along (helpful if you have several utility mappers
> and want to build composites). It does not touch disk.
>
> On Thu, Feb 9, 2012 at 12:15 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:
> > Thank you Harsh for your reply. What ChainMapper does is: once the first
> > mapper finishes, only then does the second map start, using the file
> > written by the first mapper. It is just like a chain. But what I want is
> > pipelining, i.e. after the first map starts and before it finishes, the
> > second map has to start and keep on reading from the same file that is
> > being written by the first map. It is almost like a producer-consumer
> > scenario, where the first map writes into the file and the second map
> > keeps on reading the same file, so that a pipelining effect is seen
> > between the two maps. Hope you got what I am trying to tell.
> >
> > Please help.
> >
> > On Wed, Feb 8, 2012 at 12:47 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Vamsi,
> >>
> >> Is it not possible to express your M-M-R phase chain as a simple, single
> >> M-R?
> >>
> >> Perhaps look at the ChainMapper class @
> >> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
> >>
> >> On Wed, Feb 8, 2012 at 12:28 PM, Vamshi Krishna <vamshi2...@gmail.com>
> >> wrote:
> >> > Hi all,
> >> > I have an important question about MapReduce. I have 2 Hadoop
> >> > MapReduce jobs. Job1 has only a mapper but no reducer. Job1 started,
> >> > and in its map() it is writing to a "file1" using context(Arg1, Arg2).
> >> > If I wanted to start job2 (immediately after job1), which should take
> >> > "file1" (output still being written by the above job's map phase) as
> >> > input and do processing in its own map/reduce phases, and job2 should
> >> > keep on taking the newly written data in "file1" until job1 finishes,
> >> > what should I do?
> >> >
> >> > How can I do that? Please can anybody help?
> >> >
> >> > --
> >> > Regards
> >> >
> >> > Vamshi Krishna
> >>
> >> --
> >> Harsh J
> >> Customer Ops. Engineer
> >> Cloudera | http://tiny.cloudera.com/about
> >
> > --
> > Regards
> >
> > Vamshi Krishna
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about

--
Regards

Vamshi Krishna