Hi Alex,

I think I understand what you are asking. As you know, the query object is sent to the mappers through the conf object and is then used in GoraRecordReader [2], which I think is the most important class for this. The code that actually queries HBase is in HBaseStore [1]. There is a really simple example of Gora's MapReduce support in [3], but I don't think it is what you are looking for. You have pointed out a really important issue here: we should probably create some simple examples of how to use it. If you feel like tackling this, that would be awesome, man! Let us know if we can help you work this out.
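To make that flow a bit more concrete, here is a rough, self-contained sketch. It is not Gora code: a plain Map stands in for the Hadoop Configuration, a TreeMap stands in for the HBase table, Java serialization plus Base64 stands in for DefaultStringifier, and the names RangeQuery, QueryViaConfSketch and the key "sketch.query" are made up for illustration. The only point is that storing the query in the conf never touches the store; the read happens later, on the task side, once the query has been rebuilt and executed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class QueryViaConfSketch {
  // Hypothetical key name, chosen for this sketch only.
  static final String QUERY_KEY = "sketch.query";

  // Job-setup side: same shape as the storeToConf quoted in this thread.
  // It only writes two strings into the configuration.
  static void storeToConf(Serializable obj, Map<String, String> conf, String dataKey)
      throws IOException {
    conf.put(dataKey + "._class", obj.getClass().getName());
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    conf.put(dataKey, Base64.getEncoder().encodeToString(bytes.toByteArray()));
  }

  // Task side: rebuild the object from the two configuration strings.
  static Object loadFromConf(Map<String, String> conf, String dataKey)
      throws IOException, ClassNotFoundException {
    Class<?> type = Class.forName(conf.get(dataKey + "._class"));
    byte[] raw = Base64.getDecoder().decode(conf.get(dataKey));
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
      return type.cast(in.readObject());
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> conf = new HashMap<>();             // stand-in for Configuration
    storeToConf(new RangeQuery("b", "c"), conf, QUERY_KEY); // no store access here

    NavigableMap<String, String> store = new TreeMap<>();   // stand-in for the HBase table
    store.put("a", "page-a");
    store.put("b", "page-b");
    store.put("c", "page-c");
    store.put("d", "page-d");

    // What a record reader would do: load the query, then run it.
    RangeQuery query = (RangeQuery) loadFromConf(conf, QUERY_KEY);
    System.out.println(query.execute(store)); // prints [page-b, page-c]
  }
}

// Stand-in for a Gora Query: it only describes the request (a key range).
class RangeQuery implements Serializable {
  private static final long serialVersionUID = 1L;
  final String startKey;
  final String endKey;

  RangeQuery(String startKey, String endKey) {
    this.startKey = startKey;
    this.endKey = endKey;
  }

  // The store is read only here, when the query is executed.
  List<String> execute(NavigableMap<String, String> store) {
    return new ArrayList<>(store.subMap(startKey, true, endKey, true).values());
  }
}
```

In Gora itself the "load and execute" half is what GoraRecordReader [2] does with the real query object, and the store-specific read is in HBaseStore [1].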

Renato M.

[1] https://github.com/renato2099/gora/blob/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java
[2] https://github.com/renato2099/gora/blob/trunk/gora-core/src/main/java/org/apache/gora/mapreduce/GoraRecordReader.java
[3] https://github.com/renato2099/gora/blob/trunk/gora-core/src/examples/java/org/apache/gora/examples/mapreduce/QueryCounter.java

2013/3/1 <[email protected]>:
> Hi Renato,
>
> So once in the Mappers, we will use the query object to perform the
> data retrieval operation from the specific data store.
> Hope this helps man.
>
> I need to see the code that does what you specified above.
>
> The setQuery function calls IOUtils.storeToConf(query,
> job.getConfiguration(), QUERY_KEY);
>
> and
>
> public static<T> void storeToConf(T obj, Configuration conf, String dataKey)
>     throws IOException {
>   String classKey = dataKey + "._class";
>   conf.set(classKey, obj.getClass().getCanonicalName());
>   DefaultStringifier.store(conf, obj, dataKey);
> }
>
> function simply sets configuration.
>
> Where is the call to hbase to retrieve data then?
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Renato Marroquín Mogrovejo <[email protected]>
> To: Gora Dev <[email protected]>
> Sent: Thu, Feb 28, 2013 9:47 pm
> Subject: Re: gora-hbase query
>
> Hi Alex,
>
> My answers are inline.
>
> 2013/2/27 <[email protected]>:
>> Hi,
>>
>> I am mostly interested in fetcher job.
>> In this job I see this code
>>
>> StorageUtils.initMapperJob(currentJob, fields, IntWritable.class,
>>     FetchEntry.class, FetcherMapper.class, FetchEntryPartitioner.class, false);
>>
>> In storage utils this function has
>>
>> DataStore<String, WebPage> store = createWebStore(job.getConfiguration(),
>>     String.class, WebPage.class);
>> if (store==null) throw new RuntimeException("Could not create datastore");
>> Query<String, WebPage> query = store.newQuery();
>> query.setFields(toStringArray(fields));
>> GoraMapper.initMapperJob(job, query, store, outKeyClass, outValueClass,
>>     mapperClass, partitionerClass, reuseObjects);
>
> So what you are doing is that you are starting a MapReduce job which
> uses the Query Object to get the data out of a specific data store
> [1]. Therefore, all the magic happens within the GoraMapper code. Look
> at the initMapperJob method
>
> {code}
> @SuppressWarnings("rawtypes")
> public static <K1, V1 extends Persistent, K2, V2> void initMapperJob(
>     Job job,
>     Query<K1,V1> query,
>     DataStore<K1,V1> dataStore,
>     Class<K2> outKeyClass,
>     Class<V2> outValueClass,
>     Class<? extends GoraMapper> mapperClass,
>     Class<? extends Partitioner> partitionerClass,
>     boolean reuseObjects) throws IOException {
>   //set the input via GoraInputFormat
>   GoraInputFormat.setInput(job, query, dataStore, reuseObjects);
>
>   job.setMapperClass(mapperClass);
>   job.setMapOutputKeyClass(outKeyClass);
>   job.setMapOutputValueClass(outValueClass);
>
>   if (partitionerClass != null) {
>     job.setPartitionerClass(partitionerClass);
>   }
> }
> {\code}
>
> Then, the method that will continue the work is the
> GoraInputFormat[2].setInput which then will use the setQuery method to
> pass this object through the job configuration to all mappers which
> will then perform the query (yes, the regular query we define to get
> data from data stores).
>
> {code}
>
> public static<K, T extends Persistent> void setQuery(Job job, Query<K, T> query)
>     throws IOException {
>   IOUtils.storeToConf(query, job.getConfiguration(), QUERY_KEY);
> }
>
> {\code}
>
>> I followed all these functions but did not find actual code that sends query
>> to hbase table.
>> I believe it is somewhere in gora-hbase.
>
> So once in the Mappers, we will use the query object to perform the
> data retrieval operation from the specific data store.
> Hope this helps man.
>
> Renato M.
>
>> Thanks.
>> Alex.
>
> [1]
> http://gora.apache.org/docs/current/apidocs-0.2.1/org/apache/gora/mapreduce/GoraMapper.html#initMapperJob(org.apache.hadoop.mapreduce.Job,
> org.apache.gora.query.Query, org.apache.gora.store.DataStore,
> java.lang.Class, java.lang.Class, java.lang.Class, boolean)
>
> [2]
> https://github.com/renato2099/gora/blob/trunk/gora-core/src/main/java/org/apache/gora/mapreduce/GoraInputFormat.java
>>
>> -----Original Message-----
>> From: Renato Marroquín Mogrovejo <[email protected]>
>> To: Gora Dev <[email protected]>
>> Sent: Tue, Feb 26, 2013 8:01 pm
>> Subject: Re: gora-hbase query
>>
>> Hi Alex,
>>
>> The Gora-HBase module is only in charge of querying and persisting
>> data from anywhere, not only Nutch. That being said, you want the
>> part where Nutch populates a map used in different Nutch jobs? Which
>> jobs are you talking about? Generator? Fetcher? You can probably get
>> some more light on this over in NutchLand.
>> I am happy to go over the code with you anyway, just please be a
>> little bit more specific.
>>
>> Renato M.
>>
>> 2013/2/26 <[email protected]>:
>>>
>>> Hello,
>>>
>>> Can someone point me to the code in gora-hbase that queries hbase and
>>> populates nutch map key values for various nutch jobs?
>>> I plan to use SingleColumnValueFilter to see if it selects only a subset
>>> of records.
>>>
>>> Thanks.
>>> Alex.

