Hi Alex,

My answers are inline.
2013/2/27 <[email protected]>: > Hi, > > I am mostly interested in fetcher job. In this job I see this code > > StorageUtils.initMapperJob(currentJob, fields, IntWritable.class, > FetchEntry.class, FetcherMapper.class, FetchEntryPartitioner.class, false); > > In storage utils this function has > > DataStore<String, WebPage> store = createWebStore(job.getConfiguration(), > String.class, WebPage.class); > if (store==null) throw new RuntimeException("Could not create datastore"); > Query<String, WebPage> query = store.newQuery(); > query.setFields(toStringArray(fields)); > GoraMapper.initMapperJob(job, query, store, outKeyClass, outValueClass, > mapperClass, partitionerClass, reuseObjects); So what you are doing is that you are starting a MapReduce job which uses the Query Object to get the data out of a specific data store [1]. Therefore, all the magic happens within the GoraMapper code. Look at the initMapperJob method {code} @SuppressWarnings("rawtypes") public static <K1, V1 extends Persistent, K2, V2> void initMapperJob( Job job, Query<K1,V1> query, DataStore<K1,V1> dataStore, Class<K2> outKeyClass, Class<V2> outValueClass, Class<? extends GoraMapper> mapperClass, Class<? extends Partitioner> partitionerClass, boolean reuseObjects) throws IOException { //set the input via GoraInputFormat GoraInputFormat.setInput(job, query, dataStore, reuseObjects); job.setMapperClass(mapperClass); job.setMapOutputKeyClass(outKeyClass); job.setMapOutputValueClass(outValueClass); if (partitionerClass != null) { job.setPartitionerClass(partitionerClass); } } {\code} Then, the method that will continue the work is the GoraInputFormat[2].setInput which then will use the setQuery method to pass this object through the job configuration to all mappers which will then perform the query (yes, the regular query we define to get data from data stores). 
{code}
public static <K, T extends Persistent> void setQuery(Job job,
    Query<K, T> query) throws IOException {
  IOUtils.storeToConf(query, job.getConfiguration(), QUERY_KEY);
}
{code}

> I followed all these functions but did not find the actual code that sends
> the query to the HBase table.
> I believe it is somewhere in gora-hbase.

Right. Once in the mappers, we use the Query object to perform the data
retrieval operation against the specific data store, which in your case is
the one implemented in gora-hbase.

Hope this helps man.

Renato M.

> Thanks.
> Alex.

[1] http://gora.apache.org/docs/current/apidocs-0.2.1/org/apache/gora/mapreduce/GoraMapper.html#initMapperJob(org.apache.hadoop.mapreduce.Job, org.apache.gora.query.Query, org.apache.gora.store.DataStore, java.lang.Class, java.lang.Class, java.lang.Class, boolean)
[2] https://github.com/renato2099/gora/blob/trunk/gora-core/src/main/java/org/apache/gora/mapreduce/GoraInputFormat.java

> -----Original Message-----
> From: Renato Marroquín Mogrovejo <[email protected]>
> To: Gora Dev <[email protected]>
> Sent: Tue, Feb 26, 2013 8:01 pm
> Subject: Re: gora-hbase query
>
> Hi Alex,
>
> The Gora-HBase module is only in charge of querying and persisting
> data from anywhere, not only Nutch. That being said, you want the
> part where Nutch populates a map used in different Nutch jobs? Which
> jobs are you talking about? Generator? Fetcher? You can probably get
> some more light over in NutchLand.
> I am happy to go over the code with you anyway, just please be a
> little bit more specific.
>
> Renato M.
>
> 2013/2/26 <[email protected]>:
>> Hello,
>>
>> Can someone point me to the code in gora-hbase that queries HBase and
>> populates the Nutch map key/values for the various Nutch jobs?
>> I plan to use SingleColumnValueFilter to see if it selects only a
>> subset of records.
>>
>> Thanks.
>> Alex.

