Hi St. Ack,
***************************************************************************
1.
First, I need to thank you for your last reply, which urged me to re-check
my code, and I did find a stupid problem.
In the map function of my old code I called

HTable table = new HTable(conf, this.tableName);
RowResult rowResult = table.getRow(key);

which basically means that for each row I created a new "connection" to
the table. This is awkward!
In my new code I create only one such "connection", during the job
configuration phase:

public void configure(JobConf job) {
  String tableName = job.get(TABLENAME);
  try {
    setTable(job, tableName);
  } catch (Exception e) {
    LOG.error(e);
  }
}

private HTable table;

protected void setTable(final JobConf job, final String tableName)
    throws Exception {
  this.table = new HTable(new HBaseConfiguration(job), tableName);
}
and then I just call
RowResult rowResult = this.table.getRow(msgid);
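The idea behind the fix (open the expensive handle once per task, then reuse it for every record) can be sketched generically. The `Table` class and the connection counter below are illustrative stand-ins, not the HBase API:

```java
// Sketch of the fix: count how many "connections" each approach opens.
// Table here is a stand-in for an expensive handle like HTable.
public class ConnectionReuseSketch {
    static int connectionsOpened = 0;

    static class Table {
        Table() { connectionsOpened++; }          // opening is the costly part
        String getRow(String key) { return "row-" + key; }
    }

    // Old style: a new Table for every record processed.
    static void oldStyle(String[] keys) {
        for (String key : keys) {
            Table table = new Table();            // opened once PER RECORD
            table.getRow(key);
        }
    }

    // New style: one Table created up front, as in configure().
    static void newStyle(String[] keys) {
        Table table = new Table();                // opened once per task
        for (String key : keys) {
            table.getRow(key);
        }
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c", "d", "e"};

        connectionsOpened = 0;
        oldStyle(keys);
        System.out.println("old style opened: " + connectionsOpened);  // 5

        connectionsOpened = 0;
        newStyle(keys);
        System.out.println("new style opened: " + connectionsOpened);  // 1
    }
}
```

With millions of rows per mapper, that per-record setup cost dominates the job.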
With this revision, the job now runs very stably and takes 110 minutes to
read 10M records.
So for Q1, I can read 1M records in about 11 minutes, which looks OK.
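For reference, here is the arithmetic behind that estimate, using the numbers from this run (10M records in 110 minutes):

```java
// Derive the per-second and per-million-row throughput from the job's
// observed numbers: 10M records read in 110 minutes of wall-clock time.
public class ThroughputCheck {
    public static void main(String[] args) {
        double records = 10000000.0;   // rows read by the job
        double minutes = 110.0;        // observed wall-clock time

        double rowsPerSecond = records / (minutes * 60.0);
        double minutesPerMillion = 1000000.0 / rowsPerSecond / 60.0;

        System.out.printf("%.0f rows/sec, %.0f min per 1M rows%n",
                rowsPerSecond, minutesPerMillion);
        // prints "1515 rows/sec, 11 min per 1M rows"
    }
}
```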
***************************************************************************
2.
I use the default FileInputFormat, so yes, the file is split into 26 pieces
(not 32, I don't know why) and each mapper processed about 0.31 million
records (~1/32nd of the 10M total).
Yes, all eight boxes are running a regionserver. There are 48 regions in my
table of 10M rows.
>> When your MR that did A2. below ran, was the 'getting' distributed across
>> the regions of the table or were you banging on single region of the
>> table the whole time?
Where can I check that? I think it should go across all regions, though,
because I need to read all 10M records out.
I use Hadoop 0.18.2 and HBase 0.18.1.
Thanks for the answer to Q3 too. That is what I will try soon: building a
Lucene index and seeing whether searching based on the index can speed up
column-based reading.
--
View this message in context:
http://www.nabble.com/How-to-read-a-subset-of-records-based-on-a-column-value-in-a-M-R-job--tp20963771p21081633.html
Sent from the HBase User mailing list archive at Nabble.com.