I am totally stuck here.
I have a table called url, which has the column families p, i, and s.
The table url has 8300 rows.
Those rows are inserted like:
key: xxxxx, column family: p, value: <webpage content>
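For reference, the inserts look roughly like this (a simplified sketch of my loader; UrlLoader, savePage, and the empty qualifier are just placeholder names for what I actually do):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UrlLoader {
    // Writes one page into the url table under family p.
    public static void savePage(String rowKey, String pageContent) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "url");
        Put put = new Put(Bytes.toBytes(rowKey));
        // family p, empty qualifier, page content as the value
        put.add(Bytes.toBytes("p"), Bytes.toBytes(""), Bytes.toBytes(pageContent));
        table.put(put);
        table.close();
    }
}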
Now, when I scan the table from Hadoop, I add the correct column family p and
try to process all 8300 rows in one map phase (I use the MultithreadedMapper
patch and synchronize inside the mapper, etc.). I get only 566 rows as map
input, NOT the 8300 rows I am expecting to process.
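For context, the scan and job setup looks roughly like this (a simplified sketch without the multithreaded wrapper; UrlScanJob and UrlMapper are placeholder names, and the write-back part is simplified):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class UrlScanJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "process url table");
        job.setJarByClass(UrlScanJob.class);

        Scan scan = new Scan();                // no start or stop row: full table
        scan.addFamily(Bytes.toBytes("p"));    // only the p family
        scan.setCaching(500);                  // rows fetched per RPC
        scan.setCacheBlocks(false);            // don't fill the block cache from a MapReduce scan

        TableMapReduceUtil.initTableMapperJob(
                "url", scan,
                UrlMapper.class,               // placeholder for the mapper shown below
                ImmutableBytesWritable.class,  // mapper output key
                Put.class,                     // mapper output value
                job);
        // Map-only job that writes Puts back to the url table.
        TableMapReduceUtil.initTableReducerJob("url", null, job);
        job.setNumReduceTasks(0);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}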
What could possibly be wrong? I process those inputs in my mapper like this:
public void map(ImmutableBytesWritable row, Result values, Context context)
        throws IOException {
    for (KeyValue kv : values.raw()) {
        String i = new String(kv.getRow());    // row key
        String p = new String(kv.getValue());  // cell value (the page content from family p)
        // ... do something with p ...
        // savecontent(...): save the processed data back to table url, family ilinks
    }
}
That should give an idea of what I'm doing. What could be the reason? I don't
use any start or stop rows.
Yours,
Petri