Re: Hbase mapred classes question

Lars George Thu, 10 Jan 2008 06:28:22 -0800

Hi Stack,

First, if I have 40 servers with about 32 regions per server, what would I set the number of mappers and reducers to?
Coarsely, make as many maps as you have total regions (assuming TableInputFormat is in the mix; it splits on table regions) and make the number of reducers equal to the number of index shards you want out the other end. For example, if the table is small, you could have just one reducer produce a single index for all table content.
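To make the sizing rule concrete: 40 servers × 32 regions is about 1280 regions, so roughly 1280 map tasks. The sketch below assumes ten index shards (an arbitrary, hypothetical choice) and shows how rows would spread across the reducers using the same modulo formula as Hadoop's default HashPartitioner, applied here to a plain String row key. The class and method names are made up for illustration, and real Hadoop key types compute their hash codes differently.

```java
// Coarse sizing for a TableInputFormat-driven indexing job, per the rule
// above: one map per table region, one reducer per index shard wanted.
class JobSizing {

    // TableInputFormat produces one split per region, so the map count
    // is roughly the total number of regions in the table.
    static int mapTasks(int servers, int regionsPerServer) {
        return servers * regionsPerServer;
    }

    // Same formula as Hadoop's default HashPartitioner, but on a plain
    // String key. Masking off the sign bit keeps the result non-negative.
    static int shardFor(String rowKey, int numReduceTasks) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int maps = mapTasks(40, 32);
        int reducers = 10; // hypothetical: ten index shards wanted
        System.out.println("maps ~ " + maps + ", reducers = " + reducers);
        System.out.println("row123 -> shard " + shardFor("row123", reducers));
    }
}
```

Each reducer then writes exactly one index shard, which is why the reducer count is what controls how many indexes come out the other end.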

But if we need to search it at the end without producing one single index, how would you handle this? Would you, for example, create ten indexes and then use a MultiReader (?) to search across all 10? This also obviously means that I have to save those 10 indexes locally first to be able to search them, so I need the storage room for all of them in total anyway. What advantage does that have? Is there a maximum size for Lucene indexes (apart from what the OS imposes on the filesystem) that would affect this?
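On the MultiReader question: as I understand Lucene's MultiReader, it does not merge the shards on disk; it wraps several sub-indexes and numbers their documents consecutively, offsetting each shard's local doc IDs by the document counts of the shards before it. The toy model below is plain Java, not the Lucene API (Shard and search are hypothetical names), and shows only that numbering scheme. Every shard still has to be readable by the searching process, so the storage total does not shrink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene) of how a MultiReader-style view presents several
// index shards as one: global doc ID = shard's docBase + local doc ID.
class ShardedSearch {

    // One shard: term -> local doc IDs, plus the shard's document count.
    static final class Shard {
        final Map<String, List<Integer>> postings;
        final int maxDoc;

        Shard(Map<String, List<Integer>> postings, int maxDoc) {
            this.postings = postings;
            this.maxDoc = maxDoc;
        }
    }

    static List<Integer> search(List<Shard> shards, String term) {
        List<Integer> hits = new ArrayList<>();
        int docBase = 0; // first global doc ID of the current shard
        for (Shard shard : shards) {
            for (int localDoc : shard.postings.getOrDefault(term, List.of())) {
                hits.add(docBase + localDoc);
            }
            docBase += shard.maxDoc;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
            new Shard(Map.of("hbase", List.of(0, 2)), 3),
            new Shard(Map.of("hbase", List.of(1)), 2));
        System.out.println(search(shards, "hbase")); // [0, 2, 4]
    }
}
```

The second shard's local doc 1 becomes global doc 4 because the first shard holds three documents (IDs 0 to 2).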

And secondly, is it allowed to add new column values during the process? For example, if I read the column "contents:A" of all rows (e.g. row123.contents:A), analyze the data, and then write the result to "row123.contents:B", is that OK to do?
You mean add new content while indexing? Yes, if you don't mind some of the added content ending up in the index...

I would add it to the same family but with a different label, and since the job maps a different label, the new values would not be indexed, right?
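Whether the new "contents:B" cells can reach the index comes down to which columns the job asks for. A minimal sketch, assuming the column-spec convention HBase used at the time: a spec ending in a colon ("contents:") selects every column in the family, while "contents:A" selects that one column only. ColumnSelection, matches and select are hypothetical names, not HBase API. If the job were configured with the whole family rather than "contents:A", a "contents:B" cell written before the scanner reaches its row could be picked up, which is presumably the caveat in the reply above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of which cells an indexing job sees, given its column specs.
// Assumed convention: "family:" matches every column in that family,
// "family:label" matches exactly one column.
class ColumnSelection {

    static boolean matches(String spec, String column) {
        return spec.endsWith(":") ? column.startsWith(spec) : column.equals(spec);
    }

    // Returns the subset of a row's cells that the job would map.
    static Map<String, String> select(Map<String, String> row, List<String> specs) {
        Map<String, String> seen = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            for (String spec : specs) {
                if (matches(spec, cell.getKey())) {
                    seen.put(cell.getKey(), cell.getValue());
                    break;
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> row123 = new LinkedHashMap<>();
        row123.put("contents:A", "raw text");
        row123.put("contents:B", "analysis result"); // written during the job
        System.out.println(select(row123, List.of("contents:A")).keySet()); // [contents:A]
        System.out.println(select(row123, List.of("contents:")).keySet());  // [contents:A, contents:B]
    }
}
```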


Thanks again for your help Stack!

Best regards,
Lars
