Hi all,

I plan to write a MapReduce job that uses HBase as its source and
annotates each record (e.g. adds a column to each record).

I think Stack said I might run into issues doing this (region
splits?), but that was a while ago.
- Is it correct that I need to be careful here?
- Does that still apply if I write to a new column family?
- Should I instead have the MapReduce job write to a temp file in
HDFS, and then run a second job to issue the updates?
  - Or use a custom output format?

I plan to run many of these operations over 200 million records (and
anticipate growing to 500 million in the coming months).

Thanks for any advice,

Tim
