Hi all,

I plan to write a MapReduce job that will use HBase as a source and annotate each record (e.g. add a column to each record); a rough sketch of the job is in the P.S. below.
I think Stack said I might run into issues doing this (region splits?), but that was a while ago.

- Is it correct that I should take care with this?
- Does it matter if I write to a new column family?
- Should I perhaps MapReduce to a temp file in HDFS and then run a second job to issue the updates?
- Or use a custom output format?

I plan to do a lot of these operations on 200 million records (and anticipate growing to 500 million in the coming months).

Thanks for any advice,
Tim
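
P.S. In case it helps frame the question, here is roughly the kind of job I have in mind (reading from and writing back to the same table). This is only a sketch: the table name "mytable", the column family "annot", the qualifier "label", and the annotation logic are all placeholders, and it assumes the new column family already exists on the table.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class AnnotateJob {

  // Map-only pass: for each row, compute an annotation and emit a Put
  // targeting a separate column family, so the source columns are untouched.
  static class AnnotateMapper extends TableMapper<ImmutableBytesWritable, Put> {

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
        throws IOException, InterruptedException {
      // Placeholder annotation logic.
      byte[] annotation = Bytes.toBytes("some-derived-value");
      Put put = new Put(rowKey.get());
      put.add(Bytes.toBytes("annot"), Bytes.toBytes("label"), annotation);
      ctx.write(rowKey, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "annotate-records");
    job.setJarByClass(AnnotateJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner caching for a full-table scan
    scan.setCacheBlocks(false);  // don't pollute the block cache from MR

    // Read from and write back to the same table ("mytable" is a placeholder).
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, AnnotateMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("mytable", null, job);
    job.setNumReduceTasks(0);  // map-only; Puts go straight to TableOutputFormat

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}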
