Hi everybody, we have the following scenario: our clustered web application needs to write records to HBase, and we need to support very high throughput. We expect up to 10-30 thousand requests per second, possibly even more.
Usually this is not a problem for HBase: if we use a "random" row key, the data is distributed evenly across all region servers. But we need to generate our keys based on the current time, so that we can run MR jobs over a given period of time without processing the whole data set, using scan.setStartRow(startRow); scan.setStopRow(stopRow);

In our case the generated row keys all look similar and therefore go to the same region server, so this approach does not really use the power of the whole cluster but only a single server, which can be dangerous under very high load.

So we are thinking about writing the records to an HDFS file first, and additionally running an MR job periodically that reads the finished HDFS files and inserts the records into HBase.

What do you guys think about it? Any suggestions would be very appreciated.

Regards,
Andre
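P.S. To make the scan part more concrete, here is a minimal sketch of the time-range scan I mean, assuming the row key is simply the big-endian epoch-millisecond timestamp; the table name "events" and the one-hour window are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events"); // table name is hypothetical

    // Row key = big-endian epoch millis, so a lexicographic range scan
    // over the keys is exactly a scan over a time window.
    long stop = System.currentTimeMillis();
    long start = stop - 3600L * 1000L; // e.g. the last hour

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(start));
    scan.setStopRow(Bytes.toBytes(stop));

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // each Result is one record inside the time window
        System.out.println(Bytes.toLong(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}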
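P.P.S. And a rough sketch of the staging step we have in mind, writing records into an HDFS SequenceFile that the periodic MR job would later read and insert into HBase; the path and record format are only placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class StagingWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // One file per time interval; the MR job would only pick up files
    // whose interval is already closed ("finished" files), so it never
    // reads a file that is still being written. Path is a placeholder.
    Path file = new Path("/staging/records." + System.currentTimeMillis());
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, file, LongWritable.class, Text.class);
    try {
      // key = event timestamp, value = the raw record (placeholder payload)
      writer.append(new LongWritable(System.currentTimeMillis()),
                    new Text("record payload"));
    } finally {
      writer.close();
    }
  }
}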