To add to what Jieshan said:

On Fri, Nov 21, 2014 at 8:32 PM, Cliff <[email protected]> wrote:
> 1.
> Why does "HBase replication" need replicationSink?
> I think replicationSource can do replicationSink's work as well.
> And if we don't use replicationSink, we just need one time I/O.
>

If you were to use HTable from the source:

- All your meta lookups would be a lot slower than if you were local. We
  rely on this to be extremely fast.
- You would be sending at least as many RPCs, but probably more, since
  you'd be sending them directly to each region server on the slave side,
  chunked up by table. More, tinier RPCs probably aren't what you want
  over a WAN.
- BTW, sending one big batch can also make RPC compression more efficient.
- Retries would be done over the WAN. For example, say you're regularly
  sending 2MB batches to a region and then it moves. The first batch that
  gets sent after the move will go to where you think the region is, only
  to get an NSRE (NotServingRegionException). You'll then do a meta lookup
  to find the new location, again over the WAN, and send those 2MB again
  to the new location. It's a lot of back and forth you'd rather do on a
  LAN.

Hope this helps,

J-D
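
P.S. To put rough numbers on the retry example, here is a back-of-the-envelope sketch in Python. It is only a toy cost model, not HBase code: the `Link` class, the latency and bandwidth figures, and the function names are all made up for illustration, and it ignores compression, pipelining and everything else a real cluster does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A network link described by two hypothetical parameters."""
    rtt_ms: float    # round-trip latency in milliseconds
    mb_per_s: float  # usable bandwidth in MB per second

    def rpc_ms(self, payload_mb: float) -> float:
        """Rough time for one request/response carrying payload_mb."""
        return self.rtt_ms + payload_mb * 1000.0 / self.mb_per_s


# Assumed, illustrative numbers only.
WAN = Link(rtt_ms=80.0, mb_per_s=10.0)
LAN = Link(rtt_ms=0.5, mb_per_s=100.0)


def retry_after_region_move(link: Link, batch_mb: float) -> float:
    """Send a batch to the stale location (NSRE), look up meta, resend."""
    failed_send = link.rpc_ms(batch_mb)
    meta_lookup = link.rpc_ms(0.0)
    resend = link.rpc_ms(batch_mb)
    return failed_send + meta_lookup + resend


def direct_from_source(batch_mb: float) -> float:
    """Source writes straight to the slave's region servers: all on the WAN."""
    return retry_after_region_move(WAN, batch_mb)


def via_sink(batch_mb: float) -> float:
    """Source ships the batch once over the WAN; the sink retries on the LAN."""
    return WAN.rpc_ms(batch_mb) + retry_after_region_move(LAN, batch_mb)


if __name__ == "__main__":
    batch_mb = 2.0
    print(f"direct from source: {direct_from_source(batch_mb):.1f} ms")
    print(f"via sink:           {via_sink(batch_mb):.1f} ms")
```

With these assumed figures the direct path pays for the 2MB batch twice over the WAN plus a WAN meta lookup, while the sink path pays for it once over the WAN and does the rest locally. The absolute numbers mean nothing; the point is only which hops land on which link.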
