bulk load architecture

Adam Fuchs Mon, 15 Aug 2016 10:49:31 -0700

I've been looking through the bulk load code lately related to some
performance issues a customer of ours is experiencing, and I'm perplexed by
a couple of things. Between o.a.a.master.tableOps.LoadFiles and
o.a.a.server.client.BulkImporter we have 4 thread pools that are used in
bulk load. It seems like only the master thread pool gets any parallelism
because we always send one file at a time to the tservers (LoadFiles:154).
Are the three thread pools in the tserver vestigial? Did we use to send
bigger batches to the tservers and then find that one at a time worked better?
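
To make the concern concrete, here is a minimal sketch of the pattern as I
read it. It is not the Accumulo code: BulkLoadPoolSketch, importBatch and
peakInnerParallelism are hypothetical names, the sleep stands in for the real
per-file work, and I am assuming the receiving side builds its pool per call.
An outer pool hands batches to a call that fans the batch out on its own
inner pool and reports how many files it ever worked on at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BulkLoadPoolSketch {

    // Hypothetical stand-in for the tserver-side import of one batch.
    // Assumption: the pool is created per call. Returns the peak number
    // of files from this batch that were in flight at the same time.
    static int importBatch(List<String> files, int innerThreads) {
        ExecutorService inner = Executors.newFixedThreadPool(innerThreads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String file : files) {
                pending.add(inner.submit(() -> {
                    peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                    sleepQuietly(20); // placeholder for the per-file work
                    active.decrementAndGet();
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every file in the batch
            }
            return peak.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            inner.shutdown();
        }
    }

    // Hypothetical stand-in for the master side: split the files into
    // batches, send each batch from an outer pool, and return the largest
    // inner-pool parallelism seen by any single call.
    static int peakInnerParallelism(int fileCount, int batchSize,
                                    int outerThreads, int innerThreads) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            files.add("file-" + i);
        }
        ExecutorService outer = Executors.newFixedThreadPool(outerThreads);
        try {
            List<Future<Integer>> calls = new ArrayList<>();
            for (int start = 0; start < fileCount; start += batchSize) {
                int end = Math.min(fileCount, start + batchSize);
                List<String> batch = new ArrayList<>(files.subList(start, end));
                Callable<Integer> call = () -> importBatch(batch, innerThreads);
                calls.add(outer.submit(call));
            }
            int peak = 0;
            for (Future<Integer> c : calls) {
                peak = Math.max(peak, c.get());
            }
            return peak;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            outer.shutdown();
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // One file per call: the inner pool never has more than one task.
        System.out.println("batch=1 peak: " + peakInnerParallelism(8, 1, 4, 3));
        // Bigger batches: the inner pool can actually be used.
        System.out.println("batch=4 peak: " + peakInnerParallelism(8, 4, 4, 3));
    }
}
```

With a batch size of 1 the inner peak is always 1 no matter how large the
inner pool is, so all the concurrency comes from the outer pool. With bigger
batches the inner peak can go up to min(batch size, inner threads), though
the exact value depends on timing.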


Seems like we could greatly simplify the tserver portion of the bulk load.
Can anybody think of why that might not be a good idea?

Also, has anybody optimized the pool sizes for multiple concurrent large
bulk loads, and do you have suggestions on what settings to use (i.e.
master.fate.threadpool.size and master.bulk.threadpool.size)?

Thanks,
Adam
