Awesome, Himanshu. I was also trying to test with coprocessors (CPs) to see where the sweet spot lies between the number of threads processing in parallel and overloading the servers: you are potentially sending a heavy, resource-bound task to already taxed servers and therefore taking a huge hit everywhere. I was thinking of running a read-heavy YCSB workload in parallel and then comparing the impact of 1) a linear, 2) an MR-based, and 3) a CP-based full table scan.
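By "linear" I mean a plain single-client scan from one process. A minimal sketch of that baseline (the table name "usertable" and the caching value are placeholders of mine, not settings I have actually benchmarked):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class LinearRowCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "usertable"); // default YCSB table name
    Scan scan = new Scan();
    scan.setCaching(1000);                    // batch rows per RPC to cut round trips
    scan.setFilter(new FirstKeyOnlyFilter()); // first KV per row is enough for counting
    long rows = 0;
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) rows++;
    } finally {
      scanner.close();
      table.close();
    }
    System.out.println("rows: " + rows);
  }
}

The MR case is essentially what the bundled org.apache.hadoop.hbase.mapreduce.RowCounter job already does, so I would just use that as-is.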
Lars

On Fri, May 27, 2011 at 3:40 AM, Himanshu Vashishtha <[email protected]> wrote:
> I did some experiments using coprocessors and compared the results with a
> vanilla scan, and in one case with MapReduce. I wrote a blog post about these
> experiments, as it was getting a bit difficult to explain them over mail
> without figures etc. Please refer to
> http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html
>
> The results seem to suggest that coprocessor endpoints are a useful feature
> when one needs to access a larger number of rows (well, I can't quantify it
> as of now) while generating sparse results. The main advantage is that the
> processing is done in parallel (at region-level granularity), and it could be
> extended to provide a parallel scanner facility.
> Interestingly, the single-result coprocessor endpoint (i.e., the existing
> one) failed when I increased the table size: I tried to do a row count on
> 100m rows. I need to dig more into it, but I have mentioned my initial
> thoughts in the blog.
>
> I want to test them more rigorously and would really appreciate your
> feedback on the experiments. I have been at this for a while now and need a
> fresh pair of eyes to review it.
>
> Thanks a lot for your time.
>
> Cheers,
> Himanshu
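For reference, the sort of per-region row count the post describes looks roughly like this against the current (0.92-era) CoprocessorProtocol endpoint API. This is only a sketch, and RowCountProtocol / RowCountEndpoint / "usertable" are placeholder names of mine, not code from Himanshu's experiments:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

// The protocol the client codes against; invoked once per region.
interface RowCountProtocol extends CoprocessorProtocol {
  long getRowCount() throws IOException;
}

// Server side: runs inside the region server and scans only its own region,
// so the work is naturally partitioned at region granularity.
class RowCountEndpoint extends BaseEndpointCoprocessor implements RowCountProtocol {
  public long getRowCount() throws IOException {
    Scan scan = new Scan();
    scan.setFilter(new FirstKeyOnlyFilter()); // first KV per row is enough
    InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
        .getRegion().getScanner(scan);
    long count = 0;
    try {
      List<KeyValue> row = new ArrayList<KeyValue>();
      boolean more;
      do {
        more = scanner.next(row);
        if (!row.isEmpty()) count++;
        row.clear();
      } while (more);
    } finally {
      scanner.close();
    }
    return count;
  }
}

// Client side: coprocessorExec fans the call out to every region in the key
// range in parallel and hands back one partial count per region.
public class ParallelRowCount {
  public static void main(String[] args) throws Throwable {
    HTable table = new HTable(HBaseConfiguration.create(), "usertable");
    Map<byte[], Long> partials = table.coprocessorExec(
        RowCountProtocol.class, null, null, // null start/end keys = whole table
        new Batch.Call<RowCountProtocol, Long>() {
          public Long call(RowCountProtocol instance) throws IOException {
            return instance.getRowCount();
          }
        });
    long total = 0;
    for (Long partial : partials.values()) total += partial;
    System.out.println("rows: " + total);
    table.close();
  }
}

The endpoint class still has to be loaded on the region servers (e.g., via hbase.coprocessor.region.classes in hbase-site.xml) before the client call will find it.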
