On Wed, Nov 2, 2011 at 10:18 AM, Keith Massey <[email protected]> wrote:
> On 11/1/11 9:53 PM, Keith Turner wrote:
>>
>> On Tue, Nov 1, 2011 at 6:12 PM, Keith Massey
>> <[email protected]> wrote:
>>>
>>> I'm not incredibly familiar with this code, but it could be a static
>>> thread pool right? And just let all ScannerIterators share some
>>> configurable thread pool? The thread would just be returned to the
>>> pool when the Reader completed.
>>>
>> When I think of thread pools, I always think of setting an upper bound
>> on the number of threads. It occurred to me that we could use a
>> static thread pool if it were unbounded. This would replicate the
>> current behavior and allow for thread reuse. So make the core size
>> small (0,1 or 2), the max size MAX_INT, the timeout small (few
>> seconds), and use a SynchronousQueue. Everything added to the pool
>> should create a new thread if one is not available. Also make the
>> threads daemon threads so they do not keep the process alive.
>
> I think that would actually be much better than replicating the current
> behavior -- most of those threads seem to be very short-lived and we seem to
> get into trouble because the garbage collector is not reclaiming them fast
> enough (and I'm guessing we're bumping up against our ulimit). An unbounded
> pool would probably stay relatively small in most cases. Having the option
> of passing in a bounded thread pool would be nice though. If we have
> hundreds of users querying accumulo at once we'll probably need some way to
> bound the number of threads so we don't crash our server (although I guess
> we could do that in our code that calls accumulo).
>
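
For reference, here is a minimal sketch of the unbounded pool configuration described in the quoted text, using only java.util.concurrent. The class name ReadAheadPool, the thread name prefix, and the 3 second keep-alive are illustrative assumptions. Core size 0 is one of the (0, 1 or 2) options mentioned. This is not the actual Accumulo change.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder class; the name is illustrative and not taken from the Accumulo source.
final class ReadAheadPool {
  private static final AtomicInteger COUNTER = new AtomicInteger();

  // Daemon threads do not keep the JVM alive once the application's own threads finish.
  private static final ThreadFactory DAEMON_FACTORY = r -> {
    Thread t = new Thread(r, "read-ahead-" + COUNTER.incrementAndGet());
    t.setDaemon(true);
    return t;
  };

  // Core size 0, effectively unbounded maximum, short keep-alive (assumed 3 seconds),
  // and a SynchronousQueue so tasks are handed off directly rather than queued.
  static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
      0, Integer.MAX_VALUE, 3L, TimeUnit.SECONDS,
      new SynchronousQueue<Runnable>(), DAEMON_FACTORY);

  private ReadAheadPool() {}
}
```

With this configuration a submitted task is never queued: it is either handed to an idle thread or a new thread is started for it, and idle threads exit after the keep-alive expires. A bounded variant would only need a smaller maximum size plus a decision about what to do when the pool is full.
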
Ok, I will create a ticket.

One thing you could do with the current code is increase the batch size on the scanner. I think it is 1000 by default. After the scanner reads a few batches, it starts kicking off the read-ahead thread to read batches. Since a thread is created per batch, increasing the batch size will decrease the frequency of thread creation by the scanner. You could try 2000, 4000, or 8000.

Keith
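
To put rough numbers on the batch size suggestion, here is a small back-of-the-envelope sketch. It assumes one read-ahead thread per batch and ignores the first few batches that are read without read-ahead. The class name BatchEstimate and the scan size of 1,000,000 entries are made up for illustration. On the client, the batch size itself is set with Scanner.setBatchSize(int).

```java
// Hypothetical helper; estimates how many batches (and so, roughly, how many
// read-ahead threads) a scan of a given size needs at different batch sizes.
final class BatchEstimate {
  // Ceiling division: a partial final batch still counts as one batch.
  static long batches(long entries, int batchSize) {
    return (entries + batchSize - 1) / batchSize;
  }

  public static void main(String[] args) {
    long entries = 1_000_000L; // assumed scan size, for illustration only
    for (int size : new int[] {1000, 2000, 4000, 8000}) {
      System.out.println(size + " -> " + batches(entries, size) + " batches");
    }
  }
}
```

Under these assumptions, doubling the batch size halves the number of threads created: 1000, 500, 250 and 125 batches respectively for the four sizes above.
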
