Is it possible to build a "bucket" or "container" system where each bucket has a fixed size, and data spills over to the next bucket once that size has been reached?
The issue I have is that a db with 235 million pages takes FOREVER to do anything, simply because it makes a duplicate of itself for every process. Would it make sense to create buckets, much like extents (but larger) in the RDBMS world, and work on smaller volumes of data at a time? I envision a nutch db daemon that accepts all "bucket requests" and queues them up appropriately while it processes the workload it can. I envision an era when I add in my daily addurl requests and it doesn't take 4 days to process them :)
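To make the idea concrete, here's a minimal sketch of the roll-over behavior I mean, nothing Nutch-specific, just a hypothetical `BucketStore` with an assumed per-bucket page limit. The class name, the capacity parameter, and storing URLs as plain strings are all my own illustration, not anything that exists in Nutch today:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: fixed-capacity "buckets" that roll over once full. */
public class BucketStore {
    private final int maxPagesPerBucket;            // assumed size threshold per bucket
    private final List<List<String>> buckets = new ArrayList<>();

    public BucketStore(int maxPagesPerBucket) {
        this.maxPagesPerBucket = maxPagesPerBucket;
        buckets.add(new ArrayList<>());             // start with one empty bucket
    }

    /** Append a page URL; open a new bucket when the current one hits the limit. */
    public void add(String url) {
        List<String> current = buckets.get(buckets.size() - 1);
        if (current.size() >= maxPagesPerBucket) {
            current = new ArrayList<>();
            buckets.add(current);
        }
        current.add(url);
    }

    public int bucketCount() {
        return buckets.size();
    }
}
```

The point being that each process (updates, addurl batches, etc.) would only ever have to copy and rewrite the bucket it touches, not the whole 235M-page db.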