Sounds like you and others are already ahead of me. Thanks for opening HBASE-13441 and your related work. Some responses below:
> Nice idea! I agree that the Scan API would be cleaned up by your
> suggestions, especially the doc updates. Some comments below:
>
> > Scan.bufferSize (instead of maxResultSize for the target over-the-wire
> > size - though this is still confusing because it's common to go over this
> > size)
> Ya this setting will always have a little ambiguity associated with it (at
> least until such a time where we are able to enforce it at the byte level
> i.e. send back partial cells). Scan.bufferSize sounds okay. As a note,
> there was some discussion in HBASE-11544 about renaming this field and one
> of the recommendations was Scan.rpcChunkSize.

rpcChunkSize sounds fine to me too, and much better than maxResultSize.

> > Scan.limitRows (instead of caching - along with true client side support)
> Makes sense. I think that client side support is actually already there (at
> least it is in ClientScanner via the countdown variable that is used as the
> caching value for new scanner callables).

Gotcha, but I would envision the client actually closing the scanner (Iterable<Result>) once the row limit is hit. That would change the meaning of the setting from a detail of how the data transfer is implemented to an actual, visible query limit.

>
> > Scan.allowPartialResults (to indicate it's ok to break up rows across
> > Results...)
> With HBASE-11544 in branch-1+ the server will stop adding Cells as soon as
> the buffer fills and send back the accumulated Results to the client (last
> Result may be a partial of its row). In the case that allow partial results
> is false, the ClientScanner handles reassembling the partials into a
> complete view of the row before releasing the Result to the application.

That's awesome. Great work.

> With this proposed cleanup, are you recommending that we do away with
> Scan.setBatch? Would the default configuration remain as it is now in
> branch-1+ (rowLimit = Integer.MAX_VALUE, bufferSize = 2MB,
> allowPartialResults = false)?
Yes, I was thinking of dropping setBatch also.
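To make the proposed limitRows semantics concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical and is not the real HBase client API: `ScanOptions` just carries the proposed names with the branch-1+ defaults quoted above (rowLimit = Integer.MAX_VALUE, bufferSize = 2MB, allowPartialResults = false), `RowScanner` is a stand-in for a server-backed scanner, and rows are plain strings instead of Result objects. Only the row limit is exercised: the wrapper counts rows handed to the caller and closes the underlying scanner the moment the limit is reached.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class RowLimitSketch {

    /** Hypothetical options using the proposed names; NOT the real HBase Scan API. */
    static final class ScanOptions {
        int rowLimit = Integer.MAX_VALUE;     // visible query limit (proposed Scan.limitRows)
        long bufferSize = 2L * 1024 * 1024;   // target over-the-wire chunk size (2MB); unused here
        boolean allowPartialResults = false;  // client reassembles partial rows; unused here
    }

    /** Hypothetical stand-in for a server-backed scanner returning one row per next(). */
    interface RowScanner extends Iterator<String>, Closeable {
        @Override
        void close(); // redeclared without IOException to keep the sketch simple
    }

    /** Wraps a scanner and closes it as soon as rowLimit rows have been returned. */
    static final class RowLimitedScanner implements Iterator<String> {
        private final RowScanner delegate;
        private int remaining;
        private boolean closed = false;

        RowLimitedScanner(RowScanner delegate, int rowLimit) {
            this.delegate = delegate;
            this.remaining = rowLimit;
        }

        @Override
        public boolean hasNext() {
            if (closed) {
                return false;
            }
            // Limit exhausted or source drained: release the scanner and stop.
            if (remaining <= 0 || !delegate.hasNext()) {
                closeQuietly();
                return false;
            }
            return true;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            remaining--;
            String row = delegate.next();
            if (remaining == 0) {
                // Limit reached: close immediately rather than waiting for the caller.
                closeQuietly();
            }
            return row;
        }

        private void closeQuietly() {
            if (!closed) {
                closed = true;
                delegate.close();
            }
        }
    }
}
```

With this shape the limit is a property of the query as the application sees it: the caller gets exactly rowLimit rows and the scanner is released right then, regardless of how many rows each underlying RPC happened to fetch.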
