So, yesterday, I implemented this change via a coprocessor which basically initiates a raw scan, keeps track of the number of delete markers encountered, and stops when a configured threshold is met. It instantiates its own ScanDeleteTracker to do the masking of delete markers. So: raw scan, count delete markers, stop if too many are encountered, and mask them so as to return sane results back to the client.
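For reference, the core of that coprocessor logic can be sketched independently of the actual HBase coprocessor and ScanDeleteTracker APIs. This is a minimal, self-contained simulation (class and method names here are illustrative, not HBase's): walk a raw cell stream that still contains delete markers, count the markers, give up once a configured threshold is crossed, and mask deleted columns so only live cells reach the caller. It relies on the fact that in HBase's sort order a column's delete marker sorts before the puts it masks.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch (not the actual coprocessor): scan a raw cell
 * stream that includes delete markers, count the markers, stop once
 * a threshold is exceeded, and mask deleted columns so only live
 * cells are returned to the caller.
 */
public class BoundedRawScanSketch {

    /** Minimal stand-in for an HBase KeyValue. */
    static final class Cell {
        final String column;
        final boolean isDeleteMarker;
        Cell(String column, boolean isDeleteMarker) {
            this.column = column;
            this.isDeleteMarker = isDeleteMarker;
        }
    }

    /**
     * Returns the live (non-masked) cells seen before more than
     * maxDeleteMarkers delete markers were encountered.
     */
    static List<Cell> scan(List<Cell> rawCells, int maxDeleteMarkers) {
        List<Cell> results = new ArrayList<>();
        // Columns masked by a delete marker seen earlier in the row
        // (markers sort before the puts they cover).
        Set<String> deleted = new HashSet<>();
        int markers = 0;
        for (Cell cell : rawCells) {
            if (cell.isDeleteMarker) {
                if (++markers > maxDeleteMarkers) {
                    break; // threshold exceeded: stop the scan early
                }
                deleted.add(cell.column);
            } else if (!deleted.contains(cell.column)) {
                results.add(cell); // live cell, not masked by a marker
            }
        }
        return results;
    }
}
```

The real version would sit behind a coprocessor endpoint issuing a `Scan` with raw mode enabled, but the counting-and-masking loop is the essential part.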
I guess until now it has been working reasonably. Also, with HBASE-8809, version tracking etc. should also work with filters now.

On Mon, Jul 1, 2013 at 3:58 AM, lars hofhansl <[email protected]> wrote:

> That would be quite a dramatic change; we cannot pass delete markers to the
> existing filters without confusing them.
> We could invent a new method (filterDeleteKV or filterDeleteMarker or
> something) on filters, along with a new "filter type" that implements that
> method.
>
> -- Lars
>
> ----- Original Message -----
> From: Varun Sharma <[email protected]>
> To: "[email protected]" <[email protected]>; [email protected]
> Cc:
> Sent: Sunday, June 30, 2013 1:56 PM
> Subject: Re: Issues with delete markers
>
> Sorry, typo, I meant: for user scans, should we be passing delete
> markers through the filters as well?
>
> Varun
>
> On Sun, Jun 30, 2013 at 1:03 PM, Varun Sharma <[email protected]> wrote:
>
> > For user scans, I feel we should be passing delete markers through as
> > well.
> >
> > On Sun, Jun 30, 2013 at 12:35 PM, Varun Sharma <[email protected]> wrote:
> >
> >> I tried this a little bit and it seems that filters are not called on
> >> delete markers. For raw scans returning delete markers, does it make
> >> sense to do that?
> >>
> >> Varun
> >>
> >> On Sun, Jun 30, 2013 at 12:03 PM, Varun Sharma <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> We are having an issue with the way HBase handles deletes. We are
> >>> looking to retrieve 300 columns in a row, but the row has tens of
> >>> thousands of delete markers in it before we span the 300 columns,
> >>> something like this:
> >>>
> >>> row DeleteCol1 Col1 DeleteCol2 Col2 ... DeleteCol3 Col3
> >>>
> >>> And so on. The issue is that to retrieve these 300 columns, we need
> >>> to go through tens of thousands of deletes; sometimes we get a spurt
> >>> of these queries and that DDoSes a region server. We are okay with
> >>> saying: only return the first 300 columns, and stop once you
> >>> encounter, say, 5K column delete markers or something.
> >>>
> >>> I wonder if such a construct is provided by HBase, or do we need to
> >>> build something on top of the raw scan and handle the delete masking
> >>> there?
> >>>
> >>> Thanks
> >>> Varun
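As a footnote to the thread: the new "filter type" Lars floats above was never spelled out, but a hypothetical sketch of it might look like the following. Everything here is illustrative, not actual HBase API; only the method name `filterDeleteMarker` comes from his email. The idea is that ordinary filters never see markers, while filters implementing the new interface can, e.g. to abort a row after too many of them.

```java
/**
 * Hypothetical sketch of the filter extension proposed in the thread:
 * a separate filter type whose filterDeleteMarker method is invoked on
 * delete markers, which existing filters never receive. Names and
 * shapes are illustrative, not actual HBase API.
 */
public class DeleteAwareFilterSketch {

    /** Mirrors the spirit of HBase filter return codes. */
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    /** Filters implementing this type are handed delete markers too. */
    interface DeleteMarkerFilter {
        /** Called per delete marker; SKIP hides it, NEXT_ROW aborts the row. */
        ReturnCode filterDeleteMarker(String column);
    }

    /** Example: give up on the row after too many delete markers. */
    static final class BoundedDeleteFilter implements DeleteMarkerFilter {
        private final int max;   // marker threshold for this row
        private int seen = 0;    // markers encountered so far

        BoundedDeleteFilter(int max) {
            this.max = max;
        }

        @Override
        public ReturnCode filterDeleteMarker(String column) {
            // Hide markers until the threshold is crossed, then abort.
            return (++seen > max) ? ReturnCode.NEXT_ROW : ReturnCode.SKIP;
        }
    }
}
```

With something like this, the "stop after 5K delete markers" behavior discussed above could live in a filter instead of a custom coprocessor.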
