Re: delete rows test result

Keith Turner Mon, 16 Nov 2015 09:05:25 -0800

On Mon, Nov 16, 2015 at 10:35 AM, z11373 <[email protected]> wrote:

> Last week, on a separate thread, it was suggested that I use
> tableOperations.deleteRows to delete the rows that fall within specific
> ranges. I was curious to see whether it is better than my current
> implementation, which iterates over all rows and calls putDelete for each.
> While researching, I also found that Accumulo already provides BatchDeleter,
> which does the same thing.
> I tried all three, and below are my test results against three different
> tables (numbers are in milliseconds):
>
> Test 1 (iterating over all rows and calling putDelete for each):
> Table 1: 5,702
> Table 2: 6,912
> Table 3: 4,694
>
> Test 2 (using BatchDeleter class):
> Table 1: 8,089
> Table 2: 10,405
> Table 3: 7,818
>
> Test 3 (using tableOperations.deleteRows; note that I first iterate over
> all rows just to get the last row id, which is then passed as an argument
> to the function):
> Table 1: 196,597
> Table 2: 226,496
> Table 3: 8,442
>
>
> I ran the tests a few times and got results consistent with those above.
> I didn't look at the code to see what deleteRows is really doing, but
> judging by my test results, I can say it sucks!
>


An advantage of deleteRows is that it can drop entire tablets that fall
completely within a range. However, the tablet at the end of the range may
need to be compacted in order to extend its range. Using deleteRows for a
"small" range that falls completely within a table may be suboptimal. Is
that your case? How many key values are you deleting? If it's not the
compaction that is causing the delay, then there may be a bug.
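
To make the tablet argument concrete, here is a minimal toy model in plain
Java. It is not Accumulo code: the Tablet class, the Fate enum and the
classify method are hypothetical names invented for illustration, and it
assumes each tablet covers the rows in (prevEndRow, endRow] while the delete
range covers (start, end]. It only shows which tablets could be dropped
whole and which ones sit on a boundary of the range.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only; not Accumulo code. Rows are plain strings. */
final class DeleteRangeSketch {

    /** A tablet covers rows in (prevEndRow, endRow]; null means unbounded. */
    static final class Tablet {
        final String prevEndRow;
        final String endRow;

        Tablet(String prevEndRow, String endRow) {
            this.prevEndRow = prevEndRow;
            this.endRow = endRow;
        }
    }

    enum Fate { UNTOUCHED, DROPPED_WHOLE, BOUNDARY }

    /** Classifies one tablet against the delete range (start, end]. */
    static Fate classify(Tablet t, String start, String end) {
        // No overlap: the tablet ends at or before start,
        // or it begins at or after end.
        boolean endsBeforeRange =
            t.endRow != null && t.endRow.compareTo(start) <= 0;
        boolean beginsAfterRange =
            t.prevEndRow != null && t.prevEndRow.compareTo(end) >= 0;
        if (endsBeforeRange || beginsAfterRange) {
            return Fate.UNTOUCHED;
        }
        // Fully inside: (prevEndRow, endRow] is contained in (start, end].
        boolean lowerInside =
            t.prevEndRow != null && t.prevEndRow.compareTo(start) >= 0;
        boolean upperInside =
            t.endRow != null && t.endRow.compareTo(end) <= 0;
        return (lowerInside && upperInside) ? Fate.DROPPED_WHOLE : Fate.BOUNDARY;
    }

    public static void main(String[] args) {
        // Five tablets produced by the split points d, h, m, r.
        List<Tablet> tablets = Arrays.asList(
            new Tablet(null, "d"), new Tablet("d", "h"), new Tablet("h", "m"),
            new Tablet("m", "r"), new Tablet("r", null));
        for (Tablet t : tablets) {
            System.out.println("(" + t.prevEndRow + ", " + t.endRow + "]"
                + " wide (f, p]: " + classify(t, "f", "p")
                + ", narrow (i, j]: " + classify(t, "i", "j"));
        }
    }
}
```

In this model the wide range (f, p] lets one tablet be dropped whole and
leaves two boundary tablets, while the narrow range (i, j] touches a single
tablet as a boundary and drops nothing, which is the "small" range case
described above.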

Not sure if it will help, but there is a utility function, getMaxRow, for
finding the max row. It does a binary search within the key space.

http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#getMaxRow%28java.lang.String,%20org.apache.accumulo.core.security.Authorizations,%20org.apache.hadoop.io.Text,%20boolean,%20org.apache.hadoop.io.Text,%20boolean%29
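
The binary-search idea can be sketched in plain Java as follows. This is a
toy model under stated assumptions, not the Accumulo implementation: rows
are non-negative long values instead of byte strings, the table is an
in-memory NavigableSet, and anyRowIn is a hypothetical stand-in for a cheap
scan that only reports whether any row exists in a range. The class and
method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.OptionalLong;
import java.util.TreeSet;

/** Toy model only; not Accumulo code. Rows are non-negative longs. */
final class MaxRowSearchSketch {

    /** Stand-in for a cheap scan: is there any row in [lo, hi]? */
    static boolean anyRowIn(NavigableSet<Long> rows, long lo, long hi) {
        Long candidate = rows.ceiling(lo);
        return candidate != null && candidate <= hi;
    }

    /**
     * Finds the largest row in [lo, hi] by halving the key space,
     * assuming 0 <= lo <= hi. Returns empty if the range holds no rows.
     */
    static OptionalLong findMaxRow(NavigableSet<Long> rows, long lo, long hi) {
        if (!anyRowIn(rows, lo, hi)) {
            return OptionalLong.empty();
        }
        // Invariant: the max row of the original range lies in [lo, hi].
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (anyRowIn(rows, mid + 1, hi)) {
                lo = mid + 1;   // the upper half has a row, so the max is there
            } else {
                hi = mid;       // the upper half is empty, keep the lower half
            }
        }
        return OptionalLong.of(lo);
    }

    public static void main(String[] args) {
        NavigableSet<Long> rows = new TreeSet<>(Arrays.asList(3L, 17L, 58L, 742L));
        System.out.println(findMaxRow(rows, 0, 1000));
        System.out.println(findMaxRow(rows, 0, 100));
        System.out.println(findMaxRow(rows, 800, 900));
    }
}
```

The point of the sketch is that the number of probes grows with the
logarithm of the key-space size rather than with the number of rows, which
is why a search of this kind can be cheaper than iterating over every row
just to learn the last row id.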


> Note that for that test, I did a scan and iterated just to get the last
> row id, but even if I subtract the time for doing that, it's still way
> too slow. Therefore, I'd recommend that anyone avoid using deleteRows for
> this scenario.
> YMMV, but I'd stick with my original approach, which is the same as
> Test 1 above.
>
>
> Thanks,
> Z
