[ https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973611#comment-16973611 ]

Lars Hofhansl commented on PHOENIX-5494:
----------------------------------------

Thanks [~kozdemir]. Looking at the patch... This will require a full scan over
the table/region; RowFilter does not do any SEEK'ing. So we'd be scanning
every row and comparing each row key against the filter.
I'll try it, but I would guess that it would actually make things slower
(specifically for small batches).
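A toy model of why a non-seeking filter hurts small batches: the region is treated as a sorted list of row keys, and the filter visits every row and checks it against the wanted keys. The function name `rowfilter_scan` and the key format are hypothetical; this is a sketch of the cost argument, not HBase's RowFilter API.

```python
# Toy model (not HBase API): a region is a sorted list of row keys.
# A filter without seek hints must read every row in the region and
# test each row key against the batch, no matter how small the batch is.

def rowfilter_scan(region_rows, batch_keys):
    """Return (matched_rows, rows_visited) for a filter that never seeks."""
    wanted = set(batch_keys)
    matched = []
    visited = 0
    for row in region_rows:      # full scan: every row is read
        visited += 1
        if row in wanted:        # per-row comparison against the filter
            matched.append(row)
    return matched, visited


region = [f"row{i:05d}" for i in range(100_000)]
batch = ["row00042", "row31337", "row99999"]
matched, visited = rowfilter_scan(region, batch)
print(matched, visited)  # 3 matches, but all 100000 rows were visited
```

The number of rows visited equals the region size, so a batch of 3 rows pays the same scan cost as a batch of 100,000.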

But... Since you factored that logic out (which is pretty cool, BTW!), we could 
use HBase's FuzzyRowFilter or MultiRowRangeFilter (or a simplified Filter that 
does the necessary SEEK'ing). I was thinking we could use Phoenix's 
SkipScanFilter for this.
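The SEEK'ing idea can be sketched with the same toy model: sort the wanted keys, then jump straight to each one instead of reading the rows in between. In HBase a filter typically requests such a jump by returning `SEEK_NEXT_USING_HINT` and supplying the next key via `getNextCellHint`; here `bisect` stands in for that seek. The function `skip_scan` is hypothetical and only illustrates the technique, not the actual SkipScanFilter, FuzzyRowFilter, or MultiRowRangeFilter implementations.

```python
import bisect

def skip_scan(region_rows, batch_keys):
    """Return (matched_rows, seeks) by seeking directly to each wanted key.

    Assumes region_rows is sorted, as row keys are within an HBase region.
    """
    matched = []
    seeks = 0
    pos = 0
    for key in sorted(set(batch_keys)):   # visit wanted keys in row-key order
        # "Seek": jump forward to the first row >= key; never move backwards.
        pos = bisect.bisect_left(region_rows, key, pos)
        seeks += 1
        if pos == len(region_rows):
            break                          # past the end of the region
        if region_rows[pos] == key:
            matched.append(key)
    return matched, seeks


region = [f"row{i:05d}" for i in range(100_000)]
batch = ["row99999", "row00042", "missing", "row31337"]
matched, seeks = skip_scan(region, batch)
print(matched, seeks)
```

Work now scales with the batch size (one seek per wanted key) rather than with the region size, which is what makes a skip scan attractive for small batches.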


> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5494
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5494
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>              Labels: performance
>         Attachments: PHOENIX-5494.master.001.patch, 
> Screenshot_20191110_160243.png, Screenshot_20191110_160351.png, 
> Screenshot_20191110_161453.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes 
> (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major cost of an index update. The
> cost is mostly incurred by the repeated setup (and seeking) of a new region
> scanner for each row.
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring.)
> I won't be getting to this, but recording it here in case someone feels 
> inclined.
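To make the cost in the quoted description concrete, here is a toy count of scanner setups: fetching rows one by one opens one scanner per row, while grouping the batch by region needs only one (skip) scan per region. The region boundaries in `REGION_START_KEYS` and all function names are hypothetical, chosen only for illustration.

```python
import bisect
from collections import defaultdict

# Hypothetical region boundaries: region i covers keys from
# REGION_START_KEYS[i] up to (but not including) REGION_START_KEYS[i + 1].
REGION_START_KEYS = ["", "row25000", "row50000", "row75000"]

def region_of(key):
    """Index of the region whose key range contains `key`."""
    return bisect.bisect_right(REGION_START_KEYS, key) - 1

def scanners_one_by_one(batch_keys):
    # Behavior described in the issue: one scanner setup (and seek) per row.
    return len(batch_keys)

def scanners_batched(batch_keys):
    # Proposed behavior: group the batch by region, then run one
    # skip scan per region over that region's sorted keys.
    by_region = defaultdict(list)
    for key in batch_keys:
        by_region[region_of(key)].append(key)
    return len(by_region)


batch = [f"row{i:05d}" for i in range(0, 100_000, 100)]  # 1000 keys
print(scanners_one_by_one(batch), scanners_batched(batch))
```

With 1000 keys spread over 4 regions, the one-by-one approach sets up 1000 scanners while the batched approach sets up 4.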



--
This message was sent by Atlassian Jira
(v8.3.4#803005)