[jira] [Commented] (HBASE-9428) Regex filters are at least an order of magnitude slower since 0.94.3

Lars Hofhansl (JIRA) Tue, 03 Sep 2013 22:50:05 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757482#comment-13757482
 ]


Lars Hofhansl commented on HBASE-9428:
--------------------------------------

Looking at HBASE-7279, I can't see how it could have this effect specifically on 
the RegexStringComparator. All it does is pass a byte[], an offset, and a 
length to compareTo (rather than a byte[], *0*, and a length).

So you built hbase once at r1417559 and once at r1417716 to see the difference? 
Or did you revert HBASE-7279 from the current 0.94 tip?
                
> Regex filters are at least an order of magnitude slower since 0.94.3
> --------------------------------------------------------------------
>
>                 Key: HBASE-9428
>                 URL: https://issues.apache.org/jira/browse/HBASE-9428
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.98.0, 0.94.12, 0.96.1
>
>
> I found this issue while debugging a performance problem on an OpenTSDB 
> cluster that was basically unusable after an upgrade from 0.94.2 to 0.94.6. It 
> was caused by HBASE-7279 (ping [~lhofhansl]).
> The easiest way to see it is to run a simple one-client PerformanceEvaluation (PE):
> {noformat}
> $ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
> {noformat}
> Then in the shell do a filter scan (flush the table first and make sure it 
> fits in your blockcache if you want stable numbers).
> Pre HBASE-7279:
> {noformat}
> hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                                                 COLUMN+CELL
>  0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 1.2780 seconds
> {noformat}
> Post HBASE-7279:
> {noformat}
> hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                                                 COLUMN+CELL
>  0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 24.2940 seconds
> {noformat}
> I tried a bunch of 0.94 releases, up to 0.94.11, and the tip of 0.96. They are 
> all slow like this.
> It seems that since that jira went in we do a lot more row matching, and 
> running the regex gets super expensive.

