[
https://issues.apache.org/jira/browse/HBASE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788854#comment-13788854
]
Hudson commented on HBASE-9428:
-------------------------------
FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #781 (See
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/781/])
HBASE-9711 Improve HBASE-9428 - avoid copying bytes for RegexFilter unless
necessary (larsh: rev 1530059)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java
> Regex filters are at least an order of magnitude slower since 0.94.3
> --------------------------------------------------------------------
>
> Key: HBASE-9428
> URL: https://issues.apache.org/jira/browse/HBASE-9428
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: Lars Hofhansl
> Fix For: 0.98.0, 0.94.12, 0.96.0
>
> Attachments: 9428-0.94.txt, 9428-trunk.txt
>
>
> I found this issue after debugging a performance problem on an OpenTSDB
> cluster. It was basically unusable after an upgrade from 0.94.2 to 0.94.6,
> and the cause was HBASE-7279 (ping [~lhofhansl]).
> The easiest way to see it is to run a simple single-client
> PerformanceEvaluation (PE):
> {noformat}
> $ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
> {noformat}
> Then in the shell do a filter scan (flush the table first and make sure it
> fits in your block cache if you want stable numbers).
> Pre HBASE-7279:
> {noformat}
> hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW COLUMN+CELL
> 0000055872 column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 1.2780 seconds
> {noformat}
> Post HBASE-7279:
> {noformat}
> hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW COLUMN+CELL
> 0000055872 column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 24.2940 seconds
> {noformat}
> I tried a bunch of 0.94 releases, up to 0.94.11, and the tip of 0.96. They
> are all slow like this.
> It seems that since that JIRA went in we do a lot more row matching, and
> running the regex on every one of those matches gets very expensive.
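
To make the suspected cost concrete, here is a minimal, self-contained sketch in plain Java. It is not HBase code: the class and method names are hypothetical, the UTF-8 decoding of each row key is an assumption about how a regex comparator would handle bytes, and the factor of 20 evaluations per row is only a stand-in loosely based on the ratio of the two scan times above. It decodes each row key into a String and runs the regex on it, then repeats that work several times per row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class RegexRowMatchSketch {

    // Builds zero-padded 10-digit row keys shaped like the one in the scan output.
    static List<byte[]> makeRowKeys(int count) {
        List<byte[]> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            keys.add(String.format(Locale.ROOT, "%010d", i).getBytes(StandardCharsets.UTF_8));
        }
        return keys;
    }

    // Counts keys containing a regex match. Each evaluation copies the key
    // bytes into a new String before matching (assumed behavior, see lead-in).
    static int countRegexMatches(List<byte[]> keys, Pattern pattern) {
        int matches = 0;
        for (byte[] key : keys) {
            String decoded = new String(key, StandardCharsets.UTF_8);
            if (pattern.matcher(decoded).find()) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<byte[]> keys = makeRowKeys(100_000);
        Pattern pattern = Pattern.compile("0000055872");
        // Hypothetical multipliers: one filter evaluation per row versus many.
        for (int evaluationsPerRow : new int[] {1, 20}) {
            long start = System.nanoTime();
            int matches = 0;
            for (int i = 0; i < evaluationsPerRow; i++) {
                matches = countRegexMatches(keys, pattern);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(evaluationsPerRow + " evaluation(s) per row: "
                    + matches + " match(es), " + elapsedMs + " ms");
        }
    }
}
```

Timings from a loop like this are only indicative (no JIT warm-up, no HBase read path), but the work grows linearly with the number of evaluations per row, which is the effect described above.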