Jean-Daniel Cryans created HBASE-9428:
-----------------------------------------
Summary: Regex filters are at least an order of magnitude slower since 0.94.3
Key: HBASE-9428
URL: https://issues.apache.org/jira/browse/HBASE-9428
Project: HBase
Issue Type: Bug
Reporter: Jean-Daniel Cryans
Fix For: 0.98.0, 0.94.12, 0.96.1
I found this issue while debugging a performance problem on an OpenTSDB
cluster that became basically unusable after an upgrade from 0.94.2 to 0.94.6.
It was caused by HBASE-7279 (ping [~lhofhansl]).
The easiest way to see it is to run a simple single-client PerformanceEvaluation (PE):
{noformat}
$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
{noformat}
Then do a filter scan in the shell (flush the table first, and make sure it fits
in your block cache if you want stable numbers).
Pre HBASE-7279:
{noformat}
hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
ROW                  COLUMN+CELL
 0000055872          column=info:data, timestamp=1378248850191, value=(blanked)
1 row(s) in 1.2780 seconds
1 row(s) in 1.2780 seconds
{noformat}
Post HBASE-7279:
{noformat}
hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
ROW                  COLUMN+CELL
 0000055872          column=info:data, timestamp=1378248850191, value=(blanked)
1 row(s) in 24.2940 seconds
{noformat}
I tried several 0.94 releases, up to 0.94.11, as well as the tip of 0.96. They
are all slow like this.
It seems that since HBASE-7279 went in we do a lot more row matching, and
running the regex that many more times gets very expensive.
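To make the suspected mechanism concrete, here is a rough standalone sketch. It is not HBase code and does not reproduce the actual code path changed by HBASE-7279. It only assumes, for illustration, that the regression shows up as the same regex being evaluated several times per row instead of once. The class name, the row count, and the factor of 20 are all hypothetical. The only thing taken from the report is the shape of the row key and the pattern.

```java
import java.util.regex.Pattern;

// Illustrative only: counts matches over synthetic row keys, not HBase data.
class RegexRowMatchSketch {

    // Builds zero-padded 10-digit keys, shaped like the row key in the scan output.
    static String[] buildRowKeys(int rows) {
        String[] keys = new String[rows];
        for (int i = 0; i < rows; i++) {
            keys[i] = String.format("%010d", i);
        }
        return keys;
    }

    // Evaluates the regex evaluationsPerRow times for every key (a hypothetical
    // stand-in for "a lot more row matching") and returns the number of matching rows.
    static int countMatchedRows(Pattern pattern, String[] keys, int evaluationsPerRow) {
        int matchedRows = 0;
        for (String key : keys) {
            boolean matched = false;
            for (int j = 0; j < evaluationsPerRow; j++) {
                matched = pattern.matcher(key).find();
            }
            if (matched) {
                matchedRows++;
            }
        }
        return matchedRows;
    }

    public static void main(String[] args) {
        String[] keys = buildRowKeys(200_000); // hypothetical table size
        Pattern pattern = Pattern.compile("0000055872");
        for (int perRow : new int[] {1, 20}) { // 20 is an arbitrary multiplier
            long start = System.nanoTime();
            int matched = countMatchedRows(pattern, keys, perRow);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(perRow + " evaluation(s) per row: "
                    + matched + " row(s) matched, " + elapsedMs + " ms");
        }
    }
}
```

The result set is identical in both runs; only the number of regex evaluations changes, which is the kind of cost a filter scan over a large table would amplify. Absolute timings from this sketch say nothing about HBase itself.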