[
https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509893
]
James Kennedy commented on HADOOP-1531:
---------------------------------------
Thanks for the update.
filterAllRemaining() always returns false in the RegExpRowFilter case because
nothing about filtering a single row tells the filter that it should stop
processing future rows. After the first time the filter method returns
true (say row 50 is filtered), rows 57-78 may still be good matches and should
be included in the results. This filter does not assume that valid rows form a
single consecutive chunk.
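To illustrate the point, here is a minimal sketch. It is not the code in the
patch: SimpleRowFilter, RegexFilter and scan() are hypothetical names, and
String stands in for Text to keep the example self-contained. It only shows
why a regexp filter cannot short-circuit the scan after one rejected row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexRowFilterSketch {
    /** Hypothetical, simplified stand-in for the filter interface. */
    interface SimpleRowFilter {
        /** Returns true when the row should be excluded from the results. */
        boolean filter(String rowKey);

        /** Returns true when no later row can match and the scan may stop. */
        boolean filterAllRemaining();
    }

    /** Regexp filter: one rejected row says nothing about later rows. */
    static class RegexFilter implements SimpleRowFilter {
        private final Pattern pattern;

        RegexFilter(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public boolean filter(String rowKey) {
            return !pattern.matcher(rowKey).matches();
        }

        @Override
        public boolean filterAllRemaining() {
            // Matching rows may appear anywhere, so never end the scan early.
            return false;
        }
    }

    /** Assumed scan loop: skips filtered rows and keeps going. */
    static List<String> scan(List<String> rowKeys, SimpleRowFilter f) {
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (f.filterAllRemaining()) {
                break; // never taken for RegexFilter
            }
            if (!f.filter(key)) {
                kept.add(key);
            }
        }
        return kept;
    }
}
```

With the keys "row49", "x50", "row57", "row78" and the regexp "row\\d+", the
sketch rejects "x50" but still returns the two matching rows that follow it.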
As for the filter(final Text rowKey, final Text colKey, final byte[] data)
method, the logic is already AND logic. If rowKey is non-null and does not
match the regexp, the method returns true right away. Otherwise it returns
true if the column tests fail. The javadoc is confusing on this, though,
because it says and/or, and I should probably change it. What I meant was that
it's an AND if you include a non-null rowKey. But a non-null row key is
optional, and if it is not included, then only the column tests apply.
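The behaviour described above can be sketched roughly as follows. This is an
illustration under assumptions, not the patch itself: AndFilterSketch and
setColumnFilter() are hypothetical names, String replaces Text, and the column
test is assumed to be a plain byte[] equality check against a literal value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class AndFilterSketch {
    private final Pattern rowKeyPattern;
    // Hypothetical column criteria: column key -> expected literal value.
    private final Map<String, byte[]> expectedByColumn = new HashMap<>();

    AndFilterSketch(String rowKeyRegex) {
        this.rowKeyPattern = Pattern.compile(rowKeyRegex);
    }

    void setColumnFilter(String colKey, byte[] expected) {
        expectedByColumn.put(colKey, expected);
    }

    /** Returns true when the cell should be filtered out. */
    boolean filter(String rowKey, String colKey, byte[] data) {
        // Row test: applied only when the caller supplies a row key.
        if (rowKey != null && !rowKeyPattern.matcher(rowKey).matches()) {
            return true;
        }
        // Column test: fails when a criterion exists and the bytes differ.
        byte[] expected = expectedByColumn.get(colKey);
        return expected != null && !Arrays.equals(expected, data);
    }
}
```

A cell passes only if the row test (when a row key is given) and the column
test both pass, which is the AND behaviour. Passing null for rowKey leaves only
the column test in effect.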
Any word on my question above about exporting your Eclipse formatter settings?
> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
> Key: HADOOP-1531
> URL: https://issues.apache.org/jira/browse/HADOOP-1531
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Affects Versions: 0.14.0
> Reporter: James Kennedy
> Assignee: James Kennedy
> Attachments: RowFilter-v2.patch, RowFilter-v3.patch, RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation. This
> is passed to the HRegion.HScanner via HClient.openScanner() though it is an
> entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it
> encounters a row that is not filtered by the RowFilter. The filter applies
> criteria based on row keys and/or column data values.
> Null values are a little tricky since the resultSet in that loop may represent
> nulls as absent columns or as DELETED_BYTES. Nevertheless, null cases are
> taken care of by the filter and you can, for example, retrieve all rows where
> column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality test only with literal values. No !=, <, >, etc. No col1 == col2.
> This is a straight-up byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction, no
> disjunction possible.
> * Row key criteria can only be a regular expression.
> * Row key criteria are independent of column criteria. No "if
> rowkey.matches(A) and col1==B" although the interface is created to allow
> for that.
> But it should be easy to write an improved RowFilterInterface implementation
> to take care of most of the above without having to change code elsewhere.