[ https://issues.apache.org/jira/browse/HADOOP-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508701 ]
James Kennedy commented on HADOOP-1439:
---------------------------------------
Right, in the case of >, =, < type RowFilters you're quite right. More
generally, though, a RowFilter implementing those comparisons (or anything
else) may need to signal the scanner to stop altogether, even when the target
rows do not form a single consecutive run the way they do for >, =, <. For
example, the filter may have reached a maximum number of non-consecutive
matched rows.
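A minimal sketch of the stop signal described above, assuming a simplified
filter contract. StoppableRowFilter, stopScan(), MaxMatchesFilter, and
FilteredScanDemo are hypothetical names, not the RowFilter API from the
HADOOP-1531 patch, and String keys stand in for Hadoop Text. The scan loop asks
the filter whether to stop before reading each row, so a filter that has
accepted its maximum number of non-consecutive matches ends the scan early.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical scan loop that honors a filter's "stop the whole scan" signal. */
class FilteredScanDemo {
    static List<String> scan(Iterator<String> sortedRows, StoppableRowFilter filter) {
        List<String> results = new ArrayList<>();
        while (sortedRows.hasNext()) {
            // Ask before reading each row, so nothing more is fetched once stopped.
            if (filter.stopScan()) {
                break;
            }
            String row = sortedRows.next();
            if (!filter.filterRow(row)) {
                results.add(row);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // The matching "a" rows are not consecutive: "b" rows sit between them.
        Iterator<String> rows = List.of("a1", "b1", "a2", "b2", "a3").iterator();
        StoppableRowFilter filter = new MaxMatchesFilter(k -> k.startsWith("a"), 2);
        System.out.println(scan(rows, filter)); // prints [a1, a2]
    }
}

/** Hypothetical filter contract: a per-row decision plus a scan-wide stop signal. */
interface StoppableRowFilter {
    /** Returns true if the row should be skipped. */
    boolean filterRow(String rowKey);

    /** Returns true once no later row can be accepted, so the scanner may stop. */
    boolean stopScan();
}

/** Accepts rows matching a predicate until maxMatches rows have been accepted. */
class MaxMatchesFilter implements StoppableRowFilter {
    private final Predicate<String> matches;
    private final int maxMatches;
    private int accepted = 0;

    MaxMatchesFilter(Predicate<String> matches, int maxMatches) {
        this.matches = matches;
        this.maxMatches = maxMatches;
    }

    @Override
    public boolean filterRow(String rowKey) {
        if (accepted >= maxMatches || !matches.test(rowKey)) {
            return true;
        }
        accepted++;
        return false;
    }

    @Override
    public boolean stopScan() {
        return accepted >= maxMatches;
    }
}
```

Here the accepted rows a1 and a2 are separated by b1, so no single end row
could express "stop after two matches"; the loop also leaves b2 and a3 unread.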
I'll implement this mechanism, clean up, and re-post the HADOOP-1531 patch when
I get a chance.
That will make RowFilter better suited to the endRow filtering this issue
needs. But as I said, there will still be a little overhead compared with
implementing an explicit endRow parameter on the scanner.
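For comparison, a minimal sketch of an explicit endRow check inside a
scanner's next(), assuming an exclusive bound. EndRowScanner, rowKey(), and the
TreeMap standing in for a table are hypothetical, not HBase's
HScannerInterface, and String again replaces Hadoop Text. With the bound built
into the scanner, ending the scan takes one key comparison per row and no filter
object.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory scanner with an explicit endRow bound.
 * String keys stand in for Hadoop Text; endRow is treated as exclusive (an assumption).
 */
class EndRowScanner {
    private final Iterator<Map.Entry<String, String>> rows;
    private final String endRow; // null means "scan to the end of the table"
    private Map.Entry<String, String> current;
    private boolean done = false;

    EndRowScanner(NavigableMap<String, String> table, String startRow, String endRow) {
        this.rows = table.tailMap(startRow, true).entrySet().iterator();
        this.endRow = endRow;
    }

    /** Advances to the next row; returns false once rows run out or reach endRow. */
    boolean next() {
        if (done || !rows.hasNext()) {
            done = true;
            return false;
        }
        Map.Entry<String, String> entry = rows.next();
        // One key comparison per row; no per-row filter callback is involved.
        if (endRow != null && entry.getKey().compareTo(endRow) >= 0) {
            done = true;
            return false;
        }
        current = entry;
        return true;
    }

    String rowKey() {
        return current.getKey();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(key, "value-" + key);
        }
        EndRowScanner scanner = new EndRowScanner(table, "b", "d");
        while (scanner.next()) {
            System.out.println(scanner.rowKey()); // prints b, then c
        }
    }
}
```

Once next() has returned false it keeps returning false, which matches the
"cleaner" behavior the issue asks for.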
> Add endRow parameter to HClient#obtainScanner
> ---------------------------------------------
>
> Key: HADOOP-1439
> URL: https://issues.apache.org/jira/browse/HADOOP-1439
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Reporter: stack
> Assignee: stack
> Priority: Minor
>
> Currently, HClient#obtainScanner looks like this:
> {code}
> public synchronized HScannerInterface obtainScanner(Text[] columns, Text startRow) throws IOException;
> {code}
> Add an overload that allows specification of endRow:
> {code}
> public synchronized HScannerInterface obtainScanner(Text[] columns, Text startRow, Text endRow) throws IOException;
> {code}
> Use case: the table contains the whole web, and the client just wants to scan
> Google's pages. Currently, the client could cut off the scanner as soon as the
> row key leaves the Google domain, but it would be cleaner if
> {{HScannerInterface#next()}} returned false.
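To make the use case concrete, a minimal sketch comparing the current
workaround (cut the scan off client-side once a key leaves the prefix) with an
endRow bound derived from the prefix. It assumes a hypothetical row-key schema
of URLs with reversed host names (e.g. com.google.www/search); PrefixScanDemo,
endRowForPrefix, and the other method names are illustrative only, and a
TreeMap stands in for the table. endRowForPrefix returns an exclusive endRow
that covers exactly the keys starting with the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical prefix-scan helpers. A TreeMap stands in for the table, and String
 * ordering stands in for Text's byte ordering (the two agree for ASCII keys).
 */
class PrefixScanDemo {
    /** Exclusive endRow covering exactly the keys that start with prefix; null if unbounded. */
    static String endRowForPrefix(String prefix) {
        int i = prefix.length() - 1;
        // A trailing Character.MAX_VALUE cannot be incremented, so drop it and carry left.
        while (i >= 0 && prefix.charAt(i) == Character.MAX_VALUE) {
            i--;
        }
        if (i < 0) {
            return null; // no finite bound: scan to the end of the table
        }
        return prefix.substring(0, i) + (char) (prefix.charAt(i) + 1);
    }

    /** Current workaround: start at the prefix and stop client-side once a key leaves it. */
    static List<String> scanWithClientCutoff(NavigableMap<String, String> table, String prefix) {
        List<String> rows = new ArrayList<>();
        for (String key : table.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break;
            }
            rows.add(key);
        }
        return rows;
    }

    /** With an endRow bound, the range [prefix, endRow) holds exactly the matching rows. */
    static List<String> scanWithEndRow(NavigableMap<String, String> table, String prefix) {
        String endRow = endRowForPrefix(prefix);
        NavigableMap<String, String> range = (endRow == null)
                ? table.tailMap(prefix, true)
                : table.subMap(prefix, true, endRow, false);
        return new ArrayList<>(range.keySet());
    }

    public static void main(String[] args) {
        // Hypothetical schema: URL row keys with reversed host names.
        NavigableMap<String, String> table = new TreeMap<>();
        String[] keys = {
            "com.example.www/", "com.google.mail/inbox", "com.google.www/",
            "com.google.www/search", "com.googleblog.www/", "com.yahoo.www/"
        };
        for (String key : keys) {
            table.put(key, "<page contents>");
        }
        System.out.println(endRowForPrefix("com.google.")); // prints com.google/
        // Both print [com.google.mail/inbox, com.google.www/, com.google.www/search]
        System.out.println(scanWithEndRow(table, "com.google."));
        System.out.println(scanWithClientCutoff(table, "com.google."));
    }
}
```

The trailing dot in the prefix keeps com.googleblog.www/ out of both scans;
incrementing the last character turns "com.google." into the bound
"com.google/".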
--
This message is automatically generated by JIRA.