[jira] Commented: (HADOOP-1439) Add endRow parameter to HClient#obtainScanner

James Kennedy (JIRA) Wed, 27 Jun 2007 18:11:48 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508701
 ]


James Kennedy commented on HADOOP-1439:
---------------------------------------

Right, so in the case of >, =, < type RowFilters you're quite right. More 
generally, a RowFilter, whether or not it implements those comparisons, may 
need to signal the scanner to stop altogether for whatever reason, even when 
the target rows are not located in a single consecutive chunk as they are for 
>, =, <; e.g. when it has reached a maximum number of nonconsecutive matched 
rows.
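
A rough sketch of the stop signal described above, in plain Java. The names 
StoppableRowFilter, MaxMatchesFilter and FilteredScan are hypothetical and are 
not taken from the HADOOP-1531 patch; row keys are modelled as Strings in a 
sorted list rather than as a real HBase scanner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter contract; NOT the actual RowFilter API from HADOOP-1531.
interface StoppableRowFilter {
    boolean accept(String rowKey); // true if the row should be returned
    boolean isDone();              // true once the scanner should stop entirely
}

// Accepts rows containing a marker and asks the scanner to stop after a fixed
// number of matches, even when the matching rows are not consecutive.
class MaxMatchesFilter implements StoppableRowFilter {
    private final String marker;
    private final int maxMatches;
    private int matched = 0;

    MaxMatchesFilter(String marker, int maxMatches) {
        this.marker = marker;
        this.maxMatches = maxMatches;
    }

    public boolean accept(String rowKey) {
        if (matched < maxMatches && rowKey.contains(marker)) {
            matched++;
            return true;
        }
        return false;
    }

    public boolean isDone() {
        return matched >= maxMatches;
    }
}

class FilteredScan {
    // Walks the sorted row keys and stops as soon as the filter reports done,
    // instead of visiting every remaining row.
    static List<String> scan(List<String> sortedRowKeys, StoppableRowFilter filter) {
        List<String> out = new ArrayList<>();
        for (String key : sortedRowKeys) {
            if (filter.isDone()) {
                break;
            }
            if (filter.accept(key)) {
                out.add(key);
            }
        }
        return out;
    }
}
```

The point of the separate isDone() check is that the scanner, not the client, 
ends the scan, so rows after the last wanted match are never read.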

I'll implement this mechanism, clean up, and re-post the HADOOP-1531 patch when 
I get a chance.

That will make RowFilter more conducive to the endRow filtering needed for this 
task. But, as I said, there will still be a little overhead compared with 
implementing an explicit endRow param on the scanner.
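
For comparison, a minimal sketch of what an explicit endRow bound amounts to 
inside a scan loop: one key comparison per row and no filter object. The 
EndRowScan class is hypothetical, row keys are modelled as Strings, and 
treating endRow as exclusive (with null meaning "no bound") is an assumption; 
the issue does not specify either.

```java
import java.util.ArrayList;
import java.util.List;

class EndRowScan {
    // Assumption: endRow is exclusive; null means scan to the end of the table.
    static List<String> scan(List<String> sortedRowKeys, String startRow, String endRow) {
        List<String> out = new ArrayList<>();
        for (String key : sortedRowKeys) {
            if (key.compareTo(startRow) < 0) {
                continue; // before the start of the requested range
            }
            if (endRow != null && key.compareTo(endRow) >= 0) {
                break;    // past the end of the range: stop scanning
            }
            out.add(key);
        }
        return out;
    }
}
```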

> Add endRow parameter to HClient#obtainScanner
> ---------------------------------------------
>
>                 Key: HADOOP-1439
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1439
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>
> Currently the HClient#obtainScanner looks like this:
> {code}
> public synchronized HScannerInterface obtainScanner(Text[] columns, Text 
> startRow) throws IOException;
> {code}
> Add an overload that allows specification of endRow:
> {code}
> public synchronized HScannerInterface obtainScanner(Text[] columns, Text 
> startRow, Text endRow) throws IOException;
> {code}
> Use Case: Table contains the whole web.  Client just wants to scan Google's 
> pages.  Currently, the client could cut off the scanner as soon as the row 
> key leaves the Google domain, but it would be cleaner if 
> {{HScannerInterface#next()}} returns false

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
