[jira] Commented: (HBASE-3247) Changes API: API for pulling edits from HBase

Evert Arckens (JIRA) Thu, 25 Nov 2010 02:37:42 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935712#action_12935712
 ]


Evert Arckens commented on HBASE-3247:
--------------------------------------

@stack
With the rowlog you can register a subscription; from then on, every message
put on the rowlog is kept for that subscription. If you also register a
listener (see RowLogMessageListener) on that subscription, the rowlog processor
starts feeding the messages to that listener.
If you can write a bulk load that only processes data changed before a certain
point in time, you can let it run while the rowlog records all changes made
after that point.
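To make the subscribe-then-bulk-load idea concrete, here is a minimal in-memory
sketch of the subscription/listener pattern. All names (RowLogSketch,
registerSubscription, processPending, ...) are hypothetical stand-ins, NOT
Lily's actual RowLog API. The only point it shows is that messages are retained
per subscription from the moment the subscription exists, even before a
listener is attached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory sketch of a rowlog-style subscription/listener pattern.
 * Every name here is hypothetical; this is NOT Lily's RowLog API.
 */
public class RowLogSketch {

    /** A change notification for one row (hypothetical shape). */
    public static final class Message {
        public final String rowKey;
        public final long timestamp;

        public Message(String rowKey, long timestamp) {
            this.rowKey = rowKey;
            this.timestamp = timestamp;
        }
    }

    /** Plays the role of Lily's RowLogMessageListener (hypothetical signature). */
    public interface MessageListener {
        /** Returns true if the message was handled and can be dropped. */
        boolean processMessage(Message message);
    }

    // Messages retained per subscription until a listener consumes them.
    private final Map<String, Deque<Message>> pending = new HashMap<>();
    private final Map<String, MessageListener> listeners = new HashMap<>();

    /** From now on, every message put on the rowlog is kept for this subscription. */
    public void registerSubscription(String subscriptionId) {
        pending.putIfAbsent(subscriptionId, new ArrayDeque<>());
    }

    public void registerListener(String subscriptionId, MessageListener listener) {
        listeners.put(subscriptionId, listener);
    }

    /** Append the message to the queue of every registered subscription. */
    public void putMessage(Message message) {
        for (Deque<Message> queue : pending.values()) {
            queue.addLast(message);
        }
    }

    /** One pass of the "processor": feed retained messages to each listener. */
    public void processPending() {
        for (Map.Entry<String, Deque<Message>> entry : pending.entrySet()) {
            MessageListener listener = listeners.get(entry.getKey());
            if (listener == null) {
                continue; // no listener yet: messages stay queued for later
            }
            Deque<Message> queue = entry.getValue();
            // Stop at the first message the listener fails to handle; it stays queued.
            while (!queue.isEmpty() && listener.processMessage(queue.peekFirst())) {
                queue.removeFirst();
            }
        }
    }

    public int pendingCount(String subscriptionId) {
        Deque<Message> queue = pending.get(subscriptionId);
        return queue == null ? 0 : queue.size();
    }
}
```

In this sketch, registering the subscription before starting the bulk load is
what closes the gap: any change made after that point is queued for the
subscription, and is delivered once a listener is attached.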

Looking a bit further at how the Indexer in Lily uses the rowlog
(http://docs.outerthought.org/lily-docs-current/415-lily.html):
When the indexer receives a message, it takes the record's current data and
puts that data in the index (IndexUpdater is the listener registered on the
rowlog).
An index rebuild uses MapReduce to go over all the data again and update the
index.
The bulk index rebuild and the rowlog-driven index updater are allowed to run
in parallel. Both read the current data of the record and put that in the
index, so there is no need for a transition point from bulk to incremental.
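A minimal sketch of why the parallel rebuild needs no hand-off, assuming both
paths index a row by re-reading its current data. The class and method names
are hypothetical. The per-row serialization through ConcurrentHashMap.compute
is an assumption of this sketch (the comment does not say how Lily orders index
writes); without something like it, a slow bulk pass could read an old value
and write it after the updater had already written a newer one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: a bulk rebuild and an incremental updater that both re-read a
 * record's current data, so they can run in parallel. Names are hypothetical.
 */
public class ParallelReindexSketch {

    // Stand-in for the record store (current data per row).
    public final Map<String, String> store = new ConcurrentHashMap<>();
    // Stand-in for the search index (indexed data per row).
    public final ConcurrentHashMap<String, String> index = new ConcurrentHashMap<>();

    /**
     * Re-index one row from its *current* data. Reading the store inside
     * compute() serializes index writes per row (assumption of this sketch),
     * so a stale read can never be written after a newer one.
     */
    public void reindex(String rowKey) {
        index.compute(rowKey, (key, oldValue) -> store.get(key));
    }

    /** Bulk path (a MapReduce job in Lily): visit every row once. */
    public void bulkRebuild() {
        for (String rowKey : store.keySet()) {
            reindex(rowKey);
        }
    }

    /** Incremental path: a write, then the listener reacting to its message. */
    public void updateAndNotify(String rowKey, String value) {
        store.put(rowKey, value);
        reindex(rowKey); // what the rowlog listener does when the message arrives
    }
}
```

Because each reindex simply copies whatever is current, running a row through
either path twice is harmless; the order in which the two paths reach a row
does not matter, only that the last write reflects the latest data.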
The indexer is written specifically to put Lily records into a Solr index; it
is not yet designed to plug in another index. It should be doable, though, to
use the same framework with something non-Lily on one side and a non-Solr index
on the other. Looking at the classes in the framework: the IndexUpdater is the
RowLogMessageListener implementation; it knows about Lily records and decides
what to index. The Indexer class is responsible for mapping the Lily schema onto
the Solr schema and handles the communication with Solr.
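One way the split described above could be generalized, sketched with
hypothetical interfaces (RecordReader, IndexWriter) for a pluggable source and
index. The two nested classes mirror the roles the comment gives IndexUpdater
and Indexer, but their constructors and methods are made up for illustration
and are not Lily's.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Sketch of a pluggable indexer: the record source and the target index are
 * both behind interfaces. All names are hypothetical, not Lily's classes.
 */
public class PluggableIndexerSketch {

    /** Reads the current data of a record (Lily records, plain HBase rows, ...). */
    public interface RecordReader {
        Map<String, String> read(String recordId); // null if the record is gone
    }

    /** Writes a document to some index (Solr, Elastic Search, an in-memory map, ...). */
    public interface IndexWriter {
        void write(String docId, Map<String, Object> document);
    }

    /** Role of Lily's Indexer: map source schema to index schema, talk to the index. */
    public static final class Indexer {
        private final Function<Map<String, String>, Map<String, Object>> schemaMapping;
        private final IndexWriter writer;

        public Indexer(Function<Map<String, String>, Map<String, Object>> schemaMapping,
                       IndexWriter writer) {
            this.schemaMapping = schemaMapping;
            this.writer = writer;
        }

        public void index(String recordId, Map<String, String> recordData) {
            writer.write(recordId, schemaMapping.apply(recordData));
        }
    }

    /** Role of Lily's IndexUpdater: on a change message, decide what to index. */
    public static final class IndexUpdater {
        private final RecordReader reader;
        private final Indexer indexer;
        private final Predicate<Map<String, String>> shouldIndex;

        public IndexUpdater(RecordReader reader, Indexer indexer,
                            Predicate<Map<String, String>> shouldIndex) {
            this.reader = reader;
            this.indexer = indexer;
            this.shouldIndex = shouldIndex;
        }

        /** Called per change message; works from the record's current data. */
        public void onChange(String recordId) {
            Map<String, String> current = reader.read(recordId);
            // Deletes are out of scope for this sketch: missing records are skipped.
            if (current != null && shouldIndex.test(current)) {
                indexer.index(recordId, current);
            }
        }
    }
}
```

Swapping Solr for another index would then mean supplying a different
IndexWriter and schema mapping, while a non-Lily source only needs its own
RecordReader.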

> Changes API: API for pulling edits from HBase
> ---------------------------------------------
>
>                 Key: HBASE-3247
>                 URL: https://issues.apache.org/jira/browse/HBASE-3247
>             Project: HBase
>          Issue Type: Task
>            Reporter: stack
>
> Talking to Shay from Elastic Search, he was asking where the Changes API is 
> in HBase.  Talking more -- there was a bit of beer involved so apologize up 
> front -- he wants to be able to bootstrap an index and thereafter ask HBase 
> for changes since time t.  We thought he could tie into the replication 
> stream, but rather he wants to be able to pull rather than have it pushed to 
> him (in case he crashes, etc. so on recovery he can start pulling again from 
> last good edit received).  He could do the bootstrap with a Scan.  
> Thereafter, requests to pull from hbase would pass a marker of some sort.
> HBase would then give out edits that came in after this marker, in batches, 
> along with an updated marker.
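HBase has no such API (that is what this issue asks for), so the following is
only a sketch of the pull-with-marker protocol described in the issue, with
hypothetical names throughout. It assumes the marker is a monotonically
increasing sequence id assigned to each edit (wall-clock timestamps can
collide), and that the client applies edits idempotently, so replaying a batch
after a crash is harmless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of a pull-style changes API: the client passes a marker and gets the
 * edits after it, in batches, plus an updated marker. Hypothetical, not HBase.
 */
public class ChangesApiSketch {

    public static final class Edit {
        public final long seqId;
        public final String row;
        public final String value;

        public Edit(long seqId, String row, String value) {
            this.seqId = seqId;
            this.row = row;
            this.value = value;
        }
    }

    public static final class Batch {
        public final List<Edit> edits;
        public final long nextMarker; // pass this on the next pull

        public Batch(List<Edit> edits, long nextMarker) {
            this.edits = edits;
            this.nextMarker = nextMarker;
        }
    }

    /** Server side: edits kept in sequence-id order. */
    public static final class ChangeLog {
        private final TreeMap<Long, Edit> edits = new TreeMap<>();
        private long nextSeqId = 1;

        public synchronized long append(String row, String value) {
            long seqId = nextSeqId++;
            edits.put(seqId, new Edit(seqId, row, value));
            return seqId;
        }

        /** Marker covering everything so far, e.g. taken before the bootstrap Scan. */
        public synchronized long currentMarker() {
            return nextSeqId - 1;
        }

        /** Up to maxEdits edits with seqId > marker, plus the marker for the next call. */
        public synchronized Batch pull(long marker, int maxEdits) {
            List<Edit> batch = new ArrayList<>();
            for (Edit e : edits.tailMap(marker, false).values()) {
                if (batch.size() == maxEdits) {
                    break;
                }
                batch.add(e);
            }
            long next = batch.isEmpty() ? marker : batch.get(batch.size() - 1).seqId;
            return new Batch(batch, next);
        }
    }

    /** Client side: pull batches until caught up; returns the last good marker. */
    public static long catchUp(ChangeLog log, long marker, int batchSize,
                               Map<String, String> target) {
        while (true) {
            Batch batch = log.pull(marker, batchSize);
            if (batch.edits.isEmpty()) {
                return marker;
            }
            for (Edit e : batch.edits) {
                target.put(e.row, e.value); // idempotent apply
            }
            marker = batch.nextMarker; // a real client would persist this durably here
        }
    }
}
```

In this sketch a client takes currentMarker() just before its bootstrap Scan
and then calls catchUp. Edits that land during the scan may be applied twice,
which the idempotent put tolerates, and persisting the marker only after a
batch is applied is what lets the client resume from the last good edit after
a crash.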

-- 
This message is automatically generated by JIRA.
