[
https://issues.apache.org/jira/browse/HBASE-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935712#action_12935712
]
Evert Arckens commented on HBASE-3247:
--------------------------------------
@stack
With the Rowlog you can register a subscription and then all messages that are
put on the rowlog will be kept for that subscription. If you then also register
a listener (cf. RowLogMessageListener) on that subscription, the rowlog
processor will start feeding the messages to the listener.
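To make the subscription/listener behaviour above concrete, here is a minimal
in-memory sketch. It is not the Lily rowlog API: the class RowLog and the
methods register_subscription, register_listener, put_message, process and
pending are hypothetical names chosen for illustration, and the retry-on-failure
behaviour is an assumption. It only shows that messages are kept per
subscription from the moment the subscription exists, and are fed to a listener
once one is registered.

```python
from collections import deque


class RowLog:
    """Toy in-memory stand-in for a rowlog (hypothetical, not the Lily API)."""

    def __init__(self):
        self._queues = {}     # subscription id -> pending (row_key, payload)
        self._listeners = {}  # subscription id -> callable returning bool

    def register_subscription(self, subscription_id):
        # From now on, every message put on the log is kept for this id.
        self._queues.setdefault(subscription_id, deque())

    def register_listener(self, subscription_id, listener):
        if subscription_id not in self._queues:
            raise KeyError("unknown subscription: " + subscription_id)
        self._listeners[subscription_id] = listener

    def put_message(self, row_key, payload):
        # Each subscription gets its own copy of the message.
        for queue in self._queues.values():
            queue.append((row_key, payload))

    def process(self):
        """One processor pass: feed pending messages to the listeners."""
        for subscription_id, listener in self._listeners.items():
            queue = self._queues[subscription_id]
            for _ in range(len(queue)):
                row_key, payload = queue.popleft()
                if not listener(row_key, payload):
                    # Assumption: a failed message stays queued for a retry.
                    queue.append((row_key, payload))

    def pending(self, subscription_id):
        return list(self._queues[subscription_id])


received = []


def collect(row_key, payload):
    received.append((row_key, payload))
    return True  # message handled, may be dropped from the queue


log = RowLog()
log.register_subscription("indexer")
log.put_message("row1", "created")  # kept, although nobody listens yet
log.register_listener("indexer", collect)
log.process()                       # now "row1" is delivered to collect()
```

The point of the sketch is the ordering: the subscription can be registered
well before any consumer is ready, and nothing is lost in between.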
If you can make a bulk load that only processes data that was changed before a
certain point in time, you can let that run and in the meantime let the rowlog
record all changes that are made after that point.
Looking a bit further at how the Indexer in Lily uses the rowlog
(http://docs.outerthought.org/lily-docs-current/415-lily.html):
When the indexer receives a message, it uses the record's current data and
puts that data in the index (IndexUpdater is the listener that is registered on
the rowlog).
An index rebuild uses MapReduce to go over all the data again and update
the index.
The bulk index rebuild and the rowlog-driven index updater are allowed to run
in parallel. Both look at the current data of the record and put that in the
index, so there is no need for a transition point from bulk to incremental.
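A small sketch of why the two can overlap, under one assumption that is mine
and not stated above: reading a record and writing its index entry happen as
one atomic step per record (modelled here with a lock). The names make_indexer,
bulk_rebuild and incremental_update are hypothetical, and plain dicts stand in
for the record store and the index.

```python
import threading


def make_indexer(records, index):
    """Return a function that indexes the *current* state of one record."""
    lock = threading.Lock()

    def index_current(row_key):
        # Assumption: read-then-write is atomic per record, so a slow writer
        # can never overwrite a newer index entry with older data.
        with lock:
            index[row_key] = records[row_key]

    return index_current


def bulk_rebuild(records, index_current):
    # Stand-in for the MapReduce job: visit every record once.
    for row_key in list(records):
        index_current(row_key)


def incremental_update(changed_keys, index_current):
    # Stand-in for the listener: one call per rowlog message.
    for row_key in changed_keys:
        index_current(row_key)


records = {"r1": "v1", "r2": "v1", "r3": "v1"}
index = {}
index_current = make_indexer(records, index)

rebuild = threading.Thread(target=bulk_rebuild, args=(records, index_current))
rebuild.start()
records["r2"] = "v2"                       # a change arrives mid-rebuild
incremental_update(["r2"], index_current)  # its rowlog message is processed
rebuild.join()
```

Whichever of the two touches r2 last, it reads the record as it is at that
moment, so the index ends up matching the records without either side having
to know where the other one is.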
The indexer is written specifically to put Lily records into a Solr index. It
is not yet designed to plug in another index, but it should be doable to use
this same framework with something non-Lily on the one hand and a non-Solr
index on the other. Looking at the classes in the framework: the
IndexUpdater is the implementation of the RowLogMessageListener, which has
knowledge of Lily records and decides 'what' to index. The Indexer class is
responsible for mapping the Lily schema onto the Solr schema and maintains the
communication with Solr.
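As a rough illustration of that split, and of where another index could be
plugged in, consider the sketch below. It borrows the class names IndexUpdater
and Indexer from the description above, but the method names, the
field_mapping argument and the InMemoryIndex backend are all hypothetical and
do not reflect the actual Lily classes.

```python
class InMemoryIndex:
    """Stand-in for a search backend such as Solr (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id, doc):
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


class Indexer:
    """Maps record fields onto index fields and talks to the backend."""

    def __init__(self, backend, field_mapping):
        self.backend = backend
        self.field_mapping = field_mapping  # record field -> index field

    def index(self, row_key, record):
        doc = {
            index_field: record[record_field]
            for record_field, index_field in self.field_mapping.items()
            if record_field in record
        }
        self.backend.add(row_key, doc)

    def remove(self, row_key):
        self.backend.delete(row_key)


class IndexUpdater:
    """Listener side: decides 'what' to index when a message arrives."""

    def __init__(self, store, indexer):
        self.store = store      # anything with .get(row_key)
        self.indexer = indexer

    def process_message(self, row_key, payload):
        record = self.store.get(row_key)  # always the current data
        if record is None:
            self.indexer.remove(row_key)
        else:
            self.indexer.index(row_key, record)
        return True


store = {"r1": {"title": "HBase", "internal_flag": "x"}}
backend = InMemoryIndex()
updater = IndexUpdater(store, Indexer(backend, {"title": "title_s"}))
updater.process_message("r1", None)  # backend now holds only the mapped field
```

In this shape, swapping Solr for another index would mean replacing the
backend and the mapping, while the listener only needs to know how to fetch
the current record.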
> Changes API: API for pulling edits from HBase
> ---------------------------------------------
>
> Key: HBASE-3247
> URL: https://issues.apache.org/jira/browse/HBASE-3247
> Project: HBase
> Issue Type: Task
> Reporter: stack
>
> Talking to Shay from Elastic Search, he was asking where the Changes API is
> in HBase. Talking more -- there was a bit of beer involved so apologize up
> front -- he wants to be able to bootstrap an index and thereafter ask HBase
> for changes since time t. We thought he could tie into the replication
> stream, but rather he wants to be able to pull rather than have it pushed to
> him (in case he crashes, etc. so on recovery he can start pulling again from
> last good edit received). He could do the bootstrap with a Scan.
> Thereafter, requests to pull from hbase would pass a marker of some sort.
> HBase would then give out edits that came in after this marker, in batches,
> along with an updated marker.