[ https://issues.apache.org/jira/browse/BLUR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000449#comment-15000449 ]
Garrett Barton commented on BLUR-445:
-------------------------------------
Yes, please add more info. I am actually working on a prototype that dumps Kafka
topics via the enqueue mutate path. It sounds like you're looking to stand up
something outside of the shard server itself to coordinate ingestion of new
data, and that seems cool. Apache Druid does something similar; maybe look at
what they have done?
> Remove online mutates from the Blur thrift api
> ----------------------------------------------
>
> Key: BLUR-445
> URL: https://issues.apache.org/jira/browse/BLUR-445
> Project: Apache Blur
> Issue Type: Improvement
> Components: Blur
> Affects Versions: 0.3.0
> Reporter: Aaron McCurry
> Fix For: 0.3.0
>
>
> The primary use case for Blur is massive ingestion of information to be
> indexed and searched. Currently I believe the system has been made overly
> complex by the atomic operations in the online index mutation system. They
> force the shard servers to keep writers open to each of the indexes in the
> given table, which requires a lot of memory, CPU, and file resources per shard.
> Currently the system only allows mutates to be atomic when mutating a
> single row. Batch mutates are not atomic.
> I propose that we move all index mutations to the bulk indexing approach and
> utilize HDFS snapshots for committing index information within a given table.
> This will allow the controller and shard servers to become read-only with
> respect to the indexes.
> Assuming we move forward with this approach, a new daemon will need to be
> created: an index manager. This daemon will coordinate indexing (MR, Spark,
> Tez, Flink, etc.) and merging globally for the cluster.
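A minimal in-memory sketch of the proposed flow, assuming each commit can be modeled as an immutable version of the table's index directory (standing in for an HDFS snapshot). `IndexManager`, `run_cycle`, `open_for_read`, and `toy_engine` are hypothetical names for illustration, not Blur or Hadoop APIs. The merge policy (collapse a shard once it exceeds a segment-count threshold) is also an assumption and not part of the proposal. The sketch shows the two properties the proposal relies on: a batch becomes visible to readers all at once when its version is committed, and shard servers only ever read committed versions, so they never need an open writer.

```python
import itertools
from collections import defaultdict
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical: an "indexing engine" is any callable that turns a batch of raw
# records into new segment names per shard. It stands in for an offline
# MR/Spark/Tez/Flink job; nothing here is a real Blur or Hadoop API.
IndexingEngine = Callable[[List[dict]], Dict[str, List[str]]]


class IndexManager:
    """Hypothetical sketch of the proposed cluster-wide index manager daemon.

    It is the only component that changes index data. Each cycle it runs an
    indexing job, merges shards that have too many segments, and publishes the
    result as one immutable committed version (a stand-in for an HDFS snapshot).
    """

    def __init__(self, engine: IndexingEngine, max_segments_per_shard: int = 4):
        self.engine = engine
        self.max_segments = max_segments_per_shard
        self.segments: Dict[str, List[str]] = defaultdict(list)  # working state
        self.committed: List[Mapping[str, Tuple[str, ...]]] = []  # "snapshots"
        self._merges = 0

    def _merge(self, shard: str) -> None:
        # Replace all of a shard's segments with one merged segment. Only the
        # names change here; a real merge would rewrite the Lucene index files.
        self._merges += 1
        self.segments[shard] = [f"{shard}-merged-{self._merges}"]

    def run_cycle(self, pending: List[dict]) -> int:
        """Index pending records, merge where needed, then commit once."""
        if pending:
            for shard, new_segments in self.engine(pending).items():
                self.segments[shard].extend(new_segments)
        for shard, segs in list(self.segments.items()):
            if len(segs) > self.max_segments:
                self._merge(shard)
        frozen = {shard: tuple(segs) for shard, segs in self.segments.items()}
        self.committed.append(MappingProxyType(frozen))  # read-only view
        return len(self.committed) - 1  # id of the new committed version

    def open_for_read(self) -> Mapping[str, Tuple[str, ...]]:
        """What a read-only shard server would open: the latest committed version."""
        return self.committed[-1] if self.committed else MappingProxyType({})


_batch_ids = itertools.count(1)


def toy_engine(records: List[dict]) -> Dict[str, List[str]]:
    # Route records to two shards by numeric row id; one new segment per shard per batch.
    batch = next(_batch_ids)
    shards = sorted({f"shard-{r['row_id'] % 2}" for r in records})
    return {shard: [f"{shard}-b{batch}"] for shard in shards}


manager = IndexManager(toy_engine, max_segments_per_shard=1)
for row_id in range(3):
    manager.run_cycle([{"row_id": row_id}])
print(dict(manager.open_for_read()))
# {'shard-0': ('shard-0-merged-1',), 'shard-1': ('shard-1-b2',)}
```

Because a reader holds a reference to an immutable version, an earlier view stays consistent while later cycles commit; in the proposed system the rough equivalent would be a shard server moving from one snapshot to the next.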