[
https://issues.apache.org/jira/browse/BLUR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000444#comment-15000444
]
Tim Williams commented on BLUR-445:
-----------------------------------
Can you clarify "move all index mutations to the bulk indexing approach"?
There are at least four different bulk-ish approaches (m/r, batch, hive,
enqueue) - so what does *the* bulk indexing approach mean in this context? I
reckon you'll want to share more about the new daemon too... is that similar
to what supports hive-style indexing?
> Remove online mutates from the Blur thrift api
> ----------------------------------------------
>
> Key: BLUR-445
> URL: https://issues.apache.org/jira/browse/BLUR-445
> Project: Apache Blur
> Issue Type: Improvement
> Components: Blur
> Affects Versions: 0.3.0
> Reporter: Aaron McCurry
> Fix For: 0.3.0
>
>
> The primary use case for Blur is for massive ingestion of information to be
> indexed and searched. Currently I believe the system has been made overly
> complex due to the atomic operations in the online index mutation system. It
> forces the shard servers to have writers open to each of the indexes in the
> given table, which requires a lot of memory, CPU, and file resources per shard.
> Currently the system only allows for mutates to be atomic when mutating a
> single row. Batch mutates are not atomic.
> I propose that we move all index mutations to the bulk indexing approach and
> utilize HDFS snapshots for committing index information within a given table.
> This will allow the controller and shard servers to become read-only with
> respect to the indexes.
> Assuming we move forward with this approach, a new daemon will need to be
> created: an index manager. This daemon will coordinate indexing (MR, Spark,
> Tez, Flink, etc.) and merging globally for the cluster.