Aaron, I am really grateful for such a complete answer. As an aside, is there a book or a document where this kind of reference material is collected? Of course, I will keep my own notes as well.
For my future purpose (giving the user the latest updates that have not yet made it into the index), option 2 seems the closest. Do I understand correctly that Blur will keep indexing for 5 seconds (configurable), and that during that time a user searching the index will not see the new results? And is there a queue in front of the index that one can query separately?

Again, thank you.

Best regards,
Mark

On Mon, Dec 1, 2014 at 8:10 AM, Aaron McCurry <[email protected]> wrote:

> On Sun, Nov 30, 2014 at 3:53 PM, Mark Kerzner <[email protected]> wrote:
>
> > Hi,
> >
> > The latest Lucene 4.0 (and Solr) has near-real-time search: the index is updated in memory and is available for searches, but is not committed to the hard drive, with all the accompanying features.
> >
> > Blur has the same, I believe, but I am guessing that it implemented it directly, without the latest Lucene in-memory features. Why do I think so? Because Blur seemingly had this before Lucene 4.0.
> >
> > Could you please either give me the answer, or tell me where in the code to look?
>
> Yes, Blur has an NRT-like capability, though it is not implemented with the Lucene NRT classes. Currently there are 3 different ways that Blur accepts data mutates.
>
> 1. Thrift API mutate call. This call is blocking and commits and refreshes the index during the call. This is also an atomic call.
> http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_mutate
> A variant of the call is mutate batch, which just batches the calls to each shard server. However, this is not an atomic call, meaning that in the event of a mutate failure in one shard the entire batch will not fail.
> http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_mutateBatch
>
> 2. Thrift API enqueue mutate call. This call is similar to the Lucene NRT updates in that it will index for 5 seconds (configurable) and then commit and refresh.
> Something to note about this method that differs from the default Lucene implementation is that Blur will not return results to the user that are not committed to the index. The call is implemented by placing an in-memory queue in front of the indexing process. Currently the queue is not backed to disk, but that is something we want to add.
> http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_enqueueMutate
>
> 3. The last method is not NRT but is worth mentioning. MapReduce batch processing can produce a bulk incremental load for Blur.
>
> All of the index changes are performed per shard through a single internal API:
>
> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/manager/writer/IndexAction.java
>
> And the writer that handles all mutates:
>
> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/manager/writer/BlurIndexSimpleWriter.java
>
> There will also be a 4th method for index mutations soon. We will be implementing a write API in our new command platform. In concept these commands are similar to stored procedures, which allow developers to embed their own methods, indexing and query models into Blur.
>
> Does this answer your question?
>
> Aaron
>
> > Thank you.
> >
> > Sincerely,
> >
> > Mark
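The enqueue-mutate behavior described in the thread (an in-memory queue in front of the indexer, a commit and refresh on a configurable interval, and searches that see only committed data) can be sketched roughly as follows. This is a hypothetical Java illustration, not Blur's implementation: the class `QueuedIndexSketch` and its methods `enqueue`, `commitNow` and `search` are made-up names, and a plain list of strings stands in for a real Lucene index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (NOT Blur's code): an in-memory queue sits in front of
 * the "index"; commits drain the queue and publish a new searchable snapshot.
 */
public class QueuedIndexSketch implements AutoCloseable {
    // Pending mutations. In memory only, so they are lost if the process dies.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Committed, searchable view; swapped atomically on each commit.
    private volatile List<String> committed = Collections.emptyList();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Starts periodic commits; intervalMillis stands in for the 5 s setting. */
    public void start(long intervalMillis) {
        scheduler.scheduleAtFixedRate(this::commitNow,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Like an enqueue call: returns immediately; the doc is not yet searchable. */
    public void enqueue(String doc) {
        queue.add(doc);
    }

    /** Drains the queue, applies it, and publishes a new committed snapshot. */
    public synchronized void commitNow() {
        List<String> drained = new ArrayList<>();
        queue.drainTo(drained);
        if (drained.isEmpty()) {
            return; // nothing to commit
        }
        List<String> next = new ArrayList<>(committed);
        next.addAll(drained);
        committed = Collections.unmodifiableList(next);
    }

    /** Searches read only committed data, never the queue. */
    public boolean search(String doc) {
        return committed.contains(doc);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        try (QueuedIndexSketch index = new QueuedIndexSketch()) {
            index.enqueue("doc-1");
            System.out.println(index.search("doc-1")); // false: still queued
            index.commitNow();
            System.out.println(index.search("doc-1")); // true: committed
        }
    }
}
```

Because `search` reads only the committed snapshot, an enqueued document stays invisible until the next commit, and, as in Aaron's description, the queue itself lives only in memory.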
