Hit reply too soon. The new segments should be available for search, but these new segments are not created until the transaction log is flushed.
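To make the distinction in this thread concrete, here is a toy model of the visibility rule being discussed: a realtime GET by id sees a document as soon as it is indexed, while a search sees it only once the buffered documents have been exposed as searchable segments (the refresh step mentioned in the original question). This is only a sketch and not Elasticsearch or Lucene code; `ToyShard` and all of its methods are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of one shard's visibility rules (hypothetical, not the ES API)."""
    # Documents already exposed to search by an earlier refresh.
    searchable: dict = field(default_factory=dict)
    # Documents that were indexed but are not yet visible to search.
    pending: dict = field(default_factory=dict)

    def index(self, doc_id, doc):
        # A newly indexed document is recorded but not yet opened for search.
        self.pending[doc_id] = doc

    def get(self, doc_id):
        # Realtime GET by id: check the not-yet-refreshed documents first.
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.searchable.get(doc_id)

    def search(self, term):
        # Search only sees what the last refresh exposed.
        return sorted(
            doc_id
            for doc_id, doc in self.searchable.items()
            if term in doc["text"].split()
        )

    def refresh(self):
        # Expose everything indexed so far to search.
        self.searchable.update(self.pending)
        self.pending.clear()


shard = ToyShard()
shard.index("1", {"text": "realtime search with zoie"})

found_by_get = shard.get("1") is not None  # GET sees the document right away
hits_before = shard.search("zoie")         # search does not see it yet
shard.refresh()
hits_after = shard.search("zoie")          # after the refresh, search finds it
```

In this model the cost Nico describes is the `refresh()` call: calling it after every write makes search realtime, but in the real system each refresh is expensive, which is why frequent refreshes slow indexing down.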
Even LinkedIn moved on from Zoie. The SNA group had many great projects, but none of them got any traction.

--
Ivan

On Tue, Jul 1, 2014 at 8:02 AM, Ivan Brusic <[email protected]> wrote:

> GET requests use both the Lucene index and the transaction log to retrieve
> documents. Search requests will use only Lucene since the inverted index is
> not updated until the transaction log is flushed. I haven't paid too much
> attention to the distributed aspects of the code in a while, but this
> behavior was used prior to 1.0.
>
> Cheers,
>
> Ivan
>
>
> On Mon, Jun 30, 2014 at 3:37 AM, Nico Krijnen <[email protected]> wrote:
>
>> > Zoie is not for distributed search.
>>
>> We know, that's why we replaced our search layer with Elastic Search.
>> Zoie and Sensei do not have as many users as Elastic Search and as such
>> have much less traction, which made Elastic Search an obvious choice for
>> handling our distributed search needs.
>>
>> > You mention the in-memory segments for fast NRT. Lucene 4 has
>> > implemented this by default.
>>
>> Nice. I'm reading up on the details about this. Do you know if these
>> in-memory segments are immediately being used for search? Or do the new
>> docs only become available after the segments are flushed to disk?
>>
>> Last Friday I also heard about some of the performance improvements being
>> worked on for ElasticSearch 1.3 and 1.4; it sounds like steps are already
>> being taken to improve realtime search.
>>
>> Nico
>>
>>
>> On Thursday, June 26, 2014 1:20:10 PM UTC+2, Jörg Prante wrote:
>>
>>> Zoie is not for distributed search. If you want to analyze the LinkedIn
>>> developments for this area with Lucene, you should look at Sensei.
>>>
>>> There was also a BalancedSegmentMergePolicy donated to Lucene 2.x from
>>> the Zoie project
>>>
>>> https://issues.apache.org/jira/browse/LUCENE-1924
>>>
>>> but there was not enough energy for maintaining it. Now Lucene is at
>>> version 4, with vast improvements in the area of segment merging.
>>>
>>> You mention the in-memory segments for fast NRT. Lucene 4 has
>>> implemented this by default, plus Elasticsearch has some more improvements
>>> for distributed NRT get.
>>>
>>> Note, not all searches can be candidates for NRT. If you use mlockall
>>> and index store type mmapfs, you can move almost all your ES/Lucene data
>>> and files to RAM (if you can spend enough hardware). Modifying data in the
>>> index always means invalidating the fielddata cache and maybe filter/facet
>>> caches, and creating new cache generations, which is expensive and
>>> destroys performance. There is a tradeoff; balancing must be done very
>>> carefully to avoid stale results. This is hard when not much is known about
>>> the typical search workload of an application. ES allows you to cache
>>> filters and to clear caches explicitly. Maybe this is an area to experiment
>>> with. But it always depends.
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Jun 26, 2014 at 11:25 AM, Nico Krijnen <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have recently migrated our application from 'bare Lucene + Zoie for
>>>> realtime search' to Elastic Search. Elastic Search is awesome and next to
>>>> scalability, it gives us lots of additional features. The one thing we
>>>> really miss though is realtime search.
>>>>
>>>> Search is the core of our application. All our data is stored in the
>>>> index (primary data store). When a user adds a file or makes a change,
>>>> their subsequent search must reflect that change. With Zoie, the data was
>>>> indexed very quickly into a temporary Lucene memory index. Not having to
>>>> write+read it on disk makes the documents available for search much faster
>>>> than NRT Lucene. The memory index is flushed to disk asynchronously from
>>>> time to time, not impacting indexing or search performance. Zoie also
>>>> allows you to wait for a specific 'version of the index' to be available
>>>> for searching.
>>>> That way we could make the user's thread wait until their
>>>> data was indexed in memory, only pausing the thread of that user without
>>>> having any performance impact for all the other users.
>>>>
>>>> Result: realtime search and insanely fast indexing.
>>>>
>>>> With Elastic Search we have to do a refresh to make data available for
>>>> search. Lots of refreshes or the 1 second refresh interval will cause
>>>> significantly slower indexing. We don't know beforehand when our users
>>>> will import documents or make lots of changes, so we cannot really increase
>>>> the refresh interval when needed to make indexing faster. We know that
>>>> 'get' is realtime and we make use of that as much as possible, but in lots
>>>> of cases we really require a search to find the data.
>>>>
>>>> Our plan is to implement some mechanism in Elastic Search to get the
>>>> same realtime search + fast indexing behavior that we had with Zoie. We
>>>> need some pointers though on what would be the best place in Elastic Search
>>>> to do something like this. After all it hooks into low level Elastic Search
>>>> and Lucene stuff.
>>>>
>>>> I can imagine that 'realtime-search while indexing' is important for
>>>> many other Elastic Search users too. What are the chances of something like
>>>> this getting merged back into the main branch?
>>>>
>>>> I'm planning to be at the Friday drinks tomorrow in Amsterdam. Is there
>>>> anyone attending with whom I could do some sparring on this matter?
>>>>
>>>> Thanks,
>>>> Nico
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQA%3DPqK7pnhxx5-LLv_2ti2xwUWBg-5x5dcbBJcLUTn7cw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
