How much data are we talking about? Is it feasible to "shovel" new data to ES periodically, so that changes are made to a data store and are only pushed to ES once or twice a day?
Personally I'd prefer having non-stable results over having to deal with this. The only place where you really want this is data processing, and that is achievable using the scan/scroll search type.

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>


On Tue, Apr 8, 2014 at 5:57 PM, David Causse <[email protected]> wrote:

> Thanks for your replies.
>
> I totally agree with your "metrics document", but imagine this concrete
> example:
> Every hour I compute a metric "Number of docs which contain the tag ES"
> and I put the result inside a "metrics document": 150 docs match.
> I have a screen which displays the value of the metrics document, and I
> let the user click on this number to browse documents with a query like
> tag:ES (plus a date range query). I expect to get the same number of docs
> as the cached value in the "metrics document".
> If documents are not updated there is no problem: stored metrics values
> will always be consistent with query results on NRT.
> But in my case documents can be manually updated/annotated, so query
> results may not be equal to the stored metrics values. In my case a user
> performs a manual correction and removes the ES tag from 147 docs.
> Now my "metrics document", which contains 150, is out of sync with NRT:
> the user will click on "150" and see only 3 docs.
>
> The Lucene commit point seemed to me an elegant/efficient solution to add
> more consistency to the application in that case.
>
> Regards
>
> On Tuesday, April 8, 2014 at 14:34:16 UTC+2, Itamar Syn-Hershko wrote:
>>
>> Well, Elasticsearch is built around the exact opposite requirement - of
>> having the latest data always available as soon as possible. Exposing the
>> Lucene commit points seems impractical to me, also taking into account
>> the merge policies ES manages.
>>
>> What I would do is introduce a new document that aggregates those metrics
>> and have a job that updates this document every now and then. You will use
>> Elasticsearch both as a document store (for the metrics documents) and as
>> the data-chewing piece of software. That metrics doc will be your snapshot
>> of the data that you just pull and display - and you get caching all the
>> way.
>>
>> Unless we are talking about huge volumes of metrics, this would be my
>> route. This is a common practice in event-sourcing scenarios BTW.
>>
>>
>> On Tue, Apr 8, 2014 at 3:25 PM, David Causse <[email protected]> wrote:
>>
>>>
>>> On Tuesday, April 8, 2014 at 12:20:31 UTC+2, Itamar Syn-Hershko wrote:
>>>
>>>> What do you mean by "stable"? And why would you want to refresh your
>>>> reader only once a day?
>>>>
>>>
>>> By "stable" I mean that the same query must always return the same
>>> results.
>>> I want to refresh the reader only once a day/hour because (for example)
>>> some metrics are computed every day/hour, and the user can click on some
>>> metrics to see which docs are behind them. As data can be updated
>>> afterwards, metrics will become inconsistent with the NRT reader but will
>>> remain consistent with an unrefreshed reader.
>>>
>>>
>>>> It sounds like what you are looking for is some sort of snapshotting
>>>> mechanism? If so, maybe try to model your data so that you have a
>>>> document / type that holds the data in its stable form, and update it
>>>> periodically based on your business logic?
>>>>
>>>
>>> Snapshotting is exactly what I'm looking for. Modeling my query and/or
>>> data to simulate a snapshot mechanism can be quite complex compared to the
>>> Lucene IndexCommit point-in-time feature.
>>>
>>>
>>>>
>>>> Elasticsearch doesn't support what you describe going all the way to a
>>>> specific commit, but the scan/scroll search type is pretty much what you
>>>> describe:
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#scan
>>>>
>>>
>>> Yes, scroll is the closest ES feature I found.
>>>
>>>
>>>> I think having this implemented on the Lucene commit level is going to
>>>> be tricky if not impossible due to the distributed nature of ES (every
>>>> shard on every node is practically a different Lucene index)
>>>>
>>>
>>> I was afraid of that...
>>>
>>> So is a simple, naive process like this:
>>>
>>> 1/ API to create a commit point: send a broadcast commit message to all
>>> nodes for one ES index.
>>> 2/ Use IndexWriter.commit(Map<String, String> commitInfo) to store
>>> ES-specific data (like a cluster-wide commit point ID generated by ES).
>>> 3/ Add a param to the query API to specify which commit point to use.
>>> 4/ Add some API to list/delete unused commit points.
>>>
>>> impractical?
>>>
>>> Points 2, 3 and 4 look OK to me; the tricky part seems to be point 1.
>>>
>>> Thank you.
>>>
>>>
>>> On Tue, Apr 8, 2014 at 12:45 PM, David Causse <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm evaluating ES features by reading the docs. Here is the use case
>>>> I was not able to find in the documentation.
>>>>
>>>> I want to query an index from 2 different applications.
>>>>
>>>> One application needs an NRT view of the index, and another needs a more
>>>> stable view of the data (refreshed every day or hour, depending on
>>>> application needs).
>>>>
>>>> With raw Lucene it's quite easy to implement such a feature:
>>>>
>>>> - Keep one IndexReader open for the stable view + NRT: the drawback is
>>>> that I lose my IndexReader if the application restarts
>>>> - Use IndexCommit and IndexDeletionPolicy for the stable
>>>> IndexReader; it supports app restarts.
>>>>
>>>> Does ES support these Lucene features: keep a commit point, open a
>>>> reader on that particular commit (and delete the index commit when it's
>>>> no longer needed)?
>>>>
>>>> As the base feature is part of the Lucene API, would it be hard to
>>>> implement such a feature in ES? (I suspect the scroll API already keeps
>>>> an open IndexReader under the hood; isn't it possible to generalize it
>>>> to the query API?)
>>>>
>>>> Thanks.
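
For readers following the numbers in this thread, here is a toy sketch of the four-step commit-point proposal, using a plain in-memory class. Everything in it (`ToyIndex`, `create_commit_point`, the `commit_id` parameter) is hypothetical and is not an Elasticsearch or Lucene API. Copying the whole document map merely stands in for a real commit, and the cluster-wide broadcast of step 1, which is the hard part, is not modeled at all.

```python
import copy


class ToyIndex:
    """In-memory stand-in for an index; NOT an Elasticsearch or Lucene API."""

    def __init__(self):
        self._live = {}        # doc_id -> set of tags (the NRT view)
        self._commits = {}     # commit_id -> frozen copy of the docs
        self._next_commit = 1

    def index(self, doc_id, tags):
        """Add or overwrite a document in the live view."""
        self._live[doc_id] = set(tags)

    def create_commit_point(self):
        """Steps 1/2 of the proposal: freeze the current state under an ID."""
        commit_id = self._next_commit
        self._next_commit += 1
        self._commits[commit_id] = copy.deepcopy(self._live)
        return commit_id

    def search(self, tag, commit_id=None):
        """Step 3: query the live view, or a named commit point."""
        docs = self._live if commit_id is None else self._commits[commit_id]
        return sorted(d for d, tags in docs.items() if tag in tags)

    def list_commit_points(self):
        """Step 4a: list the commit points still being kept."""
        return sorted(self._commits)

    def delete_commit_point(self, commit_id):
        """Step 4b: drop a commit point that is no longer needed."""
        del self._commits[commit_id]


idx = ToyIndex()
for i in range(150):
    idx.index(i, ["ES"])

# Hourly job: take a commit point and cache the metric computed on it.
commit = idx.create_commit_point()
cached_count = len(idx.search("ES", commit_id=commit))

# Manual correction: the ES tag is removed from 147 docs.
for i in range(147):
    idx.index(i, [])

live_count = len(idx.search("ES"))                       # NRT view
stable_count = len(idx.search("ES", commit_id=commit))   # frozen view
print(cached_count, live_count, stable_count)  # prints: 150 3 150
```

The frozen copy could just as well be a separate snapshot document that a periodic job rewrites, which is closer to the "metrics document" route suggested earlier in the thread; the sketch only shows why the live query and the cached value drift apart.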
