How much data are we talking about? Is it feasible to "shovel" new data to
ES periodically, so that changes are made to a data store and are only pushed
to ES once or twice a day?

Personally, I'd prefer having non-stable results over having to deal with
this. The only place where you really want a stable view is when doing data
processing, and that is achievable using the scan/scroll search type.
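
To make the scan/scroll suggestion concrete, here is a minimal sketch of the request loop. It assumes the scroll REST endpoints as they existed around the time of this thread (an initial search with search_type=scan, then repeated calls to /_search/scroll); as far as I know, later releases dropped the scan search type. The `send` callable and the `scan_all` name are hypothetical placeholders for whatever HTTP client you use, not part of any ES client library.

```python
def scan_all(send, index, query, scroll="1m", size=100):
    """Yield every hit for `query` from the view the scroll keeps open.

    `send(method, path, params, body)` is a hypothetical transport
    callable: it performs the HTTP request and returns the parsed JSON.
    """
    # Initial request: with the scan search type this returns no hits,
    # only the scroll id used to fetch the actual pages.
    resp = send(
        "GET",
        "/%s/_search" % index,
        {"search_type": "scan", "scroll": scroll, "size": size},
        {"query": query},
    )
    scroll_id = resp["_scroll_id"]

    while True:
        resp = send(
            "GET",
            "/_search/scroll",
            {"scroll": scroll, "scroll_id": scroll_id},
            None,
        )
        hits = resp["hits"]["hits"]
        if not hits:
            # An empty page means the scroll is exhausted.
            break
        for hit in hits:
            yield hit
        # Always continue with the most recent scroll id.
        scroll_id = resp.get("_scroll_id", scroll_id)
```

Documents updated while the scroll is open should not change what the loop returns, which is the "stable" behaviour being asked for, but only for the lifetime of that one scroll.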

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>


On Tue, Apr 8, 2014 at 5:57 PM, David Causse <[email protected]> wrote:

> Thanks for your replies.
>
> I totally agree with your "metrics document", but imagine this concrete
> example:
> Every hour I compute a metric, "Number of docs which contain the tag ES",
> and I put the result inside a "metrics document": 150 docs match.
> I have a screen which displays the value from the metrics document, and I
> let the user click on this number to browse the documents with a query like
> tag:ES (plus a date range query). I expect to get the same number of docs as
> the cached value in the "metrics document".
> If documents are not updated there is no problem: stored metrics values
> will always be consistent with query results on NRT.
> But in my case documents can be manually updated/annotated, so the query
> result may not be equal to the stored metrics values. Suppose a user
> performs a manual correction and removes the ES tag from 147 docs.
> Now my "metrics document", which contains 150, is out of sync with NRT: the
> user will click on "150" and see only 3 docs.
>
> A Lucene commit point seemed to me an elegant/efficient solution to add more
> consistency to the application in that case.
>
> Regards
>
> On Tuesday, 8 April 2014 14:34:16 UTC+2, Itamar Syn-Hershko wrote:
>>
>> Well, Elasticsearch is built around the exact opposite requirement: having
>> the latest data always available as soon as possible. Exposing the
>> Lucene commit points seems impractical to me, especially taking into
>> account the merge policies ES manages.
>>
>> What I would do is introduce a new document that aggregates those metrics
>> and have a job that updates this document every now and then. You will use
>> Elasticsearch both as a document store (for the metrics documents) and as
>> the data-chewing piece of software. That metrics doc will be your snapshot
>> of the data that you just pull and display - and you get caching all the
>> way.
>>
>> Unless we are talking about huge volumes of metrics, this would be my
>> route. This is a common practice in event-sourcing scenarios BTW.
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>
>>
>> On Tue, Apr 8, 2014 at 3:25 PM, David Causse <[email protected]> wrote:
>>
>>>
>>> On Tuesday, 8 April 2014 12:20:31 UTC+2, Itamar Syn-Hershko wrote:
>>>
>>>> What do you mean by "stable"? And why would you want to refresh your
>>>> reader only once a day?
>>>>
>>>
>>> By "stable" I mean that the same query must always return the same
>>> results.
>>> I want to refresh the reader only once a day/hour because (for example)
>>> some metrics are computed every day/hour, and the user can click on a
>>> metric to see which docs are behind it. As data can be updated afterwards,
>>> metrics will become inconsistent with the NRT reader but will remain
>>> consistent with an unrefreshed reader.
>>>
>>>
>>>> It sounds like what you are looking for is some sort of snapshotting
>>>> mechanism? If so, maybe try to model your data so that you have a
>>>> document / type that holds the data in its stable form, and update it
>>>> periodically based on your business logic?
>>>>
>>>
>>> Snapshotting is exactly what I'm looking for. Modeling my queries and/or
>>> data to simulate a snapshot mechanism can be quite complex compared to the
>>> Lucene IndexCommit point-in-time feature.
>>>
>>>
>>>>
>>>> Elasticsearch doesn't support what you describe going all the way to a
>>>> specific commit, but the scan/scroll search type is pretty much what you
>>>> describe:
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#scan
>>>>
>>>
>>> Yes, scroll is the closest ES feature I found.
>>>
>>>
>>>> I think having this implemented on the Lucene commit level is going to
>>>> be tricky if not impossible due to the distributed nature of ES (every
>>>> shard on every node is practically a different Lucene index)
>>>>
>>>
>>> I was afraid of that...
>>>
>>> So is a simple, naive process like this:
>>>
>>> 1/ API to create a commit point: send a broadcast commit message to all
>>> nodes for one ES index.
>>> 2/ Use the IndexWriter.commit(Map<String, String> commitInfo) to store
>>> ES specific data (like a cluster wide commit point ID generated by ES).
>>> 3/ Add a param to the query API to specify which commit point to use.
>>> 4/ Add some API to list/delete unused commit points.
>>>
>>> impractical?
>>>
>>> Points 2, 3 and 4 look OK to me; the tricky part seems to be point 1.
>>>
>>> Thank you.
>>>
>>>
>>> On Tue, Apr 8, 2014 at 12:45 PM, David Causse <[email protected]> wrote:
>>>
>>>>  Hi,
>>>>
>>>> I'm evaluating ES features by reading the docs. Here is a use case
>>>> I was not able to find in the documentation.
>>>>
>>>> I want to query an index from 2 different applications.
>>>>
>>>> One application needs an NRT view of the index. The other needs a more
>>>> stable view of the data (refreshed every day or hour, depending on
>>>> application needs).
>>>>
>>>> With raw Lucene it's quite easy to implement such a feature:
>>>>
>>>>    - Keep one IndexReader open for the stable view + NRT: the drawback is
>>>>    that I lose my IndexReader if the application restarts.
>>>>    - Use IndexCommit and IndexDeletionPolicy for the stable
>>>>    IndexReader; this survives an app restart.
>>>>
>>>> Does ES support these Lucene features: keep a commit point, open a
>>>> reader on that particular commit (and delete the index commit when it's no
>>>> longer needed)?
>>>>
>>>> As the base feature is part of the Lucene API, would it be hard to
>>>> implement such a feature in ES? (I suspect the scroll API already keeps an
>>>> IndexReader open under the hood; isn't it possible to generalize that to
>>>> the query API?)
>>>>
>>>> Thanks.
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/4b082651-51e6-499c-8882-44398c857dc8%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>

