Re: Multiple views (IndexReader/IndexCommit) of one index in ES

David Causse Tue, 08 Apr 2014 07:58:27 -0700

Thanks for your replies.

I totally agree with your "metrics document" approach, but consider this
concrete example:
Every hour I compute a metric, "Number of docs which contain the tag ES",
and store the result in a "metrics document": 150 docs match.
I have a screen that displays the value from the metrics document, and I
let the user click on this number to browse the matching documents with a
query like tag:ES (plus a date range query). I expect that query to return
the same number of docs as the cached value in the "metrics document".
If documents are never updated there is no problem: the stored metric
values will always be consistent with the query results on the NRT view.
But in my case documents can be manually updated/annotated, so the query
results may no longer equal the stored metric values. Say a user performs
a manual correction and removes the ES tag from 147 docs.
Now my "metrics document", which contains 150, is out of sync with the NRT
view: the user will click on "150" and see only 3 docs.


Lucene commit points seemed to me an elegant and efficient way to add more
consistency to the application in that case.
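
To make the scenario concrete, here is a toy in-memory sketch in plain Java.
It is not the Lucene or Elasticsearch API: TaggedIndex, commit(), countAt()
and deleteCommit() are hypothetical names invented for illustration, and the
"commit point" is simply a full copy of the tag map. It only shows why a
query pinned to a point in time stays consistent with a cached metric while
the live view drifts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT the Lucene or Elasticsearch API.
class TaggedIndex {
    // Live (NRT-like) state: document id -> tags.
    private final Map<Integer, Set<String>> live = new HashMap<>();
    // Hypothetical commit points: commit id -> frozen copy of the index.
    private final Map<Integer, Map<Integer, Set<String>>> commits = new HashMap<>();
    private int nextCommitId = 1;

    void addTag(int docId, String tag) {
        live.computeIfAbsent(docId, k -> new HashSet<>()).add(tag);
    }

    void removeTag(int docId, String tag) {
        Set<String> tags = live.get(docId);
        if (tags != null) {
            tags.remove(tag);
        }
    }

    // Freeze the current state and return an id that queries can refer to.
    int commit() {
        Map<Integer, Set<String>> frozen = new HashMap<>();
        for (Map.Entry<Integer, Set<String>> e : live.entrySet()) {
            frozen.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        commits.put(nextCommitId, frozen);
        return nextCommitId++;
    }

    // Release a commit point that is no longer needed.
    void deleteCommit(int commitId) {
        commits.remove(commitId);
    }

    // Count against the live view.
    long countLive(String tag) {
        return count(live, tag);
    }

    // Count against a previously created commit point.
    long countAt(int commitId, String tag) {
        Map<Integer, Set<String>> view = commits.get(commitId);
        if (view == null) {
            throw new IllegalArgumentException("unknown commit " + commitId);
        }
        return count(view, tag);
    }

    private static long count(Map<Integer, Set<String>> view, String tag) {
        return view.values().stream().filter(tags -> tags.contains(tag)).count();
    }
}

class MetricsDriftDemo {
    public static void main(String[] args) {
        TaggedIndex index = new TaggedIndex();
        for (int id = 0; id < 150; id++) {
            index.addTag(id, "ES");
        }

        // The hourly job pins a commit point and caches the metric.
        int commitId = index.commit();
        long cachedMetric = index.countLive("ES");

        // Manual correction: the ES tag is removed from 147 docs.
        for (int id = 0; id < 147; id++) {
            index.removeTag(id, "ES");
        }

        System.out.println("cached metric:      " + cachedMetric);
        System.out.println("live query:         " + index.countLive("ES"));
        System.out.println("commit-point query: " + index.countAt(commitId, "ES"));
    }
}
```

Running MetricsDriftDemo prints 150 for the cached metric, 3 for the live
query and 150 for the commit-point query. Copying the whole map is of course
far more expensive than what a real commit point would do; the sketch is only
about the consistency behaviour, not the cost.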

Regards

On Tuesday, 8 April 2014 14:34:16 UTC+2, Itamar Syn-Hershko wrote:
>
> Well, Elasticsearch is built around the exact opposite requirement - of
> having the latest data always available as soon as possible. Exposing the
> Lucene commit points seems impractical to me, also taking into account
> the merge policies ES manages.
>
> What I would do is introduce a new document that aggregates those metrics 
> and have a job that updates this document every now and then. You will use 
> Elasticsearch both as a document store (for the metrics documents) and as 
> the data-chewing piece of software. That metrics doc will be your snapshot 
> of the data that you just pull and display - and you get caching all the 
> way.
>
> Unless we are talking about huge volumes of metrics, this would be my 
> route. This is a common practice in event-sourcing scenarios BTW.
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
>
> On Tue, Apr 8, 2014 at 3:25 PM, David Causse <[email protected]> wrote:
>
>>
>> On Tuesday, 8 April 2014 12:20:31 UTC+2, Itamar Syn-Hershko wrote:
>>
>>> What do you mean by "stable"? And why would you want to refresh your
>>> reader only once a day?
>>>
>>
>> By "stable" I mean that the same query must always return the same
>> results.
>> I want to refresh the reader only once a day/hour because (for example)
>> some metrics are computed every day/hour, and the user can click on a
>> metric to see which docs are behind it. As data can be updated afterwards,
>> the metrics will become inconsistent with the NRT reader but will remain
>> consistent with an unrefreshed reader.
>>
>>
>>> It sounds like what you are looking for is some sort of snapshotting
>>> mechanism? If so, maybe try to model your data so that you have a
>>> document / type that holds the data in its stable form, and update it
>>> periodically based on your business logic?
>>>
>>
>> Snapshotting is exactly what I'm looking for. Modeling my queries and/or
>> data to simulate a snapshot mechanism can be quite complex compared to
>> the Lucene IndexCommit point-in-time feature.
>>  
>>
>>>
>>> Elasticsearch doesn't support what you describe going all the way to a
>>> specific commit, but the scan/scroll search type is pretty much what you
>>> describe:
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#scan
>>>
>>
>> Yes, scroll is the closest ES feature I found.
>>
>>  
>>> I think having this implemented on the Lucene commit level is going to 
>>> be tricky if not impossible due to the distributed nature of ES (every 
>>> shard on every node is practically a different Lucene index)
>>>
>>
>> I was afraid of that...
>>
>> So is a simple, naive process like this impractical?
>>
>> 1/ API to create a commit point: send a broadcast commit message to all
>> nodes for one ES index.
>> 2/ Use IndexWriter.commit(Map<String, String> commitInfo) to store
>> ES-specific data (like a cluster-wide commit point ID generated by ES).
>> 3/ Add a param to the query API to specify which commit point to use.
>> 4/ Add some API to list/delete unused commit points.
>>
>> Points 2, 3 and 4 look OK to me; the tricky part seems to be point 1.
>>
>> Thank you.
>>
>>
>> On Tue, Apr 8, 2014 at 12:45 PM, David Causse <[email protected]> wrote:
>>
>>>  Hi,
>>>
>>> I'm evaluating ES features by reading the docs. Here is a use case
>>> I was not able to find in the documentation.
>>>
>>> I want to query one index from 2 different applications.
>>>
>>> One application needs an NRT view of the index, and the other needs a
>>> more stable view of the data (refreshed every day or hour, depending on
>>> application needs).
>>>
>>> With raw Lucene it's quite easy to implement such a feature:
>>>
>>>    - Keep one IndexReader open for the stable view, plus an NRT reader:
>>>    the drawback is that I lose my IndexReader if the application restarts
>>>    - Use IndexCommit and IndexDeletionPolicy for the stable
>>>    IndexReader; this survives an app restart.
>>>
>>> Does ES support these Lucene features: keep a commit point, open a
>>> reader on that particular commit (and delete the index commit when it's
>>> no longer needed)?
>>>
>>> As the base feature is part of the Lucene API, would it be hard to
>>> implement such a feature in ES? (I suspect the scroll API already keeps
>>> an IndexReader open under the hood; isn't it possible to generalize that
>>> to the query API?)
>>>
>>> Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/002c5056-ca09-4ff5-ad77-4a0228e2a066%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
