Re: Decoupling Data and indexing

Amish Asthana Wed, 12 Nov 2014 10:34:14 -0800

Thanks Jorg.

On Wednesday, November 12, 2014 12:23:06 AM UTC-8, Jörg Prante wrote:
>
> There is currently no method to redirect indexing to a preparer index for 
> delayed indexing while searching remains enabled.
>
> By using rivers, you can close the _river index; some rivers (not all) may 
> take this as a signal to stop indexing until the _river index is 
> reopened. I consider this a workaround, not a feature.
>
> From my understanding, the preferred method to implement delayed 
> indexing currently is to set up a durable message queue (such as RabbitMQ 
> and logstash) for external document persistence. By stopping/starting and 
> reconfiguring the message queue, the data can be indexed wherever you like.
>
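
A minimal sketch of the queue-in-front idea described above, in Python. It is only an in-memory stand-in: a real setup would use a durable broker such as RabbitMQ, and the class name `BufferedIndexer` and the `index_fn` callback are hypothetical names introduced for illustration, not part of Elasticsearch or logstash.

```python
from collections import deque


class BufferedIndexer:
    """Accepts documents at all times; forwards them only while not suspended."""

    def __init__(self, index_fn):
        self._pending = deque()    # in-memory stand-in for a durable queue
        self._index_fn = index_fn  # hypothetical callback that indexes one document
        self._suspended = False

    def submit(self, doc):
        """Always accept the document, then index it if indexing is running."""
        self._pending.append(doc)
        self._drain()

    def suspend(self):
        """Stop forwarding documents; submissions keep accumulating."""
        self._suspended = True

    def resume(self):
        """Restart forwarding and flush everything buffered so far, in order."""
        self._suspended = False
        self._drain()

    def pending(self):
        """Number of documents received but not yet indexed."""
        return len(self._pending)

    def _drain(self):
        while self._pending and not self._suspended:
            self._index_fn(self._pending.popleft())
```

While suspended, `submit` still succeeds and `pending()` grows; calling `resume()` indexes the backlog in arrival order. Pointing `index_fn` at a different target is the in-process analogue of reconfiguring the queue.
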
> If you would like to see delayed indexing as a core feature in ES and not 
> as a plugin, then you should open an issue with the suggestion. To be 
> honest, I assume this will be rejected in favor of a queue in front of ES, 
> as described in this blog post:
>
> http://dopey.io/logstash-rabbitmq-tuning.html
>
> Jörg
>
>
> On Tue, Nov 11, 2014 at 11:40 PM, Amish Asthana <[email protected]> wrote:
>
>> Thanks Jorg, makes sense.
>> A few minor questions:
>> a) With the current ES architecture, is this the best/recommended way?
>> b) Is there any project on the roadmap to provide more support for it?
>>
>> regards and thanks
>> amish
>>
>> On Tuesday, November 11, 2014 12:08:24 PM UTC-8, Jörg Prante wrote:
>>>
>>> FAST stored the source data on distributed machines; only the control 
>>> API was not distributed (similar to ES HTTP curl requests, which also 
>>> connect to one host only).
>>>
>>> Of course you could index raw JSON to a preparer index with a single 
>>> field, _all disabled, and field set to "not indexed" so there is no Lucene 
>>> activity on it. This preparer index could also hold mappings in special 
>>> documents for the indexing runs.
>>>
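
A sketch of what such a preparer index could look like, written as Python helpers that only build request bodies. It assumes the ES 1.x mapping syntax of the time (`"index": "no"` on a string field, `_all` switched off per type); later versions use different settings. The type name `raw_doc`, the field name `payload`, and both function names are hypothetical.

```python
import json


def preparer_index_body(type_name="raw_doc", field_name="payload"):
    """Build a create-index body with one unindexed string field and _all disabled."""
    return {
        "mappings": {
            type_name: {
                "_all": {"enabled": False},
                "properties": {
                    # Kept in _source but never analyzed or indexed.
                    field_name: {"type": "string", "index": "no"},
                },
            }
        }
    }


def wrap_document(source, field_name="payload"):
    """Serialize an arbitrary JSON document into the single raw field."""
    return {field_name: json.dumps(source, sort_keys=True)}
```

Serializing the original document to a string keeps the preparer mapping fixed no matter what fields the incoming data carries; a later indexing run would `json.loads` the field and index the result into a search index.
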
>>> The data duplication factor depends on the complexity of the mapping(s), 
>>> and the characteristics of the data (dictionary size, analyzer / tokenizer 
>>> output, norms etc.) 
>>>
>>> A plugin would do no magic at all; it could bundle the calls that 
>>> a client would otherwise have to execute remotely, and add some 
>>> convenience commands for managing the prepare stage (e.g. suspend/resume) 
>>> and showing the current state of indexing.
>>>
>>> If redundant data is a no-go, then the whole approach is 
>>> counterintuitive.
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana <[email protected]> 
>>> wrote:
>>>
>>>> With existing Elasticsearch I can think of an architecture like this.
>>>>
>>>> Index: indexForDataDump, with no mapping (is that possible?) or a minimal 
>>>> mapping. It is used only to dump data from the external system. There is 
>>>> some primary key.
>>>>
>>>> There are different search indexes with different mappings: 
>>>> search-index1, search-index2, etc.
>>>> These indexes get populated from indexForDataDump using the technique 
>>>> mentioned here: 
>>>> <http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>
>>>> This way I can drop a search index as desired and create a new one 
>>>> with a new mapping.
>>>> Any pros/cons or issues with this approach? There will be data 
>>>> duplication, but I am hoping it is minimal. (Any way to quantify it?)
>>>>
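
The rebuild-and-switch flow above can be sketched as two Python helpers that only construct request bodies: one for the bulk API, to copy documents from the dump index into a freshly mapped search index, and one for the `_aliases` endpoint, to switch an alias from the old index to the new one in a single request. The function names are hypothetical, and `hits` is assumed to be a list of search hits carrying `_id` and `_source`, as a scan/scroll over indexForDataDump would return.

```python
import json


def bulk_lines(hits, target_index, doc_type):
    """Turn search hits into a newline-delimited bulk API body."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": doc_type,
                            "_id": hit["_id"]}}  # reuse the primary key
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

Clients would search through the alias only, so the old search index can be dropped after the swap without them noticing.
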
>>>> regards and thanks
>>>> amish
>>>>
>>>>
>>>> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>>>>
>>>>> I am not familiar with FAST, but the idea looks promising.
>>>>> However, it might not be that easy to just have a plugin for ES, as the 
>>>>> data itself is distributed across different machines.
>>>>> So it will not be possible to have just one server holding the data, as 
>>>>> it would become a single point of failure.
>>>>> regards and thanks
>>>>> amish
>>>>>
>>>>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>>>>
>>>>>> I know from the FAST search engine ten years ago that there was a 
>>>>>> two-phase commit for distributed search and indexing. One server could 
>>>>>> listen on the API and keep the (compressed) input stored, and all the 
>>>>>> other indexing servers were supplied with this input in another phase 
>>>>>> to create binary indexes, either automatically or by a manual operation 
>>>>>> called the "suspend/resume indexing API". 
>>>>>>
>>>>>> The advantage was that data could be received continuously via the API 
>>>>>> while FAST indexing could be stopped temporarily in order to balance 
>>>>>> indexing against search performance on limited hardware.
>>>>>>
>>>>>> Are you thinking of something like that for Elasticsearch as well? This 
>>>>>> architecture could be implemented as a plugin.
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>> Is there a way we can decouple data and the associated mapping/indexing 
>>>>>>> in Elasticsearch itself?
>>>>>>> Basically, store the raw data as source (JSON or some other format) 
>>>>>>> so that various mappings/indexes can be used on top of it.
>>>>>>> I understand that one can use an outside database or file system, 
>>>>>>> but can it be achieved natively in ES itself?
>>>>>>>
>>>>>>> Basically we are trying to see how our ES instance will work when we 
>>>>>>> have to change the mapping of existing and continuously incoming data 
>>>>>>> without any downtime for the end user.
>>>>>>> We have an added wrinkle that our indexing has to be edit-aware for 
>>>>>>> versioning purposes, unlike ES where each edit is a new record.
>>>>>>> regards and thanks
>>>>>>> amish
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>
>
>


