Re: Decoupling Data and indexing

Amish Asthana Tue, 11 Nov 2014 14:40:37 -0800

Thanks Jörg, that makes sense.
A few minor questions:
a) With the current ES architecture, is this the best/recommended way?
b) Is there any project on the roadmap to provide more support for it?


regards and thanks
amish

On Tuesday, November 11, 2014 12:08:24 PM UTC-8, Jörg Prante wrote:
>
> FAST stored the source data in distributed machines, only the control API 
> was not distributed (similar to ES HTTP curl requests, which also connect 
> to one host only).
>
> Of course you could index raw JSON into a preparer index with a single 
> field, with _all disabled and the field set to "not indexed", so there is 
> no Lucene activity on it. This preparer index could also hold mappings in 
> special documents for the indexing runs.
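
A minimal sketch of the preparer index settings described above, written as a Python dict that could be sent as the body of an index-creation request. It assumes the Elasticsearch 1.x mapping syntax that was current at the time of this thread (later versions differ). The type name `raw`, the field name `payload`, and the helper `preparer_index_body` are made up for illustration.

```python
import json


def preparer_index_body(type_name="raw", field_name="payload"):
    """Build the body for creating a 'preparer' index (ES 1.x syntax assumed).

    type_name and field_name are hypothetical names chosen for this sketch.
    """
    return {
        "mappings": {
            type_name: {
                # No catch-all field, so nothing is analyzed into _all.
                "_all": {"enabled": False},
                "properties": {
                    # The raw JSON is kept as one string that is not indexed;
                    # it stays retrievable through _source.
                    field_name: {"type": "string", "index": "no"},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(preparer_index_body(), indent=2))
```

The dict only describes the index; sending it to a cluster and reading documents back for the indexing runs is left to whatever client is in use.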
>
> The data duplication factor depends on the complexity of the mapping(s), 
> and the characteristics of the data (dictionary size, analyzer / tokenizer 
> output, norms etc.) 
>
> A plugin would do no magic at all. It could bundle the calls that a 
> client would otherwise have to execute remotely, and add some 
> convenience commands for managing the prepare stage (e.g. suspend/resume) 
> and showing the current state of indexing.
>
> If redundant data is a no-go, then the whole approach is counterintuitive.
>
> Jörg
>
>
> On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana <[email protected]> wrote:
>
>> With the existing Elasticsearch I can think of an architecture like this.
>>
>> Index: indexForDataDump. No mapping (is that possible?) or a minimal 
>> mapping. Used only to dump data from the external system. There is some 
>> primary key.
>>
>> There are different search indexes with different mappings: 
>> search-index1, search-index2, etc.
>> These indexes get populated from indexForDataDump using the technique 
>> mentioned here 
>> <http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>.
>> This way I can drop a search index as desired and create a new one 
>> with a new mapping.
>> Any pros/cons or issues with this approach? There will be data duplication, 
>> but I am hoping it's minimal. (Any way to quantify it?)
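
The linked post relies on index aliases: searches go through an alias, and the alias is moved to the rebuilt index in a single request. A rough sketch of the request body for that switch follows. The alias name `search` and the helper `alias_swap_actions` are assumptions for illustration; the index names are the ones used above. The body would be POSTed to the `_aliases` endpoint.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from old to new.

    Both actions travel in one request, so searches through the alias never
    see a moment with no index behind it.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # "search" is a hypothetical alias name used only for this example.
    body = alias_swap_actions("search", "search-index1", "search-index2")
    print(json.dumps(body, indent=2))
```

Once the alias points at the new index, the old search index can be dropped without clients noticing, since they only ever address the alias.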
>>
>> regards and thanks
>> amish
>>
>>
>> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>>
>>> I am not familiar with FAST, but the idea looks promising.
>>> However, it might not be that easy to just have a plugin for ES, as the 
>>> data itself is distributed across different machines.
>>> So it will not be possible to have just one server with the data, as it 
>>> would become a single point of failure.
>>> regards and thanks
>>> amish
>>>
>>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>>
>>>> I know that the FAST search engine, ten years ago, had a two-phase 
>>>> commit for distributed search and indexing. One server could listen on the 
>>>> API and keep the (compressed) input stored, and all the other indexing 
>>>> servers were supplied with this input in another phase to create binary 
>>>> indexes, either automatically or by manual operation through what was 
>>>> called the "suspend/resume indexing API". 
>>>>
>>>> The advantage was that data could be received continuously via the API 
>>>> while FAST indexing could be stopped temporarily in order to balance 
>>>> indexing against search performance on limited hardware.
>>>>
>>>> Are you thinking of something like that for Elasticsearch as well? This 
>>>> architecture could be implemented as a plugin.
>>>>
>>>> Jörg
>>>>
>>>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana <[email protected]> 
>>>> wrote:
>>>>
>>>>> Hi
>>>>> Is there a way we can decouple data and the associated mapping/indexing 
>>>>> in Elasticsearch itself?
>>>>> Basically, store the raw data as source (JSON or some other format) 
>>>>> so that various mappings/indexes can be used on top of it.
>>>>> I understand that one can use an outside database or file system, but 
>>>>> can it be achieved natively in ES itself?
>>>>>
>>>>> We are trying to see how our ES instance will work when we have to 
>>>>> change the mapping of existing and continuously incoming data without 
>>>>> any downtime for the end user.
>>>>> We have an added wrinkle: our indexing has to be edit-aware for 
>>>>> versioning purposes, unlike ES, where each edit is a new record.
>>>>> regards and thanks
>>>>> amish
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>>> msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%
>>>>> 40googlegroups.com 
>>>>> <https://groups.google.com/d/msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>
>

