Thanks Jörg, makes sense. A few minor questions:
a) With the current ES architecture, is this the best/recommended way?
b) Is there any project on the roadmap to provide more support for it?
regards and thanks
amish

On Tuesday, November 11, 2014 12:08:24 PM UTC-8, Jörg Prante wrote:
>
> FAST stored the source data on distributed machines; only the control API was not distributed (similar to ES HTTP curl requests, which also connect to one host only).
>
> Of course you could index raw JSON into a preparer index with a single field, _all disabled, and the field set to "not indexed" so there is no Lucene activity on it. This preparer index could also hold mappings in special documents for the indexing runs.
>
> The data duplication factor depends on the complexity of the mapping(s) and the characteristics of the data (dictionary size, analyzer/tokenizer output, norms, etc.).
>
> A plugin would do no magic at all: it could bundle the calls that a client would otherwise have to execute remotely, and add some convenience commands for managing the prepare stage (e.g. suspend/resume) and showing the current state of indexing.
>
> If redundant data is a no-go, then the whole approach is counterintuitive.
>
> Jörg
>
>
> On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana wrote:
>
>> With existing Elasticsearch, I can think of an architecture like this.
>>
>> Index: indexForDataDump: no mapping (is it possible?) or minimal mapping. Used only to dump data from the external system. There is some primary key.
>>
>> There are different search indexes with different mappings: search-index1, search-index2, etc.
>> These indexes get populated from indexForDataDump using the technique mentioned here <http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>.
>> This way I can drop a search index as desired and create a new one with a new mapping.
>> Any pros/cons or issues with this approach? There will be data duplication, but I am hoping it's minimal. (Any way to quantify it?)
>>
>> regards and thanks
>> amish
>>
>>
>> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>>
>>> I am not aware of FAST, but the idea looks promising.
>>> However, it might not be that easy to just have a plugin for ES, as the data itself is distributed on different machines.
>>> So it will not be possible to have just one server with the data, as it would become a single point of failure.
>>> regards and thanks
>>> amish
>>>
>>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>>
>>>> I know that the FAST Search engine, ten years ago, had a two-phase commit for distributed search and indexing. One server could listen on the API and keep the (compressed) input stored, and all the other indexing servers were supplied with this input in another phase to create binary indexes, either automatically or by manual operation, via the so-called "suspend/resume indexing API".
>>>>
>>>> The advantage was that data could be received continuously via the API while FAST indexing could be stopped temporarily in order to balance indexing and search performance on limited hardware.
>>>>
>>>> Are you thinking of something like that for Elasticsearch? This architecture could be implemented as a plugin.
>>>>
>>>> Jörg
>>>>
>>>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana wrote:
>>>>
>>>>> Hi
>>>>> Is there a way we can decouple data from its associated mapping/indexing in Elasticsearch itself?
>>>>> Basically, store the raw data as source (JSON or some other format) and use various mappings/indexes on top of that.
>>>>> I understand that one can use an outside database or file system, but can it be achieved natively in ES itself?
>>>>>
>>>>> Basically, we are trying to see how our ES instance will work when we have to change the mapping of existing and continuously incoming data without any downtime for the end user.
>>>>> We have an added wrinkle: our indexing has to be edit-aware for versioning purposes, unlike ES, where each edit is a new record.
>>>>> regards and thanks
>>>>> amish
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
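For reference, the dump-index-plus-alias approach from this thread can be sketched as the HTTP requests a client would send. This is a minimal sketch in Python, assuming a newer Elasticsearch (7.x or later) rather than the 1.x cluster discussed here: `_all` no longer exists there, so instead of Jörg's "single field, not indexed" suggestion it swaps in a top-level `"enabled": false` mapping, and it relies on the `_reindex` and `_aliases` APIs. It only builds request tuples and does not contact a cluster. The index names, the `search` alias, and the `title` field are hypothetical.

```python
import json

# Hypothetical names, chosen for illustration only.
DUMP_INDEX = "index-for-data-dump"
SEARCH_ALIAS = "search"


def dump_index_request():
    """Create the 'preparer' index. A top-level "enabled": false mapping
    makes Elasticsearch keep each document in _source without parsing or
    indexing any of its fields (assumes Elasticsearch 7.x or later)."""
    return ("PUT", f"/{DUMP_INDEX}", {"mappings": {"enabled": False}})


def search_index_request(name, properties):
    """Create a search index with its own (hypothetical) field mapping."""
    return ("PUT", f"/{name}", {"mappings": {"properties": properties}})


def reindex_request(dest):
    """Copy the stored _source of every dump document into `dest`;
    the destination index's mapping decides how fields get indexed."""
    body = {"source": {"index": DUMP_INDEX}, "dest": {"index": dest}}
    return ("POST", "/_reindex", body)


def alias_swap_request(old, new):
    """Move the search alias from `old` to `new` in one _aliases call,
    whose actions are applied atomically."""
    body = {
        "actions": [
            {"remove": {"index": old, "alias": SEARCH_ALIAS}},
            {"add": {"index": new, "alias": SEARCH_ALIAS}},
        ]
    }
    return ("POST", "/_aliases", body)


def migration_plan(old, new, properties):
    """Ordered requests for switching search traffic to a new mapping."""
    return [
        search_index_request(new, properties),
        reindex_request(new),
        alias_swap_request(old, new),
    ]


if __name__ == "__main__":
    plan = migration_plan("search-index1", "search-index2",
                          {"title": {"type": "text"}})
    for method, path, body in plan:
        print(method, path, json.dumps(body))
```

One caveat of this design: `_reindex` works from a scroll over the documents present when it starts, so data that keeps arriving in the dump index during the copy would need a follow-up pass (or dual writes) before the alias is swapped. Every search index also keeps its own `_source` copy in addition to the dump index, which is part of the duplication asked about above.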
