Thanks Jorg.

On Wednesday, November 12, 2014 12:23:06 AM UTC-8, Jörg Prante wrote:
>
> There is no current method to redirect indexing to a preparer index for
> delayed indexing while searching is still enabled.
>
> If you use rivers, you can close the _river index; some rivers (not all)
> take this as a signal to stop indexing until the _river index is
> reopened. I consider this a workaround, not a feature.
>
> From my understanding, the preferred method to implement delayed
> indexing currently is to set up a durable message queue (like RabbitMQ and
> logstash) for external document persistence. By stopping/starting and
> reconfiguring the message queue, the data can be indexed wherever you like.
>
> If you would like to see delayed indexing as a core feature in ES and not
> as a plugin, then you should open an issue with the suggestion. To be
> honest, I assume it will be rejected in favor of a queue in front of ES,
> as described in this blog post:
>
> http://dopey.io/logstash-rabbitmq-tuning.html
>
> Jörg
>
>
> On Tue, Nov 11, 2014 at 11:40 PM, Amish Asthana <[email protected]> wrote:
>
>> Thanks Jorg, makes sense.
>> A few minor questions:
>> a) With the current ES architecture, is this the best/recommended way?
>> b) Is there any project on the roadmap to provide more support for it?
>>
>> regards and thanks
>> amish
>>
>> On Tuesday, November 11, 2014 12:08:24 PM UTC-8, Jörg Prante wrote:
>>>
>>> FAST stored the source data on distributed machines; only the control
>>> API was not distributed (similar to ES HTTP curl requests, which also
>>> connect to one host only).
>>>
>>> Of course, you could index raw JSON into a preparer index with a single
>>> field, _all disabled, and the field set to "not indexed", so there is no
>>> Lucene activity on it. This preparer index could also hold the mappings
>>> for the indexing runs in special documents.
>>>
>>> The data duplication factor depends on the complexity of the mapping(s)
>>> and the characteristics of the data (dictionary size, analyzer/tokenizer
>>> output, norms, etc.).
>>>
>>> A plugin would do no magic at all. It could bundle the calls that a
>>> client would otherwise have to execute remotely, and add some
>>> convenience commands for managing the prepare stage (e.g. suspend/resume)
>>> and showing the current state of indexing.
>>>
>>> If redundant data is a no-go, then the whole approach is
>>> counterintuitive.
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana <[email protected]> wrote:
>>>
>>>> With the existing Elasticsearch, I can think of an architecture like this.
>>>>
>>>> Index indexForDataDump: no mapping (is that possible?) or a minimal
>>>> mapping. Used only to dump data from the external system. There is some
>>>> primary key.
>>>>
>>>> There are different search indexes with different mappings:
>>>> search-index1, search-index2, etc.
>>>> These indexes get populated from indexForDataDump using the technique
>>>> described here:
>>>> http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
>>>> This way I can drop a search index as desired and create a new one
>>>> with a new mapping.
>>>> Any pros/cons or issues with this approach? There will be data
>>>> duplication, but I am hoping it's minimal. (Is there any way to
>>>> quantify it?)
>>>>
>>>> regards and thanks
>>>> amish
>>>>
>>>>
>>>> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>>>>
>>>>> I am not aware of FAST, but the idea looks promising.
>>>>> However, it might not be that easy to just have a plugin for ES, as the
>>>>> data itself is distributed across different machines.
>>>>> So it will not be possible to have just one server holding the data, as
>>>>> it would become a single point of failure.
>>>>>
>>>>> regards and thanks
>>>>> amish
>>>>>
>>>>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>>>>
>>>>>> I know that the FAST search engine, ten years ago, used a two-phase
>>>>>> commit for distributed search and indexing. One server could listen
>>>>>> on the API and keep the (compressed) input stored, and all the other
>>>>>> indexing servers were fed this input in a second phase to create
>>>>>> binary indexes, either automatically or by manual operation through
>>>>>> the so-called "suspend/resume indexing API".
>>>>>>
>>>>>> The advantage was that data could be received continuously via the
>>>>>> API while FAST indexing could be stopped temporarily in order to
>>>>>> balance indexing and search performance on limited hardware.
>>>>>>
>>>>>> Are you thinking of something like that for Elasticsearch as well?
>>>>>> This architecture could be implemented as a plugin.
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> Is there a way we can decouple data from its associated
>>>>>>> mapping/indexing in Elasticsearch itself?
>>>>>>> Basically, store the raw data as source (JSON or some other format)
>>>>>>> and apply various mappings/indexes on top of it.
>>>>>>> I understand that one can use an outside database or file system,
>>>>>>> but can this be achieved natively in ES itself?
>>>>>>>
>>>>>>> Basically, we are trying to see how our ES instance will work when we
>>>>>>> have to change the mapping of existing and continuously incoming data
>>>>>>> without any downtime for the end user.
>>>>>>> We have an added wrinkle: our indexing has to be edit-aware for
>>>>>>> versioning purposes, unlike ES, where each edit is a new record.
>>>>>>> regards and thanks
>>>>>>> amish
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
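The preparer-index design discussed in the thread (Jörg's raw-JSON index with _all disabled and a single non-indexed field, feeding Amish's search-index1/search-index2 and switched over with an alias as in the zero-downtime blog post) can be sketched as the request bodies a client would send. This is a minimal sketch, not code from the thread: it assumes the Elasticsearch 1.x mapping syntax of the time (the `string` type, `"index": "no"`, `_all`, mapping types), which later versions have changed, and the type name `doc`, field name `raw` and alias name `search` are hypothetical. It only builds the bodies; sending them (PUT /indexForDataDump, POST /_bulk, POST /_aliases) and reading hits out of the preparer index (e.g. with scan/scroll) are left to whatever HTTP client you use.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical names: the "doc" mapping type and the "raw" field are
# illustrative only; the request syntax assumes Elasticsearch 1.x.


def preparer_index_body(doc_type: str = "doc", field: str = "raw") -> Dict:
    """Create-index body for a preparer index that only keeps raw JSON.

    _all is disabled and the single string field is "index": "no", so no
    inverted index is built for it; the document is still kept in _source.
    """
    return {
        "mappings": {
            doc_type: {
                "_all": {"enabled": False},
                "properties": {field: {"type": "string", "index": "no"}},
            }
        }
    }


def bulk_body_from_hits(hits: Iterable[Dict], target_index: str,
                        doc_type: str = "doc", field: str = "raw") -> str:
    """Turn preparer-index hits into a _bulk body for a search index.

    Each raw JSON string is parsed and re-serialised so every source sits on
    a single line, as the newline-delimited bulk format requires. The hit's
    _id (the external primary key) is reused in the target index.
    """
    lines: List[str] = []
    for hit in hits:
        action = {"index": {"_index": target_index, "_type": doc_type,
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(json.loads(hit["_source"][field])))
    return "\n".join(lines) + "\n"


def alias_swap_body(alias: str, old_index: str, new_index: str) -> Dict:
    """_aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, `alias_swap_body("search", "search-index1", "search-index2")` produces the body that moves the hypothetical `search` alias to the newly populated index; the _aliases API applies both actions atomically, so clients querying `search` never see a moment with no index behind it.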
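Jörg's recommended alternative in the thread, a durable queue in front of ES that keeps accepting documents while indexing is stopped, is the same idea as the FAST "suspend/resume indexing" he describes, and its control flow can be modelled in a few lines. This is a hypothetical in-memory sketch, not RabbitMQ, logstash or any Elasticsearch API: `IndexingGate` and its methods are invented names, a plain `deque` offers none of the durability a real broker provides, and the `index_batch` callback stands in for whatever actually sends a bulk request.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical callback type: receives the target index name and a batch of docs.
IndexBatchFn = Callable[[str, List[Dict]], None]


class IndexingGate:
    """In-memory stand-in for a queue in front of Elasticsearch.

    Producers can always submit(); documents are forwarded to the indexer
    only while the gate is not suspended. Unlike a durable broker, pending
    documents are lost if the process exits.
    """

    def __init__(self, index_batch: IndexBatchFn, target: str) -> None:
        self._pending: Deque[Dict] = deque()
        self._index_batch = index_batch
        self.target = target      # index that the next drain() writes to
        self.suspended = False

    def submit(self, doc: Dict) -> None:
        # Ingestion never blocks: documents are accepted even while suspended.
        self._pending.append(doc)

    def suspend(self) -> None:
        self.suspended = True

    def resume(self) -> None:
        self.suspended = False

    def pending(self) -> int:
        return len(self._pending)

    def drain(self, batch_size: int = 500) -> int:
        """Forward all pending docs in batches; return how many were sent."""
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.suspended:
            return 0
        sent = 0
        while self._pending:
            n = min(batch_size, len(self._pending))
            batch = [self._pending.popleft() for _ in range(n)]
            self._index_batch(self.target, batch)
            sent += n
        return sent
```

Changing `target` before calling `resume()` corresponds to "reconfiguring the message queue" in Jörg's reply: the backlog collected while indexing was suspended is then written to a different index, for instance a freshly created search-index2 with a new mapping.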
