Hello,

This post interested me. Is there a way to know when indexing has finished, so that the XDELETE on _river can be triggered at that point?
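To check that I read the quoted reply below correctly, here is a minimal sketch of the two different deletions it describes: removing one river (a graceful shutdown) versus removing the whole _river index (which wipes the state of all rivers). The host `localhost:9200` and the river name `my_jdbc_river` are placeholders of mine, not values from this thread, and the sketch only builds the requests without sending them. It does not answer the question of how to detect that indexing has finished.

```python
from urllib.parse import quote
from urllib.request import Request


def delete_river_request(base_url: str, river_name: str) -> Request:
    """Build (but do not send) a DELETE for a single river.

    A river is removed by deleting its type inside the _river index,
    i.e. DELETE <base_url>/_river/<river_name>/
    """
    url = f"{base_url.rstrip('/')}/_river/{quote(river_name, safe='')}/"
    return Request(url, method="DELETE")


def delete_river_index_request(base_url: str) -> Request:
    """Build (but do not send) a DELETE for the whole _river index.

    This wipes the data of every river and does not stop running
    river instances, so it is the unfriendly variant.
    """
    url = f"{base_url.rstrip('/')}/_river"
    return Request(url, method="DELETE")


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    base = "http://localhost:9200"
    for req in (
        delete_river_request(base, "my_jdbc_river"),
        delete_river_index_request(base),
    ):
        print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would actually send it; I left that out on purpose so the sketch can be run without touching a cluster.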
On Wednesday, 25 June 2014 at 17:54:01 UTC+2, Jörg Prante wrote:
>
> It is up to the river implementation how the data import is handled.
>
> The JDBC river, in the "simple" strategy, imports data when the river is
> started, regardless of existing cluster or index. It is possible to
> implement other strategies, for example, a strategy that performs a check
> before indexing.
>
> There is no support for river implementations about node start/stop
> control and how to behave. JDBC river tries to compensate for this by
> persisting a JDBC river specific state. This state is useful for flow
> control.
>
> If you no longer need the river, you can delete the river with curl
> -XDELETE; this shuts down river instance threads gracefully and releases
> resources.
>
> If you delete the _river index with curl -XDELETE, you wipe all data that
> is used by rivers. Active river instances are not stopped and are not aware
> of what happened, so this is an unfriendly way to terminate river runs; all
> kinds of river errors may occur.
>
> Jörg
>
> On Wed, Jun 25, 2014 at 5:38 PM, Stéphane Seng <[email protected]> wrote:
>
>> Hello,
>>
>> I have a question about the fact that, when rivers are used to import
>> data into ElasticSearch, rivers are also reimporting data at each
>> ElasticSearch restart.
>>
>> In our project, what we are doing is as follows:
>>
>>    - Raw data is imported into ElasticSearch from a MySQL database using
>>    the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc);
>>    - Some updates are executed directly on the newly imported data in
>>    ElasticSearch using POST requests;
>>    - In the end, the final data stored in ElasticSearch is not the same
>>    as the imported raw data.
>>
>> The problem we are facing is that when ElasticSearch is restarted, the
>> JDBC river is reimporting the raw data, thus overriding the transformations
>> made.
>> We suppose that this is an intentional behavior from ElasticSearch rivers.
>> One solution to avoid the reimporting of data is to delete the
>> corresponding _river index, which is supposed to store the state of the
>> rivers.
>>
>> Our questions are as follows:
>>
>>    - Is reimporting data from rivers at each restart a standard use
>>    case? Is it useful for some applications?
>>    - What is the point of the _river index state saving?
>>    - Is there a way to avoid the reimporting of data without having
>>    to delete the corresponding _river index?
>>    - Are there any downsides (for our use case) to deleting the
>>    corresponding _river index?
>>
>> Thanks,
>> Stéphane.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/a59ade79-e474-466b-bf54-1476a7c506bb%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2b7f91f1-4fa0-4e66-8193-cd0e6fa35982%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
