Rivers are reimporting data at each ElasticSearch restart

Stéphane Seng Wed, 25 Jun 2014 08:40:40 -0700

Hello,

I have a question about rivers: when a river is used to import data into
ElasticSearch, it reimports that data every time ElasticSearch is
restarted.

In our project, we do the following:

- Raw data is imported into ElasticSearch from a MySQL database using
the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc);
- Some updates are executed directly on the newly imported data in
ElasticSearch using POST requests;
- In the end, the final data stored in ElasticSearch is not the same
as the imported raw data.
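
As an illustration of the second step, here is a minimal Python sketch of the kind of POST update we mean. It only builds the request and does not send it. The node URL, the index, type and document id, and the field being changed are all hypothetical placeholders, and it assumes the partial-update form of the `_update` endpoint.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node, not our real host


def build_partial_update_request(es_url, index, doc_type, doc_id, fields):
    """Build (but do not send) a POST request that partially updates one document."""
    url = "{}/{}/{}/{}/_update".format(es_url.rstrip("/"), index, doc_type, doc_id)
    # The "doc" wrapper asks ElasticSearch to merge these fields into the stored document.
    body = json.dumps({"doc": fields}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index/type/id and field, for illustration only.
update_req = build_partial_update_request(
    ES_URL, "products", "product", "42", {"status": "reviewed"}
)
print(update_req.get_method(), update_req.full_url)
# To actually send it: urllib.request.urlopen(update_req)
```

A change like this lives only in ElasticSearch, so a fresh import of the same row from MySQL would replace it.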

The problem we are facing is that when ElasticSearch is restarted, the JDBC
river reimports the raw data, thus overwriting the transformations made.
We suppose that this is intentional behavior of ElasticSearch rivers.
One way to avoid the reimport is to delete the corresponding _river index,
which is supposed to store the state of the rivers.
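
For reference, a minimal Python sketch of that deletion, assuming it means removing the river's own entry under `_river` with an HTTP DELETE. The node URL and the river name `my_jdbc_river` are hypothetical placeholders, and the request is only built here, not sent.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node
RIVER_NAME = "my_jdbc_river"      # hypothetical river name


def build_delete_river_request(es_url, river_name):
    """Build (but do not send) the DELETE request for one river's entry in _river."""
    url = "{}/_river/{}/".format(es_url.rstrip("/"), river_name)
    return urllib.request.Request(url, method="DELETE")


delete_req = build_delete_river_request(ES_URL, RIVER_NAME)
print(delete_req.get_method(), delete_req.full_url)
# To actually send it: urllib.request.urlopen(delete_req)
```

Once the entry is gone, the river is no longer registered, so data added to MySQL afterwards would not be imported unless the river is created again.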

Our questions are as follows:

- Is the reimporting of data from rivers at each restart a standard
use case? Is it useful for some applications?
- What is the point of saving the river state in the _river index?
- Is there a way to avoid the reimporting of data without having to
delete the corresponding _river index?
- Are there any downsides (for our use case) to deleting the
corresponding _river index?

Thanks,
Stéphane.

