Only a quick thought, it seems you have a large table in RDBMS, and there
are selectors ('where' condition) that control the dissemanition, what rows
of the table have to be selected for further processing.

A first thought is to pull the whole database into ES and build filtered
queries for the 'where' condition.

If the database table is steadily growing, it could be more feasible to
fetch them by timestamp, e.g. once a day or per hour.

Jörg




On Thu, Aug 7, 2014 at 6:19 PM, coder <[email protected]> wrote:

> Here goes my use case:
>
> Table t1 --- >
> id         a               b
> 123 somestring somestring
>
> Table t2  --->
> id           c                              d                          e
> 123 someIntegerCount somebooleanValue someString
> select * from t1,t2 where t1.id=t2.id and t2.c > 0 and t2.d = 1;
>
> which gives some rows as:
>
> id                 a              b                             c
>         d                              e
> 123      someString   someString    someIntegerCount  somebooleanValue
> someString
> Now, In my use case the values of c and d fields in table t2 keeps
> changing frequently. So, I index only those rows for which c field > 0 (as
> count keeps changing) and d field = 1 (which means either enabled or
> disabled).
>
> Now, First time indexing is done without any issues. Problem comes when I
> update these two fields and want ES to reindex the documents. Since there
> might be some documents for which earlier c field was 0 but now it is
> non-zero and similarly d field was 0 earlier but changed to 1. Now, I want
> ES ti reflect those changes.(I guess that what mongo river does and I
> expect mysql to work, automatic sync). Also, there will be few results
> which were coming earlier but not now. How to delete those docs from index ?
>
> How can I accomplish this ?
>
> I have tried to explain my problem in simplest manner by keeping things
> simple. Please ask questions if anything is not clear.
>
> Thanks
>
> On Thursday, 7 August 2014 14:25:07 UTC+5:30, Jörg Prante wrote:
>>
>> Can you give a minimal example of a query with the rows, and what rows
>> are deleted then, so we can work through the issue?
>>
>> The fundamental problem is that deleted rows in SQL are no longer
>> available for creating deletion requests and so they can not be tracked
>> over time - once they are gone, they are gone. The problem is known as
>> "stale data". This can be solved either at a bigger scope (by using time
>> windowed indexes where older indexes can be dropped) or by an extra DB
>> mechanism to provide the IDs of the deleted docs after they are deleted
>> (maybe by trigger), so they can be selected by JDBC plugin with a "select
>> _optype, _id" construction. Note, at a certain size, deleting single docs
>> in ES is not efficient.
>>
>> To sync data between DB and ES, JDBC plugin is probably not smart enough
>> (it is impossible to implement app-specific logic in JDBC plugin). So you
>> should also consider to write a middleware app with specific logic that
>> controls the deletions in the DB and after that deletes docs in ES.
>>
>> Jörg
>>
>>
>>
>>
>> On Thu, Aug 7, 2014 at 10:44 AM, coder <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm using jdbc mysql river plugin https://github.com/jprante/
>>> elasticsearch-river-jdbc for creating ES index. I have been able to
>>> index my documents successfully but I'm facing issues in updating/deleting
>>> indexed documents. My jdbc river is used with a sql query that uses
>>> multiple joins on tables and return results. These results are then indexed
>>> in ES. My problem is if I update some tables they will affect the results
>>> of that join query which should be reflected in ES index but ES index is
>>> not updating/deleting sql results from that join query. I found few threads
>>> where people are facing similar issue.
>>>
>>> http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-
>>> mysql-not-deleting-records
>>>
>>> I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more
>>> thread from author itself where he states that deletions are no longer
>>> supported. He tells two methods of tackling the issue. One is re indexing
>>> itself and second is using some sql queries to update/delete indexed
>>> documents.
>>> https://github.com/jprante/elasticsearch-river-jdbc/issues/202
>>>
>>> Can anyone please tell me how can I tackle the issue. How can I
>>> update/delete already indexed documents. Can anyone please elaborate on the
>>> second method of updating/deleting indexed documents.
>>>
>>> Thanks
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%
>>> 40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/42039e84-86e3-4dff-baac-662174a5c1dc%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/42039e84-86e3-4dff-baac-662174a5c1dc%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGF0%2BzKfmtxfM_yaVpG8%2BnhHXmUztNQOuOOJehUt9K6-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to