On Wed, Apr 29, 2009 at 3:24 PM, Wouter Samaey <wouter.sam...@gmail.com> wrote:
> Hi there,
>
> I'm currently in the process of learning more about Solr, and how I
> can implement it into my project.
>
> Since my database is very large and complex, I'm looking into the way
> of keeping my documents current in Solr. I have read the pages about
> DIH, and find it useful, but I may need more logic to filter out
> documents or manipulate them. In order to use DIH, I'd need to run
> huge queries and joins...
>
> Now, I see several ways of going forward:
>
> - customize DIH with new classes so I can read directly from my
> RDBMS (will be slow)
> - let the webapp build an XML, and simply take that as a datasource
> instead of the RDBMS (less queries, and can use memcached for the
> heavy stuff)
> - let the webapp instruct Solr to add, update or remove a document as
> changes occur in real time instead of the DIH delta queries. For
> loading a fresh situation, I'll still need to find a solution like the
> ones above. (webapp drives solr directly, instead of DIH polling)
>
> Is there some general advice you can give? I understand every app is
> different, but this must be an issue many have considered before.
>
> Kind regards
>
> Wouter Samaey
>
The disadvantage of having DIH pull data directly out of your db is
that complex queries can take a long time. The best strategy, as I see
it, is to maintain a simple temp db where your app writes rows as it
generates data. Periodically, ask DIH to read from this temp db and
update the index. This approach also works well if you wish to rebuild
the index.
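
A rough sketch of the temp db idea, under several assumptions: the table
name solr_staging, its columns, and the helper names are all made up for
illustration, and SQLite is used only so the example is self-contained
(in practice it would be whatever database DIH can reach over JDBC). The
base URL is also a guess at a typical setup; full-import and delta-import
are the DIH commands you would trigger from cron or similar.

```python
import sqlite3
from urllib.parse import urlencode

# Hypothetical staging schema: one flat, already-denormalized row per Solr document,
# so DIH only needs a trivial SELECT instead of huge joins.
SCHEMA = """
CREATE TABLE IF NOT EXISTS solr_staging (
    id         TEXT PRIMARY KEY,
    title      TEXT,
    body       TEXT,
    deleted    INTEGER NOT NULL DEFAULT 0,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_staging(path=":memory:"):
    """Open the staging db and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def stage_document(conn, doc_id, title, body):
    """Called by the webapp whenever a document is created or changed."""
    conn.execute(
        "INSERT OR REPLACE INTO solr_staging (id, title, body, deleted, updated_at) "
        "VALUES (?, ?, ?, 0, CURRENT_TIMESTAMP)",
        (doc_id, title, body),
    )
    conn.commit()


def stage_delete(conn, doc_id):
    """Mark a row as deleted so a later delta import can remove it from the index."""
    conn.execute(
        "UPDATE solr_staging SET deleted = 1, updated_at = CURRENT_TIMESTAMP "
        "WHERE id = ?",
        (doc_id,),
    )
    conn.commit()


def dih_url(base, command):
    """Build the URL a scheduler would request to start a DIH run."""
    return base + "?" + urlencode({"command": command})


if __name__ == "__main__":
    conn = open_staging()
    stage_document(conn, "42", "Hello", "First version")
    stage_document(conn, "42", "Hello", "Second version")  # overwrites the same row
    # Assumed handler path; adjust to wherever DIH is registered in solrconfig.xml.
    print(dih_url("http://localhost:8983/solr/dataimport", "delta-import"))
```

The updated_at column is there so a delta import can select only rows
changed since the last run, and a full-import over the same table covers
the rebuild case.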


-- 
--Noble Paul
