On Wed, Apr 29, 2009 at 3:24 PM, Wouter Samaey <wouter.sam...@gmail.com> wrote:
> Hi there,
>
> I'm currently in the process of learning more about Solr, and how I
> can implement it into my project.
>
> Since my database is very large and complex, I'm looking into the way
> of keeping my documents current in Solr. I have read the pages about
> DIH, and find it useful, but I may need more logic to filter out
> documents or manipulate them. In order to use DIH, I'd need to run
> huge queries and joins...
>
> Now, I see several ways of going forward:
>
> - customize DIH with new classes so I can read directly from my
> RDBMS (will be slow)
> - let the webapp build an XML, and simply take that as a datasource
> instead of the RDBMS (fewer queries, and can use memcached for the
> heavy stuff)
> - let the webapp instruct Solr to add, update or remove a document as
> changes occur in real time instead of the DIH delta queries. For
> loading a fresh situation, I'll still need to find a solution like the
> ones above. (webapp drives solr directly, instead of DIH polling)
>
> Is there some general advice you can give? I understand every app is
> different..but this must be an issue many have considered before.
>
> Kind regards
>
> Wouter Samaey
>

The disadvantage of DIH pulling data straight out of your DB is that
complex queries can take a long time. The best strategy, as I see it, is
to maintain a simple temp DB where your app writes rows as it generates
data. Periodically, ask DIH to read from this temp DB and update the
index. This approach works well even if you wish to rebuild the index.
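
To make the temp DB idea concrete, here is a minimal sketch in Python with sqlite3. Everything in it is an assumption for illustration: the table name solr_staging, its columns, and the helper functions are hypothetical, and a real setup would use whatever RDBMS and fields your app already has. It only shows the shape of the approach: the app writes one flat row per document, and the periodic import needs just a simple, join-free query.

```python
import sqlite3

# Hypothetical staging schema: one flat row per Solr document,
# written by the webapp. Column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS solr_staging (
    doc_id     TEXT PRIMARY KEY,
    title      TEXT,
    body       TEXT,
    deleted    INTEGER NOT NULL DEFAULT 0,
    updated_at INTEGER NOT NULL
)
"""


def open_staging(path=":memory:"):
    """Open (and create if needed) the staging database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def stage_document(conn, doc_id, title, body, now):
    """Called by the webapp whenever a document is created or changed."""
    conn.execute(
        "INSERT OR REPLACE INTO solr_staging "
        "(doc_id, title, body, deleted, updated_at) VALUES (?, ?, ?, 0, ?)",
        (doc_id, title, body, now),
    )
    conn.commit()


def stage_delete(conn, doc_id, now):
    """Mark a document as deleted so the next import can remove it."""
    conn.execute(
        "INSERT OR REPLACE INTO solr_staging "
        "(doc_id, title, body, deleted, updated_at) VALUES (?, NULL, NULL, 1, ?)",
        (doc_id, now),
    )
    conn.commit()


def rows_changed_since(conn, last_index_time):
    """The simple query a periodic import would run against the staging table."""
    cur = conn.execute(
        "SELECT doc_id, title, body, deleted FROM solr_staging "
        "WHERE updated_at > ? ORDER BY doc_id",
        (last_index_time,),
    )
    return cur.fetchall()
```

With a table like this, a full rebuild is the same query with the cutoff set before the oldest row (for example rows_changed_since(conn, 0) if timestamps are positive), which is why the approach also covers re-indexing. On the DIH side, the equivalent would presumably be a plain query and deltaQuery on this one table instead of the heavy joins.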

--
--Noble Paul