Hi Simon,

I'm not sure if I already replied to this or not. Here are some thoughts.

Distributed indexing:
- You could take the Solr approach and have a Master indexing server that periodically takes snapshots and tells Slave servers "hey, come get the new stuff". The problem is that the Master is a single point of failure.
- You could take a similar replication approach with DRBD (http://www.drbd.org/) or some such.
- You could accept new entries in one place but delegate the indexing to multiple instances of the GData server in parallel.

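
To make the third option a bit more concrete, here is a rough sketch of a single intake point that hands entries to several indexer instances. Everything in it is an assumption for illustration: the class name IndexingDispatcher, the idea of routing by a hash of the feed ID, and the in-memory lists standing in for whatever transport would really carry entries to the other GData server instances. It is not taken from the GData server code and it is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical dispatcher: entries are accepted in one place and
 * delegated to one of several indexer instances.
 * The in-memory lists are stand-ins for real queues or remote calls.
 */
class IndexingDispatcher {
    private final List<List<String>> indexerQueues = new ArrayList<>();

    IndexingDispatcher(int indexerCount) {
        if (indexerCount < 1) {
            throw new IllegalArgumentException("need at least one indexer");
        }
        for (int i = 0; i < indexerCount; i++) {
            indexerQueues.add(new ArrayList<>());
        }
    }

    /** The same feed ID always maps to the same indexer instance. */
    int indexerFor(String feedId) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(feedId.hashCode(), indexerQueues.size());
    }

    /** Accepts an entry and queues it for the responsible indexer; returns that indexer's number. */
    int submit(String feedId, String entryId) {
        int target = indexerFor(feedId);
        indexerQueues.get(target).add(feedId + "/" + entryId);
        return target;
    }

    /** Entries waiting to be indexed by the given instance. */
    List<String> pendingFor(int indexer) {
        return indexerQueues.get(indexer);
    }
}
```

Routing by feed ID keeps all entries of one feed on the same instance, so updates and deletes for an entry land where the entry was first indexed. The intake point is still a single point of failure, just like the Master in the first option.
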
As for searching, you could simply partition the traffic instead of partitioning the index. Not the same thing clearly, but it's probably simpler to do (throw a load balancer/proxy in front of the search servers).

If you want to partition the index, you could simply employ some logic that specifies the maximal size of an index. Until that limit is reached, you keep adding to the current index. Once the limit is reached, you start a new index, possibly on a new server if one is available, or you start a new index locally and migrate the closed index elsewhere.

I imagine Yonik, Doug, and others will have other ideas, too.

Otis

----- Original Message ----
From: Simon Willnauer <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Saturday, July 15, 2006 10:37:11 AM
Subject: Gdata - Indexing feeds and entries

Hi there,

it has been quiet about GData for the last 2 weeks, but all the exams are done and uni finished yesterday, so the next round can start.

OK, what needs to be done: the GData protocol describes a kind of query language for full-text search over feeds, in defined XML elements and/or custom elements. For that purpose, stored, updated and deleted entries have to be reflected in the search component to be available for searching. The indexer component of the server has to be notified about modification events to keep the index up to date. I'm not yet sure how the fields / elements of the XML will be configured, but I guess I will look for some ideas in Solr or Nutch and discuss that later.

My first and main problem is pretty well known on this mailing list. I found lots of questions and suggestions via Google, but these discussions are from quite a while ago. I was wondering if there are any new insights about distributed searching / indexing. The server should be able to run in clusters / server farms, so indexed data must be available on each server / machine. I thought about this for a while, and all my ideas seem to be problematic in a certain way.

I found this thread on the mailing list, http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12700.html, which gives a lot of information about the problem I'm facing. It would be great if some of you experienced guys could tell me about your experience with / solution to this problem. If you see any possibility to provide such a mechanism as a generic solution, we could separate it out as a new contrib project after SoC has finished, i.e. detach it from GData.

Thanks in advance for your help ;)

Simon