Why not just add custom terms onto the end of each query for each user? For example, when user X queries for "bananas" and has previously set their domains to cnn and yahoo, why not append the domain restriction so that the search query becomes: "fullText:bananas AND (domain:cnn OR domain:yahoo)"
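As a rough sketch of that idea (plain string assembly only, no Lucene calls), something like the following could build the restricted query. The field names fullText and domain come from the example above; the class name, the method name, and the assumption that the term and domains are simple tokens needing no escaping are mine, purely for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DomainFilterQuery {

    // Hypothetical helper: returns "fullText:<term> AND (domain:a OR domain:b ...)".
    // Assumes userTerm and the domain values are plain tokens that need no escaping.
    public static String withDomainFilter(String userTerm, List<String> domains) {
        String base = "fullText:" + userTerm;
        if (domains.isEmpty()) {
            return base; // user has no domain restriction configured
        }
        // One OR clause per domain, wrapped in parentheses.
        String domainClause = domains.stream()
                .map(d -> "domain:" + d)
                .collect(Collectors.joining(" OR ", "(", ")"));
        return base + " AND " + domainClause;
    }

    public static void main(String[] args) {
        // prints: fullText:bananas AND (domain:cnn OR domain:yahoo)
        System.out.println(withDomainFilter("bananas", List.of("cnn", "yahoo")));
    }
}
```

The resulting string would still have to go through whatever query parsing you already use, and the OR group grows by one clause per domain the user has configured.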
Off the top of my head there are a few caveats:

1) If the domain list is large, you'll have to deal with the max boolean clauses setting (BooleanQuery's maximum clause count).
2) Parsing the query can be slow. However, there's a tradeoff between managing thousands of indexes and taking a slight performance hit. (Or you can put the query together without parsing; it depends on how you handle the user's query terms.)

This seems like too simple an approach, so I'm sure I'm not understanding something...

LH

On Fri, Jan 8, 2010 at 5:16 AM, Yaniv Ben Yosef <yani...@gmail.com> wrote:

> Thanks Otis, that's very helpful.
>
> On Fri, Jan 8, 2010 at 2:08 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
> > Ah, well, masking it didn't help. Yes, ignore Bixo, Nutch, and Droids
> > then.
> > Consider DataImportHandler from Solr or wait a bit for Lucene Connectors
> > Framework to materialize. Or use LuSql, or DbSight, or Sematext's
> > Database Indexer.
> >
> > Yes, I was suggesting a separate index for each user. That's what Simpy
> > uses and has some 200K indices on 1 box.... and I think dozens of QPS
> > without any caching, if I remember correctly. Load is under 1.0.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> > ----- Original Message ----
> > > From: Yaniv Ben Yosef <yani...@gmail.com>
> > > To: java-user@lucene.apache.org
> > > Sent: Thu, January 7, 2010 6:55:18 PM
> > > Subject: Re: Implementing filtering based on multiple fields
> > >
> > > Thanks Otis.
> > >
> > > If I understand correctly - Bixo, Nutch and Droids are technologies to
> > > use for crawling the web and building an index. My project is actually
> > > about indexing a large database, where you can think of every row as a
> > > web page, and a particular column is the equivalent of a web site.
> > > (I didn't mention that in the previous post because I didn't want to
> > > complicate my question, and it seems equivalent to Google CSE given
> > > that Lucene can use virtually any input for indexing, AFAIK)
> > > Therefore I'm not sure if the frameworks you've mentioned are
> > > applicable to my project as they seem to be related to web page
> > > indexing, but perhaps I'm missing something.
> > > Also, what did you mean about isolating users and their data/indices.
> > > Did you mean that I should create a separate index per user?
> > >
> > > Thanks again!
> > >
> > > On Fri, Jan 8, 2010 at 12:35 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> > >
> > > > For something like CSE, I think you want to isolate users and their
> > > > data/indices.
> > > >
> > > > I'd look at Bixo or Nutch or Droids ==> Lucene or Solr
> > > >
> > > > Otis
> > > > --
> > > > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> > > >
> > > > ----- Original Message ----
> > > > > From: Yaniv Ben Yosef
> > > > > To: java-user@lucene.apache.org
> > > > > Sent: Thu, January 7, 2010 3:54:22 PM
> > > > > Subject: Implementing filtering based on multiple fields
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm very new to Lucene. In fact, I'm at the beginning of an
> > > > > evaluation phase, trying to figure whether Lucene is the right fit
> > > > > for my needs.
> > > > > The project I'm involved in requires something similar to the
> > > > > Google Custom Search Engine (CSE). In CSE, each user can
> > > > > define a set (could be a large set) of websites, and limit the
> > > > > search to only those websites. So for example, I can create a CSE
> > > > > that searches all web pages on cnn.com, msnbc.com and nytimes.com
> > > > > only.
> > > > > I am trying to understand whether and how I can do something
> > > > > similar in Lucene.
> > > > >
> > > > > The FAQ hints about this possibility here, but it mentions a class
> > > > > that no longer exists in 3.0 (QueryFilter), and is very laconic
> > > > > about the suggested options. Also I'm not sure how well it will
> > > > > perform in my use case (or even if it fits at all).
> > > > > I thought about creating a separate index for each user or CSE.
> > > > > However, my system should be able to handle tens of thousands of
> > > > > concurrent users. I haven't done any analysis yet on how this will
> > > > > affect CPU, RAM, I/O and storage size, but was wondering if any of
> > > > > you experienced Lucene users/developers think it's a good
> > > > > direction.
> > > > > If that's not a good idea, what would be a good strategy here?
> > > > >
> > > > > Any help will be much appreciated,
> > > > > Yaniv
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org