Hi,

I'm very new to Lucene. In fact, I'm at the beginning of an evaluation phase, trying to figure out whether Lucene is the right fit for my needs. The project I'm involved in requires something similar to the Google Custom Search Engine <http://www.google.com/cse/> (CSE). In CSE, each user can define a set of websites (possibly a large one) and limit the search to only those websites. For example, I can create a CSE that searches only the web pages on cnn.com, msnbc.com and nytimes.com. I am trying to understand whether, and how, I can do something similar in Lucene.
The FAQ hints at this possibility here <http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_search_over_multiple_fields.3F>, but it mentions a class that no longer exists in 3.0 (QueryFilter) and says very little about the suggested options. I'm also not sure how well it would perform in my use case, or even whether it fits at all.

I thought about creating a separate index for each user or CSE. However, my system should be able to handle tens of thousands of concurrent users. I haven't yet done any analysis of how this would affect CPU, RAM, I/O and storage size, but I was wondering whether any of you experienced Lucene users/developers think it's a good direction. If it's not a good idea, what would be a good strategy here?

Any help will be much appreciated,
Yaniv