Thanks again, Erick, for taking the time. I agree that the CachingWrapperFilter as described under "using a custom filter" in LIA is probably my best bet. I wanted to check whether anything had been added in Lucene releases since the book was written that I wasn't aware of.
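For anyone following along, here is a rough sketch of the two-pass idea in plain Java. It is not Lucene code: the names (PreferredFirstSketch, preferredBits, preferredFirst) and the subIdByDoc array are made up for illustration, and a java.util.BitSet stands in for whatever the real filter would produce. It assumes the hits are already in the user's chosen sort order and simply splits them into "in my list" and "not in my list", which should give the same ordering as running the query once per filter and appending the second set of hits to the first.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in for the "filter, inverse filter, merge" idea.
// None of these names are Lucene API; they are hypothetical.
final class PreferredFirstSketch {

    // Set one bit per document number whose SubId is in the preferred list.
    // subIdByDoc[i] is the (hypothetical) SubId value held by document i.
    static BitSet preferredBits(String[] subIdByDoc, Set<String> preferredSubIds) {
        BitSet bits = new BitSet(subIdByDoc.length);
        for (int doc = 0; doc < subIdByDoc.length; doc++) {
            if (preferredSubIds.contains(subIdByDoc[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // hitsInSortOrder holds the document numbers of the hits, already sorted
    // the way the user asked (e.g. by DateAdded). Preferred hits come out
    // first, then the rest; the original order is kept inside each group.
    static List<Integer> preferredFirst(int[] hitsInSortOrder, BitSet preferred) {
        List<Integer> top = new ArrayList<>();
        List<Integer> rest = new ArrayList<>();
        for (int doc : hitsInSortOrder) {
            (preferred.get(doc) ? top : rest).add(doc);
        }
        top.addAll(rest);
        return top;
    }
}
```

With SubIds a, b, c, a, d for documents 0 to 4 and a preferred list of a and d, hits arriving in the order 2, 0, 1, 4, 3 come back as 0, 4, 3, 2, 1. The bit set only needs rebuilding when the preferred list or the index changes, which is the part a cached filter would take care of.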
Cheers again.

--- Erick Erickson <[EMAIL PROTECTED]> wrote:

> You were probably right. See below....
>
> On 9/25/06, Paul Lynch <[EMAIL PROTECTED]> wrote:
> >
> > Thanks for the quick response Erick.
> >
> > "index the documents in your preferred list with a field and index
> > your non-preferred docs with a field subid?"
> >
> > I considered this approach and dismissed it due to the actual list
> > of preferred ids changing so frequently (every 10 mins...ish) but
> > maybe I was a little hasty in doing so. I will investigate the
> > overhead in updating all docs in the index each time my list
> > refreshes. I had assumed it was too prohibitive but I know what
> > they say about assumptions :)
>
> Lots of overhead. There's really no capability of updating a doc in
> place. This has been on several people's wish-list. You'd have to
> delete every doc that you wanted to change and re-add it. I don't
> know how many documents this would be, if just a few it'd be OK, but
> if many.... I was assuming (and I *do* know what they say about
> assumptions <G>) that you were just adding to your preferred doc list
> every few minutes, not changing existing documents....
>
> It really does sound like you want a filter. I was pleasantly
> surprised by how very quickly filters are built. You could use a
> CachingWrapperFilter to have the filter kept around automatically (I
> guess you'd only have one per index update) to minimize your overhead
> for building filters, and perhaps warm up your cache by firing a
> canned query at your searcher when you re-open your IndexReader after
> index update. I think you'd have to do the two-query thing in this
> case. If you wanted to really get exotic, you could build your filter
> when you created your index and store it in a *very special document*
> and just read it in the first time you needed it. Although I've never
> used it, I guess you can store binary data.
> From the Javadoc
>
> Field(String name, byte[] value, Field.Store store)
>     Create a stored field with binary value.
>
> The only thing here is that the filters (probably wrapped in a
> ConstantScoreQuery) lose relevance, but since you're sorting "one of
> several ways", that probably doesn't matter.
>
> Best
> Erick
>
> > Should I be able to make this workable, the beauty of this solution
> > would be that I would actually only need to query once. If I had a
> > field which indicates whether it is a preferred doc or not, "all" I
> > will have to do is sort across the two fields.
> >
> > Thanks again Erick. Any other suggestions are most welcome.
> >
> > Regards,
> > Paul
> >
> > --- Erick Erickson <[EMAIL PROTECTED]> wrote:
> >
> > > OK, a really "off the top of my head" response, but what the
> > > heck....
> > >
> > > I'm not sure you need to worry about filters. Would it work for
> > > you to index the documents in your preferred list with a field
> > > (called, at the limit of my creativity, preferredsubid <G>) and
> > > index your non-preferred docs with a field subid? You'd still
> > > have to fire two queries, one on subid (to pick up the ones in
> > > your non-preferred list) and one on preferredsubid.
> > >
> > > Since there's no requirement that all docs have the same fields,
> > > your preferred docs could have ONLY the preferredsubid field and
> > > your non-preferred docs ONLY the subid field. That way you
> > > wouldn't have to worry about picking the docs up twice.
> > >
> > > Merging should be simple then, just iterate over however many
> > > hits you want in your preferredHits object, then tack on however
> > > many you want from your nonPreferredHits object. All the code for
> > > the two queries would be identical, the only difference being
> > > whether you specify "subid" or "preferredsubid"......
> > >
> > > I can imagine several variations on this scenario, but they
> > > depend on your problem space.
> > >
> > > Whether this is the "best" or not, I leave as an exercise for the
> > > reader.
> > >
> > > Best
> > > Erick
> > >
> > > On 9/25/06, Paul Lynch <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I have an index containing documents which all have a field
> > > > called SubId which holds the ID of the Subscriber that
> > > > submitted the data. This field is STORED and UN_TOKENIZED.
> > > >
> > > > When I am querying the index, the user can choose a number of
> > > > different ways to sort the Hits. The problem is that I have a
> > > > list of SubIds that should appear at the top of the results
> > > > list regardless of how the index is sorted. In other words,
> > > > let's suppose the Hits should be sorted by DateAdded, I require
> > > > the Hits to be sorted by DateAdded for the SubIds in my list
> > > > and then by DateAdded for the SubIds not in my list.
> > > >
> > > > From reading previous discussions on the mailing list, I
> > > > believe I could achieve what I need by writing custom filters
> > > > i.e. Run the query first with a custom filter for the SubIds in
> > > > my list and then a second time with a custom filter for the
> > > > SubIds not in my list and then "merge" the results.
> > > >
> > > > I suppose my question is simple: Is there a better way to
> > > > achieve this?
=== message truncated ===