We faced what might be a similar problem not too long ago. Our app is supposed to allow for foldering, i.e., a document may be in one or more folders that the user creates and populates by hand or via query. We used a simple B-tree database from Berkeley DB JE and used a hit collector to filter against that database when selecting results.

We didn't go with an all-Lucene approach because the "foldering" is supposed to be responsive (the user should see the document in the folder within ~5 seconds) and we have large catalog sizes; in other words, we didn't want to modify and re-optimize the index very often.

This also allowed us to do our own "per-field" stored field implementation: another Berkeley DB database holds all our stored fields, and the Lucene index stores only a single, small, non-Lucene document ID. For the hit collector we pull only that small document ID, and from Berkeley DB we pull only the fields needed for the results.
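
To make the idea concrete, here is a rough sketch in plain Java of what the per-hit callback does. It is not our actual code and it uses no Lucene or Berkeley DB classes: the class name `FolderCounter` is made up for illustration, and an in-memory map stands in for the folder database lookup. It filters hits by folder and, since you asked about counts, also tallies hits per folder.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: the map below stands in for the folder database
// (external document ID -> names of the folders containing that document).
class FolderCounter {
    private final Map<String, Set<String>> foldersByDocId;
    private final Map<String, Integer> counts = new TreeMap<>();

    FolderCounter(Map<String, Set<String>> foldersByDocId) {
        this.foldersByDocId = foldersByDocId;
    }

    // Filter step: is this hit in the folder the user is looking at?
    boolean inFolder(String docId, String folder) {
        return foldersByDocId.getOrDefault(docId, Collections.<String>emptySet())
                             .contains(folder);
    }

    // Called once per search hit, the way a hit collector's callback would be.
    // Increments the tally of every folder that contains the document.
    void collect(String docId) {
        for (String folder : foldersByDocId.getOrDefault(docId, Collections.<String>emptySet())) {
            counts.merge(folder, 1, Integer::sum);
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> folders = new HashMap<>();
        folders.put("doc-1", new HashSet<>(Arrays.asList("inbox", "reports")));
        folders.put("doc-2", new HashSet<>(Arrays.asList("reports")));

        FolderCounter counter = new FolderCounter(folders);
        for (String hit : new String[] {"doc-1", "doc-2", "doc-3"}) {
            counter.collect(hit); // doc-3 is in no folder and adds nothing
        }
        System.out.println(counter.counts()); // prints {inbox=1, reports=2}
    }
}
```

The point of keeping the membership data outside the index is that adding a document to a folder is just a write to the side database; the index itself is never touched.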

-j

--- Yu-Hui Jin <[EMAIL PROTECTED]> wrote:

> Say I have N categories, each item is assigned to one or more categories.
> And i want the search results being counted against each of the categories.
>
> I checked the Lucene in Action book, and there doesn't seem to be this
> feature. So is there any plan to add binning to Lucene?
>
> It looks like this involves modifying part of the Lucene's implementation,
> in that, we can:
>
> - specify which index field is used as the binning field.
> - after we grab the doc-id list, we perform N intersections just to get the
> count: each intersection is performed on the result doc-id list and the
> doc-id list for all items assigned to a category.
>
> Is there any better approach to do that? or any optimizations to this?
>
>
> thanks,
>
> -Hui
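
For what it's worth, the N-intersections counting described in the quoted message can be sketched with `java.util.BitSet`, assuming the result doc IDs and each category's doc IDs are already available as bit sets. The `BinCounter` class and `countBins` method are hypothetical names for illustration, not anything in Lucene.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "binning" by intersection: one AND per category.
class BinCounter {
    // For each category, counts the result docs that belong to it,
    // i.e. the size of (results AND category).
    static Map<String, Integer> countBins(BitSet results, Map<String, BitSet> categories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : categories.entrySet()) {
            BitSet overlap = (BitSet) results.clone(); // copy so results is not mutated
            overlap.and(entry.getValue());
            counts.put(entry.getKey(), overlap.cardinality());
        }
        return counts;
    }
}
```

The cost is one pass over the bit set per category, so it grows with N times the index size rather than with the number of hits; a per-hit lookup can be cheaper when result sets are small.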