i know im a little late replying to this thread, but, in my humble opinion the best way to aggregate values (not necessarily terms, but whole values in fields) is as follows: startup stage: for each field you would like to aggregate create a hashmap open an index reader and run through all the docs get the values to be aggregated from the fields of each doc create a hashcode for each value from each field collected, the hashcode should have some sort of prefix indicating which field its from (for exampe: 1 = author, 2 = ....) and hence which hash it is stored in (at retrieval time, this prefix can be used to easily retrieve the value from the correct hash) place the hashcode/value in the appropriate hash create an arraylist at index X in the arraylist place an int array of all the hashcodes associated with doc id X so for example: if i have doc id 0 which contains the values: william shakespeare and the value 1797 the array list at index 0 will have an int array containing 2 values (the 2 hashcodes of shaklespeare and 1797) run time: at run time receive the hits and iterate through the doc ids , aggregate the values with direct access into the arraylist (for doc id 10 go to index 10 in the arraylist to retrieve the array of hashcodes) and lookups into the hashmaps i tested this today on a small index approx 400,000 docs (1GB of data) but i ran queries returning over 100,000 results my response time was about 550 milliseconds on large (over 100,000) result sets another point, this method should be scalable for much larger indexes as well, as it is linear to the result set size and not the index size (which is a HUGE bonus) if anyone wants the code let me know,
Marvin Humphrey <[EMAIL PROTECTED]> wrote: Thanks, all. The field cache and the bitsets both seem like good options until the collection grows too large, provided that the index does not need to be updated very frequently. Then for large collections, there's statistical sampling. Any of those options seems preferable to retrieving all docs all the time. Marvin Humphrey Rectangular Research http://www.rectangular.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------- Feel free to call! Free PC-to-PC calls. Low rates on PC-to-Phone. Get Yahoo! Messenger with Voice