On Tue, Dec 02, 2003 at 01:54:58PM +0000, jt oob wrote:
> Hi,
>
> I have just indexed a lot of news (nntp) postings.
> I now have an index for each topic (a topic can have many newsgroups)
>
> The index sizes are:
>
> 2.6G Current Affairs
> 2.4G Celebs
> 119M Recreation
> 3.0M Tech - Mac
> 2.4G Tech - Windows
> 936M Tech - Linux
> 702M Tech - Other
> 96M Tech - Consoles

Around 15 gigs. How many days of news?

>
> This is still only early stages so i haven't yet done any parsing, just
> treating each doc as plain text.
>
> Originally I was merging all these indexes together, but this is now
> not feasible with new additions being made to each index as new
> postings arrive.
> I optimize each index at midnight.
>
> What is the best way to allow users to query either just one index, or
> the whole lot?

Probably: create an IndexSearcher for each index, then use a
MultiSearcher to search them all together. It'll probably use quite a
bit of memory.

>
> My prototype was making a system call from and running my java program
> to print all the results to the screen. I know this isn't the best way
> to do it :-)
>
> I guess I need to write a server and periodically re-open the indexes
> to see any changes?
>
> Thank you for any help!
>
> jt

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
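
To make the "one searcher per index, plus a way to search them all, re-opened periodically" idea concrete, here is a rough sketch in plain Java. It deliberately does not call Lucene: TopicSearcher, SearchService and the opener function are hypothetical names made up for illustration. In a real setup the opener would presumably construct an IndexSearcher for the topic's index directory, and searchAll would likely be replaced by a MultiSearcher, which also merges hits by score (this sketch only concatenates them).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical stand-in for a per-index searcher (not a Lucene type). */
interface TopicSearcher {
    List<String> search(String query);
}

/**
 * Holds one searcher per topic so callers can query a single topic or all
 * of them. reopen() builds a fresh set of searchers and publishes it in one
 * step, so queries in flight keep using the old set until they finish.
 */
class SearchService {
    private final List<String> topics;
    private final Function<String, TopicSearcher> opener;
    private final AtomicReference<Map<String, TopicSearcher>> current =
            new AtomicReference<>();

    SearchService(List<String> topics, Function<String, TopicSearcher> opener) {
        this.topics = new ArrayList<>(topics);
        this.opener = opener;
        reopen();
    }

    /** Open a new searcher for every topic, then swap the whole set in. */
    void reopen() {
        Map<String, TopicSearcher> fresh = new LinkedHashMap<>();
        for (String topic : topics) {
            fresh.put(topic, opener.apply(topic));
        }
        current.set(fresh);
    }

    /** Query just one topic's index. */
    List<String> searchTopic(String topic, String query) {
        TopicSearcher searcher = current.get().get(topic);
        if (searcher == null) {
            throw new IllegalArgumentException("Unknown topic: " + topic);
        }
        return searcher.search(query);
    }

    /** Query every topic; hits are concatenated in topic order, not ranked. */
    List<String> searchAll(String query) {
        List<String> hits = new ArrayList<>();
        for (TopicSearcher searcher : current.get().values()) {
            hits.addAll(searcher.search(query));
        }
        return hits;
    }
}
```

A long-running server could keep one SearchService and call reopen() on a timer, for example via ScheduledExecutorService.scheduleAtFixedRate, or right after the midnight optimize. Old searchers are simply dropped here; real index searchers would need to be closed once no query is still using them.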
