On Tue, 10 Feb 2004 14:31, Nick Rout wrote:
> Does anyone have any real world experience in running htdig over a large
> number of word documents.
>
> (Background - the Police in large cases now disclose the file to
> defence counsel on cd (or 5 cd's in this case) full of ms word documents.
> No index, no analysis, just sequentially numbered files full of
> typewritten docs. If the cd's are full, thats 3G, it prints out to a
> desk full of eastlight files.)

http://www.google.co.nz/search?q=managing+gigabytes&ie=ISO-8859-1&hl=en&btnG=Google+Search&meta=

especially

http://www.mds.rmit.edu.au/mg/intro/mgintro.html

   The MG (Managing Gigabytes) system is a collection of programs which
   comprise a full-text retrieval system. A full-text retrieval system
   allows one to create a database out of some given documents and then
   do queries upon it to retrieve any relevant documents. It is
   "full-text" in the sense that every word in the text is indexed and
   the query operates only on this index to do the searching.

No first-hand experience, but a friend waxed lyrical about the system.

-- 
Sincerely etc.
Christopher Sawtell

NB. This PC runs Linux. If you find a virus apparently from me, it has
forged the e-mail headers on someone else's machine. Please do not
notify me when this occurs. Thanks.
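
To illustrate what "every word in the text is indexed and the query operates only on this index" means, here is a rough Python sketch of a toy inverted index. It is not MG's code and assumes nothing about how MG works internally. The file names and texts are made up, the function names (tokenize, build_index, search) are hypothetical, and it assumes the Word files have already been converted to plain text by some other tool.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the documents containing every word of the query.

    Only the index is consulted; the document texts are never rescanned.
    """
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample data standing in for sequentially numbered disclosure files.
docs = {
    "0001.txt": "Statement of the first witness",
    "0002.txt": "Statement of the arresting officer",
    "0003.txt": "Exhibit list",
}
index = build_index(docs)
```

Calling search(index, "statement witness") then intersects the document sets for "statement" and "witness". A real system over 3G of text would also need the index stored on disk and, most likely, compressed, which this sketch ignores.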
