Hello, I'm using htdig to index an intranet site: several gigabytes of documents (almost all .doc and .pdf), several tens of thousands of files. So far, the sizes of the db files are:
  161809408 Nov 23 11:10 db.docdb
   13569024 Nov 23 11:10 db.docs.index
  295576571 Nov 23 11:05 db.wordlist
  238107648 Nov 23 11:05 db.words.db

and I'm 20% of the way through the job. The PC is a [EMAIL PROTECTED] with 128 MB of RAM. Can this hardware handle the job?

I see that I can use MySQL with htdig instead of Berkeley DB. Would that make searching faster? Where can I find information on using MySQL with htdig?

Another question. The directory tree is:

year
 +--> month_1
 |      +--> day_1
 |      +--> day_2
 |      ...
 |      +--> day_n
 +--> month_2
 |      +--> day_1
 |      +--> day_2
 |      ...
 |      +--> day_n
 ...
 +--> month_n
        +--> day_1
        +--> day_2
        ...
        +--> day_n

Every day_x directory contains the same set of subdirectories, so when I search for the word "giustizia" I get one hit for every directory called "giustizia" and for every file inside those directories (right now, after more than a minute of searching, it returns 12,000 results). How can I handle this?

Why does htdig show me only 10 pages with 10 results per page? How can I see all the results when there are more than 100 (not unusual when indexing 50,000 files), other than telling htdig to show more than 10 documents per page?

Thanks,
Pietro.

-- 
I will build myself a copper tower
With four ways out and no way in
But mine the glory, mine the power
(So I chose Amiga and GNU/Linux)

_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general