Hi,
I am trying to get ht://Dig working on Windows 7 for PDF indexing.
All the database files are being generated, but I see the following issues:
1) The db.docsdb is generated with the PDF id but not with the title.
2) The excerpts (H) attribute is missing from the db.docs file.
3) The db.worddump is generated with junk characters.
For db.docs and db.worddump, I tried using the files generated on Linux, and
those worked fine, but the Linux-generated db.docsdb and db.docs.index files
did not.
Please let me know what options I have.
I tested running the Perl scripts doc2html and pdf2html, and they parse my
PDFs, but only the local ones. They do not parse a PDF when I pass its URL.
pdftotext and pdfinfo are working fine.
Also, how can I index the PDFs in a local directory on my system?
I tried these options, but they didn't work:
start_url: http://localhost/pdf/
#local_urls: http://localhost/pdf/ = C:/cygwin/var/www/htdocs/pdf/
#local_urls_only: true
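To be explicit about what I was aiming for with the commented-out lines, here
is a sketch of the uncommented form. It assumes that the local path should be
given as Cygwin sees it (/var/www/htdocs/pdf/, if Cygwin is installed in
C:/cygwin) and that the mapping is written without spaces around the "=".
Both are my assumptions and may be wrong.

```
# Sketch only: the Cygwin-style path and the spacing around "=" are assumptions.
start_url:       http://localhost/pdf/
local_urls:      http://localhost/pdf/=/var/www/htdocs/pdf/
local_urls_only: true
```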
Thanks for your help.
John
_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general