Rob,

I think I've found the cause of the problem. To index *.pdf files via HTTP, the added parser's "source mime" type must be the same as the "Content-Type" HTTP header field in the web server's reply. Try setting "source mime" to "application/pdf" instead of "text/pdf" on the "Parser" page.

>How can I check if the 'pdftotext.exe' is even being called?

If "pdftotext.exe" is being called and the log level is "debug", you should see the following note: "Starting external parser..."

Please send me your .conf file; I'll try it on Monday.

Regz,
Ramil.

----- Original Message -----
From: "Rob's Esmarts Account" <[EMAIL PROTECTED]>
To: "Ramil Kalimullin" <[EMAIL PROTECTED]>
Sent: Sunday, April 08, 2001 6:17 PM
Subject: Re: pdf

> Ramil,
>
> I've tried marking for reindex, clearing the database, etc., but am still not
> seeing any results show up in the 'dict' table. How can I check if the
> 'pdftotext.exe' is even being called? I've added the mime type and the parser,
> and reran a number of times, and still haven't been able to properly index a
> .pdf file's contents. Please advise...
>
> Thanks,
> Rob Baigert
>
>
> ----- Original Message -----
> From: "Ramil Kalimullin" <[EMAIL PROTECTED]>
> To: "Rob's Esmarts Account" <[EMAIL PROTECTED]>
> Sent: Saturday, April 07, 2001 11:33 AM
> Subject: pdf
>
>
> > Clear your database or start "mark for reindex" before indexing.
> >
> > Ramil.
> >
> > > One other thing I'd like to mention: it seems as though pdftotext.exe
> > > isn't being fired at all. I would guess that when the indexer gets a
> > > page/document that has the mime type of pdf, it fires the command that
> > > has been defined in the parser section. It seems as though it just goes
> > > right over it; I would expect at least a small delay or something...
> > >
> > > Rob Baigert
> > > ----- Original Message -----
> > > From: "Rob's Esmarts Account" <[EMAIL PROTECTED]>
> > > To: "Ramil Kalimullin" <[EMAIL PROTECTED]>
> > > Sent: Saturday, April 07, 2001 7:29 AM
> > > Subject: Re: Webboard: web bug in search.htm ?
> > >
> > >
> > > > Hi, you're right, my mistake (typo). Anyway, I did just that and turned
> > > > on logging. Here is the result I got:
> > > >
> > > > START Saturday, April 07, 2001 07:37:41
> > > >
> > > > Indexing... (1 threads)
> > > > http://augusta/robots.txt
> > > > Server 'http://augusta/pdf/'
> > > > Allow NoCase *
> > > > HTTP/1.1 404 Object Not Found
> > > > Server: Microsoft-IIS/5.0
> > > > Date: Sat, 07 Apr 2001 11:28:08 GMT
> > > > Content-Length: 3243
> > > > text/html
> > > > HTTP/1.1 404 Object Not Found text/html 3243
> > > > Deleting URL
> > > > http://augusta/pdf/wsu-72.pdf
> > > > Server 'http://augusta/pdf/'
> > > > Allow NoCase *.pdf
> > > > HTTP/1.1 304 Not Modified
> > > > Server: Microsoft-IIS/5.0
> > > > Date: Sat, 07 Apr 2001 11:28:08 GMT
> > > > ETag: "0a69f7a335ec01:88e"
> > > > Content-Length: 0
> > > > HTTP/1.1 304 Not Modified ? 0
> > > > Done
> > > >
> > > > FINISH Saturday, April 07, 2001 07:37:42
> > > >
> > > > Does this look correct? The only reason I ask is that I'm still not
> > > > seeing the results of the .pdf document in the 'dict' table. There is
> > > > nothing...
> > > >
> > > > Thanks,
> > > > Rob Baigert

___________________________________________
If you want to unsubscribe send "unsubscribe general" to [EMAIL PROTECTED]
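Ramil's diagnosis in this thread is that a parser is only used when its "source mime" equals the Content-Type the web server actually sends. So if the server labels PDFs "application/pdf", as he suggests, a parser registered as "text/pdf" is never selected. The Python sketch below illustrates that comparison and shows one way to ask a server which Content-Type it reports. The function names are hypothetical and this is not the indexer's own code. The sketch strips parameters such as "; charset=..." and ignores case; the indexer's real comparison may be stricter, such as an exact string match.

```python
from typing import Optional
import urllib.request


def media_type(content_type: Optional[str]) -> str:
    """Return the bare media type from a Content-Type value,
    e.g. 'application/pdf; charset=binary' -> 'application/pdf'."""
    if not content_type:
        return ""
    # Drop parameters after ';' and normalize case and whitespace,
    # since media type names are case-insensitive.
    return content_type.split(";", 1)[0].strip().lower()


def parser_matches(source_mime: str, content_type: Optional[str]) -> bool:
    """Hypothetical check: would a parser configured with this
    'source mime' be chosen for a reply with this Content-Type?"""
    return media_type(source_mime) == media_type(content_type)


def fetch_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return the Content-Type header, if any.
    urlopen raises urllib.error.HTTPError for replies such as 404."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Content-Type")


if __name__ == "__main__":
    # The two "source mime" settings discussed above, checked against
    # a server that labels PDFs as application/pdf:
    print(parser_matches("text/pdf", "application/pdf"))         # False
    print(parser_matches("application/pdf", "application/pdf"))  # True
```

On Rob's network, calling fetch_content_type("http://augusta/pdf/wsu-72.pdf") would show the exact Content-Type that IIS sends for the file in the log. That is the value the "source mime" field would need to match.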
