Rob,

It seems I've found the cause of the problem.
In order to index *.pdf files via HTTP, the added parser's "source mime" type
must be the same as the "Content-Type" HTTP header field in the web server's reply.

Try setting "source mime" to "application/pdf" instead of "text/pdf" on the
"Parser" page.

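If you want to confirm what your web server actually sends, here is a rough
Python sketch (not part of the indexer). It reads the "Content-Type" header
for a URL and compares it with a "source mime" value. The function names are
made up for illustration. Ignoring case and parameters such as "charset" is
my assumption; the indexer itself may compare the values more strictly.

```python
from urllib.request import Request, urlopen


def media_type(value: str) -> str:
    """Return the bare media type: lowercased, without parameters like charset."""
    return value.split(";", 1)[0].strip().lower()


def fetch_content_type(url: str) -> str:
    """Send a HEAD request and return the raw Content-Type header ('' if absent)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


def parser_would_match(source_mime: str, content_type_header: str) -> bool:
    """Assumed rule: the parser applies only when both media types are equal."""
    return media_type(source_mime) == media_type(content_type_header)
```

For example, parser_would_match("text/pdf", fetch_content_type(url)) returns
False for any URL whose server replies with "application/pdf".
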
>How can I check if the 'pdftotext.exe' is even being called?

If "pdftotext.exe" is being called and the log is set to "debug", you should
see the following note:
"Starting external parser..."

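If the debug log is long, a small Python sketch like this can look for that
note. The function name is made up for illustration, and the only thing
assumed about the log is that the note appears somewhere on a line.

```python
from typing import Iterable

PARSER_NOTE = "Starting external parser..."


def external_parser_started(log_lines: Iterable[str]) -> bool:
    """Return True if any log line contains the external-parser note."""
    return any(PARSER_NOTE in line for line in log_lines)
```

You can pass it an open file object, since iterating over a text file yields
its lines.
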
Please send me your .conf file; I'll try it on Monday.

Regards, Ramil.

----- Original Message -----
From: "Rob's Esmarts Account" <[EMAIL PROTECTED]>
To: "Ramil Kalimullin" <[EMAIL PROTECTED]>
Sent: Sunday, April 08, 2001 6:17 PM
Subject: Re: pdf


> Ramil,
>
> I've tried marking for reindex, clearing the database, etc., but am still not
> seeing any results show up within the 'dict' table.  How can I check if the
> 'pdftotext.exe' is even being called?  I've added the mime type and parser,
> and reran a number of times, and still haven't been able to properly index a
> .pdf file's contents.  Please advise...
>
> Thanks,
> Rob Baigert
>
>
> ----- Original Message -----
> From: "Ramil Kalimullin" <[EMAIL PROTECTED]>
> To: "Rob's Esmarts Account" <[EMAIL PROTECTED]>
> Sent: Saturday, April 07, 2001 11:33 AM
> Subject: pdf
>
>
> > Clear your database or start "mark for reindex" before indexing.
> >
> > Ramil.
> >
> > > One other thing I'd like to mention - it seems as though the
> > > pdftotext.exe isn't being fired at all.  I would guess that when the
> > > indexer gets a page / document that has the mime type of pdf, it fires
> > > the command that has been defined in the parser section.  It seems as
> > > though it just goes right over it; I would expect at least a small
> > > delay or something...
> > >
> > > Rob Baigert
> > > ----- Original Message -----
> > > From: "Rob's Esmarts Account" <[EMAIL PROTECTED]>
> > > To: "Ramil Kalimullin" <[EMAIL PROTECTED]>
> > > Sent: Saturday, April 07, 2001 7:29 AM
> > > Subject: Re: Webboard: web bug in search.htm ?
> > >
> > >
> > > > Hi, you're right, my mistake (a typo).  Anyway, I did just that and
> > > > turned on logging.  Here is the result I got:
> > > >
> > > > START Saturday, April 07, 2001 07:37:41
> > > >
> > > > Indexing... (1 threads)
> > > > http://augusta/robots.txt
> > > > Server 'http://augusta/pdf/'
> > > > Allow NoCase  *
> > > > HTTP/1.1 404 Object Not Found
> > > > Server: Microsoft-IIS/5.0
> > > > Date: Sat, 07 Apr 2001 11:28:08 GMT
> > > > Content-Length: 3243
> > > > text/html
> > > > HTTP/1.1 404 Object Not Found text/html 3243
> > > > Deleting URL
> > > > http://augusta/pdf/wsu-72.pdf
> > > > Server 'http://augusta/pdf/'
> > > > Allow NoCase  *.pdf
> > > > HTTP/1.1 304 Not Modified
> > > > Server: Microsoft-IIS/5.0
> > > > Date: Sat, 07 Apr 2001 11:28:08 GMT
> > > > ETag: "0a69f7a335ec01:88e"
> > > > Content-Length: 0
> > > > HTTP/1.1 304 Not Modified ? 0
> > > > Done
> > > >
> > > > FINISH Saturday, April 07, 2001 07:37:42
> > > >
> > > > Does this look correct?  The only reason I ask is because I'm still
> > > > not seeing the results of the .pdf document in the 'dict' table.
> > > > There is nothing...
> > > >
> > > > Thanks,
> > > > Rob Baigert
> >
> >
> >
>
>
