According to Ismael Olea:
> Gilles Detillieux wrote:
> > Seriously, the main page at that URL does mention it.  If you scroll down
> > to the Features section, it says:
> > 
> >     - Searching of HTML and text files
> >         Both HTML documents and plain text files can be searched.
> >         Searching of other file types will be supported in future versions.
> 
>       Can htdig handle SGML files too? And can it manage meta tags in HTML
> files?

No, I don't think it can handle SGML.  I'm not familiar with SGML, but my
understanding is that many of its tags are quite different from HTML's.
Also, the HTTP server would likely assign a different content-type to
SGML documents, so htdig won't even attempt to parse them.

Meta tags in HTML are supported by htdig.

> > That's not quite the whole story, though.  There is some support for
> > PDF documents right now, if you have acroread (Adobe Acrobat Reader) on
> > your system.  Also, with external parsers, you can index a whole lot more.
> 
>       Must these external parsers be htdig aware, or can they be Unix-like?
> Where can I find them?
>  
> > The parse_doc.pl script in ht://Dig 3.1.1's contrib directory can handle
>       
>       Looks very interesting.

External parsers must definitely be htdig aware.  Their output must adhere
to the format specified in the documentation.  See

        http://www.htdig.org/attrs.html#external_parsers

for details.  The parse_doc.pl script, and its earlier versions as perl
and shell scripts, is the only external parser around that's publicly
available, as far as I know.  Someone on the list can correct me if I'm
wrong.  parse_doc.pl is also a good starting point if you want to set
up an interface between htdig and any number of more Unix-like document
parsers.  Any filter that can extract plain text from a document can
easily be plugged into this script, and it handles the generation of
records for htdig.
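
To give a rough idea of what such a wrapper does, here is a minimal sketch
in Python.  It is not parse_doc.pl, and the record layout is an assumption
on my part (one record per line, tab-separated, "t" for the title, "w" for
a word with a relative location from 0 to 1000 and a heading level, "h" for
an excerpt), as is the order of the command-line arguments.  Check both
against the external_parsers documentation before relying on it.  The name
text_to_records is purely illustrative.

```python
import re
import sys


def text_to_records(text, title=""):
    """Turn plain text into tab-separated records, one per line.

    The record layout used here is an assumption; verify it against
    the external_parsers documentation.
    """
    records = []
    if title:
        records.append("t\t" + title)

    words = re.findall(r"[A-Za-z0-9]+", text)
    total = len(words)
    for index, word in enumerate(words):
        # Assumed: location is the word's relative position, scaled 0..1000.
        location = index * 1000 // total
        # Assumed: the last field is a heading level, 0 for body text.
        records.append("w\t%s\t%d\t0" % (word.lower(), location))

    # Use the start of the text, with whitespace collapsed, as the excerpt.
    excerpt = " ".join(text.split())[:80]
    if excerpt:
        records.append("h\t" + excerpt)
    return records


def main(argv):
    # Assumed: the first argument is the path of the file to parse; any
    # further arguments (content type, URL, config file) are ignored here.
    with open(argv[1], errors="replace") as handle:
        text = handle.read()
    for record in text_to_records(text):
        print(record)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

In a real setup the file would first be run through whatever filter
extracts plain text from the document format in question, and only that
filter's output would be handed to something like text_to_records.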

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930