Geoff Hutchison wrote:
> 
> At 2:15 PM +0300 4/26/00, Peter L. Peres wrote:
> >   It's me again ;-) Has anyone tried to index a C/java/C++/ASM source tree
> >using htdig ? Perhaps by placing a list of mnemonics and reserved words
> >in the bad word list ?

For C/C++/Java it should be quite easy to write a lex/yacc parser which
eliminates reserved words, operators and other "noise" characters.  In
addition, such a parser could map globally declared functions and
variables to <H> tags, so that they are indexed as headings.
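
A rough sketch of that idea, in Python with regular expressions rather
than a real lex/yacc grammar.  The keyword list is only a subset of C,
the function-definition pattern is a crude heuristic (it ignores global
variables), and the name source_to_html is made up for illustration:

```python
import html
import re

# Subset of the C reserved words; extend for C++/Java as needed.
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Heuristic only: an unindented line of the form "<type words> name(...",
# with no semicolon after the "(", is treated as a function definition.
FUNC_DEF = re.compile(r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;]*$")


def source_to_html(source: str) -> str:
    """Drop keywords and operators, promote function names to <h2>."""
    headings = []
    words = []
    for line in source.splitlines():
        match = FUNC_DEF.match(line)
        if match and match.group(1) not in C_KEYWORDS:
            headings.append(match.group(1))
        # Operators and punctuation vanish because only identifiers are kept.
        for ident in IDENT.findall(line):
            if ident not in C_KEYWORDS:
                words.append(ident)
    parts = ["<html><body>"]
    parts += [f"<h2>{html.escape(name)}</h2>" for name in headings]
    parts.append("<p>" + html.escape(" ".join(words)) + "</p>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Words inside comments and string literals are kept on purpose, since
they are usually the most searchable part of a source file.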

There should be some source->html converters somewhere on freshmeat
which already do some nice markup.  Either plugging such a converter
into the web server to convert plain source files on the fly, or having
such a tool (perhaps with minor modifications) generate input for the
digger, should be no problem.
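
The second variant (pre-generating pages for the digger) could look
roughly like the following Python sketch.  It just mirrors a source tree
into static HTML files that htdig can then crawl like any other pages.
The function name convert_tree, the suffix list and the "<name>.html"
naming scheme are assumptions, and the <pre> wrapping stands in for
whatever markup a real converter would produce:

```python
import html
from pathlib import Path

# Assumed set of source file suffixes; adjust to the tree being indexed.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".java", ".asm"}


def convert_tree(src_root: Path, out_root: Path) -> list:
    """Write one HTML page per source file, mirroring the directory layout."""
    written = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in SOURCE_SUFFIXES:
            continue
        rel = src.relative_to(src_root)
        dest = out_root / rel.parent / (rel.name + ".html")
        dest.parent.mkdir(parents=True, exist_ok=True)
        text = src.read_text(encoding="utf-8", errors="replace")
        title = html.escape(rel.as_posix())
        # Escape the source so "<" and "&" in code do not break the page.
        dest.write_text(
            f"<html><head><title>{title}</title></head>\n"
            f"<body><h1>{title}</h1>\n<pre>{html.escape(text)}</pre>\n"
            "</body></html>\n",
            encoding="utf-8",
        )
        written.append(dest)
    return written
```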


> >   Is there some support for parsing dvi and ps files ? dvi can be turned
> >into (ugly) text using dvi2ascii and there is a corresponding converter
> >for ps.
> 
> I would check the conv_doc.pl script and plug in a dvi->txt
> converter. I believe it already handles PostScript files nicely.

Perhaps it is easier (and better, although slower) to convert dvi->ps
and use the PostScript feature of conv_doc.pl.  dvi2ascii and similar
tools might have unwanted effects with regard to embedded graphics,
which would probably cause a lot of noise in the document database (the
excerpts will contain lots of dashes, vertical bars etc. for rulers).
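
To illustrate the dvi->ps->text chain outside of conv_doc.pl, here is a
minimal Python sketch.  It assumes dvips and Ghostscript's ps2ascii are
installed and on the PATH; ps2ascii merely stands in for whatever
PostScript converter conv_doc.pl is configured to use, and the function
names are made up:

```python
import subprocess
from pathlib import Path


def conversion_commands(dvi: Path, workdir: Path):
    """Return the two command lines (dvi->ps, ps->txt) and the text path."""
    ps = workdir / (dvi.stem + ".ps")
    txt = workdir / (dvi.stem + ".txt")
    commands = [
        ["dvips", "-q", "-o", str(ps), str(dvi)],  # -q: quiet, -o: output file
        ["ps2ascii", str(ps), str(txt)],           # ps2ascii input.ps output.txt
    ]
    return commands, txt


def dvi_to_text(dvi: Path, workdir: Path) -> str:
    """Run both steps and return the extracted text."""
    commands, txt = conversion_commands(dvi, workdir)
    for argv in commands:
        subprocess.run(argv, check=True)  # raises if a tool fails
    return txt.read_text(encoding="latin-1", errors="replace")
```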


cheers,
  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]            Internet: http://www.inwise.de
