Re: [htdig] external parsers

G. T. Stresen-Reuter Sun, 09 Apr 2006 04:51:01 -0700

I'm not using it, but of course I do use external parsers.

On that subject, though, the only feature request that I would have would be native support for PDF indexing, since it is such a common format (and I've never been particularly satisfied with the results of external parsers).


Ted Stresen-Reuter

On Apr 6, 2006, at 10:05 PM, Arnone, Anthony wrote:

Hello all:
 
Doing development on the new ht://Dig 4.0, I was just wondering how many people out there use an external parser to output individual chunks of documents on individual lines (if you take a look at http://htdig.org/attrs.html#external_parsers you’ll see the options). Unfortunately, this “old” style is incompatible with how the retriever now works. There is no way to handle these lines one at a time, since documents are parsed all at once with the new UTF-8 internal HTML parser (the got_* functions are gone).
 
To still support this method would require constructing an HTML document from scratch and then handing it to the parser. Note that the external _converters_ will still work with no problem, as long as they can be chained to either text/plain or text/html.
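
As a rough illustration of what "constructing an HTML document from scratch" could look like, here is a minimal Python sketch (not ht://Dig code). It assumes the old-style parser output is one tab-separated record per line, with "t" for the title, "w" for a word, and "u" for a URL plus description; that record layout is my reading of the external_parsers documentation and should be checked against it. The function name records_to_html is hypothetical, and all other record types are simply ignored.

```python
from html import escape


def records_to_html(lines):
    """Build a minimal HTML document from tab-separated parser records.

    Assumed record layout (hypothetical, simplified):
      t<TAB>title
      w<TAB>word<TAB>location<TAB>heading_level
      u<TAB>url<TAB>description
    """
    title = ""
    body = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kind = fields[0]
        if kind == "t" and len(fields) >= 2:
            title = fields[1]
        elif kind == "w" and len(fields) >= 2:
            # Word location and heading level are dropped in this sketch.
            body.append(escape(fields[1]))
        elif kind == "u" and len(fields) >= 2:
            text = fields[2] if len(fields) >= 3 else fields[1]
            body.append('<a href="{}">{}</a>'.format(escape(fields[1]), escape(text)))
        # Any other record type is ignored here.
    return "<html><head><title>{}</title></head><body>{}</body></html>".format(
        escape(title), " ".join(body)
    )


records = [
    "t\tSample & Title",
    "w\thello\t0\t0",
    "w\tworld\t500\t0",
    "u\thttp://example.com/\tExample",
]
html_doc = records_to_html(records)
```

The resulting string could then be handed to an HTML parser in one piece. Note that the word locations are lost in this approach, which is part of the cost of keeping the old style alive.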
 
So, what do people think? Should ht://Dig continue to support this? Is anyone actually using this option? Let me know.
 
 
Anthony Arnone




