I'm not using it, but of course I do use external parsers.

On that subject, though, my only feature request would be native support for PDF indexing, since it is such a common format (and I've never been particularly satisfied with the results of external parsers).

Ted Stresen-Reuter

On Apr 6, 2006, at 10:05 PM, Arnone, Anthony wrote:

Hello all:
 
While developing the new ht://Dig 4.0, I was wondering how many people out there use an external parser to output individual chunks of documents on individual lines (if you take a look at http://htdig.org/attrs.html#external_parsers you’ll see the options). Unfortunately, this “old” style is incompatible with how the retriever now works. There is no way to handle these lines one at a time, since documents are now parsed all at once by the new internal UTF-8 HTML parser (the got_* functions are gone).
 
Continuing to support this method would require constructing an HTML document from scratch and then handing it to the parser. Note that the external _converters_ will still work without problems, as long as they can be chained to either text/plain or text/html.
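
To make the "construct a document from scratch" idea concrete, here is a rough sketch in Python of what such a shim could look like. It is not ht://Dig code. It assumes the old-style records are tab-separated lines whose first field is a one-letter type (t for title, h for head text, w for a word, u for a URL plus description), as I recall them from the external_parsers documentation; the function name records_to_html is made up for illustration. Note that the per-word location and heading-level fields are simply dropped here, which is one reason the round trip is lossy.

```python
import html


def records_to_html(lines):
    """Build a minimal HTML document from old-style parser records.

    Hypothetical helper: the record layout (t, h, w, u) is an assumption
    based on the external_parsers documentation, not on ht://Dig 4.0 code.
    """
    title, head, words, links = "", "", [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kind = fields[0]
        if len(fields) < 2:
            continue  # skip malformed or empty records
        if kind == "t":
            title = fields[1]
        elif kind == "h":
            head = fields[1]
        elif kind == "w":
            # fields[2] (location) and fields[3] (heading level) are discarded
            words.append(fields[1])
        elif kind == "u":
            description = fields[2] if len(fields) > 2 else fields[1]
            links.append((fields[1], description))
        # any other record type is ignored in this sketch

    parts = ["<html><head><title>%s</title></head>" % html.escape(title),
             "<body>"]
    if head:
        parts.append("<p>%s</p>" % html.escape(head))
    if words:
        parts.append("<p>%s</p>" % html.escape(" ".join(words)))
    for url, description in links:
        parts.append('<a href="%s">%s</a>'
                     % (html.escape(url), html.escape(description)))
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = [
        "t\tAnnual Report",
        "w\tbudget\t10\t0",
        "w\tforecast\t20\t0",
        "u\thttp://example.com/a.pdf\tAppendix",
    ]
    print(records_to_html(sample))
```

The resulting string could then be handed to the HTML parser as if it were a text/html document, which is essentially what an external converter already does.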
 
So, what do people think? Should ht://Dig continue to support this? Is anyone actually using this option? Let me know.
 
 
Anthony Arnone



_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
