Hello all:

While working on the new ht://Dig 4.0, I was wondering how many people
out there use an external parser that outputs individual chunks of a
document on individual lines (take a look at
http://htdig.org/attrs.html#external_parsers to see the options).

Unfortunately, this "old" style is incompatible with how the retriever
now works. There is no way to handle these lines one at a time, since
documents are now parsed all at once by the new internal UTF-8 HTML
parser (the got_* functions are gone). Continuing to support this method
would require constructing an HTML document from scratch out of the
parser's output and then handing it to the internal parser.

Note that the external _converters_ will still work without any problem,
as long as they can be chained to either text/plain or text/html.

So, what do people think? Should ht://Dig continue to support this? Is
anyone actually using this option? Let me know.

Anthony Arnone
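
For concreteness, the "construct an HTML document from scratch" workaround mentioned above might look roughly like the sketch below. It is written in Python only for brevity and is not ht://Dig code. It assumes the line-oriented output uses tab-separated records whose first field is a type letter (t for title, w for a word, u for a URL with a description); that layout is an assumption here and should be checked against the external_parsers documentation. The function name records_to_html is hypothetical.

```python
import html


def records_to_html(lines):
    """Build a small HTML document from line-oriented parser records.

    Assumed record layout (tab-separated, first field is the type):
      t <title>
      w <word> <location> <heading level>
      u <url> <description>
    Any other record type is ignored in this sketch.
    """
    title = ""
    body = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kind = fields[0]
        if kind == "t" and len(fields) > 1:
            title = fields[1]
        elif kind == "w" and len(fields) > 1:
            # Only the word itself is kept; location and heading are dropped.
            body.append(html.escape(fields[1]))
        elif kind == "u" and len(fields) > 1:
            url = html.escape(fields[1], quote=True)
            text = html.escape(fields[2]) if len(fields) > 2 else url
            body.append('<a href="%s">%s</a>' % (url, text))
    return (
        "<html><head><title>%s</title></head><body>%s</body></html>"
        % (html.escape(title), " ".join(body))
    )
```

Note what gets lost along the way: the per-word location and heading weight that the old-style output carries have no direct place in the rebuilt document, which is part of why keeping this path alive is not free.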