Hello all:

While working on the new ht://Dig 4.0, I've been wondering how many people out there use an external parser that outputs individual chunks of a document on separate lines (see http://htdig.org/attrs.html#external_parsers for the options). Unfortunately, this "old" style is incompatible with how the retriever now works. Documents are now parsed all at once by the new UTF-8 internal HTML parser, and the got_* functions are gone, so there is no longer any way to handle these lines one at a time.

Continuing to support this method would require constructing an HTML document from scratch out of the external parser's output and then handing that document to the internal parser. Note that external _converters_ will still work without any problem, as long as their output can be chained to either text/plain or text/html.
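To make "constructing an HTML document from scratch" concrete, here is a rough sketch in Python of turning old-style parser output into HTML. It assumes the tab-separated record layout described for the 3.x external parsers and handles only `t` (title), `w` (word, location, heading level) and `u` (URL, description) records, skipping everything else. Treating heading levels 1 to 6 as `<h1>` to `<h6>` is an assumption, word locations are ignored (words are emitted in the order received), and `records_to_html` is a hypothetical name, not part of ht://Dig.

```python
import html


def records_to_html(lines):
    """Hypothetical helper: rebuild an HTML document from old-style records.

    Assumes tab-separated records as in the 3.x external_parsers format;
    only 't', 'w' and 'u' records are handled, all others are skipped.
    """
    title = ""
    body = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kind, args = fields[0], fields[1:]
        if kind == "t" and args:
            # t <title>
            title = args[0]
        elif kind == "w" and args:
            # w <word> <location> <heading level>; location is ignored here
            word = html.escape(args[0])
            level = int(args[2]) if len(args) >= 3 and args[2].isdigit() else 0
            # Assumption: levels 1-6 correspond to <h1>..<h6>
            if 1 <= level <= 6:
                body.append(f"<h{level}>{word}</h{level}>")
            else:
                body.append(word)
        elif kind == "u" and args:
            # u <URL> <description>
            href = html.escape(args[0], quote=True)
            text = html.escape(args[1]) if len(args) > 1 else href
            body.append(f'<a href="{href}">{text}</a>')
    return (
        "<html><head><title>" + html.escape(title) + "</title></head><body>"
        + " ".join(body)
        + "</body></html>"
    )
```

The resulting string would then be handed to the internal HTML parser like any other text/html document, which is the extra step the old line-oriented interface would need.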

So, what do people think? Should ht://Dig continue to support this? Is anyone actually using this option? Let me know.

Anthony Arnone
