W3C Compliant HTML Parser to replace current std/htmlparser

nrk Sat, 08 Jul 2023 01:15:19 -0700

I am indeed planning to isolate Chawan's HTML5 parser into a separate library. 
Right now I'm evaluating the best way to write an API that doesn't involve 
bringing in half of Chawan as a dependency; preferably it would work similarly 
to [html5ever](https://github.com/servo/html5ever), so you could supply your 
own DOM implementation. (Eventually the library could provide a basic DOM 
skeleton for ease of use.)
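
To illustrate the "supply your own DOM" idea, here is a minimal sketch of what such an interface could look like in Nim. This is only an assumption about the general shape, not the actual API of Chawan or html5ever: `MyNode`, `MyBuilder`, `createElement`, `appendChild`, `getDocument` and `buildDemoTree` are all hypothetical names, and `buildDemoTree` merely stands in for a real parser.

```nim
# Hypothetical sketch: none of these names come from Chawan or html5ever.
type
  # A toy DOM node type supplied by the library user.
  MyNode = ref object
    name: string
    children: seq[MyNode]

  # User-side state that the tree builder would call back into.
  MyBuilder = object
    document: MyNode

# Callbacks a (hypothetical) tree builder would expect from any DOM.
proc createElement(b: var MyBuilder, name: string): MyNode =
  MyNode(name: name)

proc appendChild(b: var MyBuilder, parent, child: MyNode) =
  parent.children.add(child)

proc getDocument(b: var MyBuilder): MyNode =
  b.document

# Stand-in for the parser: generic over the builder type, so it never
# depends on a concrete DOM implementation.
proc buildDemoTree[B](b: var B) =
  let root = b.getDocument()
  let html = b.createElement("html")
  b.appendChild(root, html)
  b.appendChild(html, b.createElement("body"))

var b = MyBuilder(document: MyNode(name: "#document"))
buildDemoTree(b)
echo b.document.children[0].children[0].name
```

Because the stand-in parser only relies on the three procs being defined for the builder type, a user could swap in a completely different node representation without touching the parser side.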


I'm not sure putting it in the stdlib is the best idea: with the tokenizer, 
it's around 4k lines of code. That's quite the liability for maintainers, 
especially when they are trying to slim down the stdlib. (Not to mention it 
depends on Chawan's 
[decoderstream](https://git.sr.ht/~bptato/chawan/tree/master/item/src/encoding/decoderstream.nim),
 which would again be a pain to integrate.) In short, I would rather make it a 
separate library.
