#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+---------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  David Smith
         Type:  Bug            |                   Status:  assigned
    Component:  Utilities      |                  Version:  dev
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+---------------------------------------
Changes (by Carlton Gibson):

 * cc: Matthias Kestenholz (added)


Comment:

 Adding some detail after the last post, since you're looking at it, David.

 There was a discussion (with various folks from html5lib, and Mozilla, and
 ...) about whether html5lib could be put on a better footing.
 I'm not sure how that panned out in the medium term. (I didn't check what
 the rhythm looks like now.)

 There was also talk about whether bleach (or an alternative) could
 build on `html5ever`, which is the HTML parser from Mozilla's Servo
 project.

 * https://github.com/servo/html5ever
 * https://github.com/SimonSapin/html5ever-python (PyO3 bindings.)

 That would be pretty cool, but it was clearly a lot of work, and then 2020
 happened, so...

 The other candidate in this space is Matthias' html-sanitizer, which is
 built on `lxml`: https://github.com/matthiask/html-sanitizer



 That's just to lay down the notes I had gathered. I'm not sure of the way
 forward, but hopefully it's helpful.
 Very open to ideas though! Thanks for picking it up.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:19>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
