[ https://issues.apache.org/jira/browse/YETUS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468947#comment-15468947 ]

Allen Wittenauer commented on YETUS-457:
----------------------------------------

Keep in mind that release notes can have their format controlled by putting a 
header at the top of the field in JIRA.  RDM uses that header to decide which 
method it uses to convert the input to markdown.  Fields that are marked as 
markdown are expected to really be in proper markdown.  If they don't have the 
proper escaping, then that's on the input creator to fix, not on RDM. (This was 
a point of discussion in one of the RDM JIRAs a while back.)
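A minimal sketch of header-driven format detection, in Python like RDM itself. It assumes the header is an HTML comment such as `<!-- markdown -->` on the first line of the field; that syntax and the name `split_format_header` are assumptions for illustration, not taken from RDM's code.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: the format header is an HTML comment on the first line,
# e.g. "<!-- markdown -->". This syntax is hypothetical, not RDM's documented one.
FORMAT_HEADER = re.compile(r"^<!--\s*([a-z]+)\s*-->\s*")


def split_format_header(note: str) -> Tuple[Optional[str], str]:
    """Return (format_name, body); format_name is None if no header is present."""
    match = FORMAT_HEADER.match(note)
    if match is None:
        return None, note
    # Strip the header (and any following whitespace) so it never reaches the output.
    return match.group(1), note[match.end():]


print(split_format_header("<!-- markdown -->\n*Bold* note"))  # ('markdown', '*Bold* note')
print(split_format_header("plain text"))  # (None, 'plain text')
```

The returned format name is what a converter would key on; a note without a header keeps its text untouched and gets no format.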

bq. markdown_sanitize isn't used outside of utils.py and doesn't do what I 
consider markdown sanitization.

markdown_sanitize is only called when the input is *already* in markdown 
format. The sanitization there is primarily for Python, since certain routines 
blow up if the input isn't valid UTF-8.
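A sketch of the kind of UTF-8 normalization described above, so later string routines don't raise. The helper name `ensure_utf8_text` is hypothetical and this is not RDM's actual markdown_sanitize; it only shows the technique using Python's standard codec error handlers.

```python
from typing import Union


def ensure_utf8_text(raw: Union[bytes, str]) -> str:
    """Hypothetical helper: coerce JIRA field contents into UTF-8-safe text."""
    if isinstance(raw, bytes):
        # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
        return raw.decode("utf-8", errors="replace")
    # Characters that cannot be encoded as UTF-8 (e.g. lone surrogates) become "?",
    # so a later .encode("utf-8") cannot raise UnicodeEncodeError.
    return raw.encode("utf-8", errors="replace").decode("utf-8")
```

Replacing rather than dropping bad input keeps a visible marker in the output, which makes the offending JIRA field easier to find.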

bq. Should we just inline it into text_sanitize? 

No.  At some point, there might be a confluence_sanitize that takes 
JIRA/Confluence wiki markup and converts it to markdown/Python/Doxia-compatible 
output, or maybe even some other _sanitize, such as one for apt. By keeping them 
separate, we make it clear which routines are used to clean which types of input.
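A sketch of the one-sanitizer-per-format layout argued for above. The function names mirror those in the discussion, but their bodies here are placeholder stand-ins, not RDM's implementations; the `SANITIZERS` table and `sanitize` dispatcher are hypothetical.

```python
from typing import Callable, Dict


# Placeholder stand-ins named after the routines mentioned in the comment;
# their bodies are illustrative, not RDM's actual implementations.
def markdown_sanitize(text: str) -> str:
    # Input is already markdown: only normalize Windows line endings.
    return text.replace("\r\n", "\n")


def text_sanitize(text: str) -> str:
    # Plain text: backslash-escape a few characters markdown would interpret.
    return "".join("\\" + ch if ch in "\\`*_" else ch for ch in text)


# One sanitizer per input format; a confluence_sanitize could be registered
# later without touching the existing routines.
SANITIZERS: Dict[str, Callable[[str], str]] = {
    "markdown": markdown_sanitize,
    "text": text_sanitize,
}


def sanitize(text: str, fmt: str = "text") -> str:
    # Unknown formats fall back to the more conservative plain-text routine.
    return SANITIZERS.get(fmt, text_sanitize)(text)
```

Adding a new input format then means writing one function and one table entry, and each routine can be tested in isolation.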

bq. I don't follow the comment about additional Doxia escaping. Does this mean 
additional escaping for the Doxia flavor of markdown, or for Doxia's apt format?

Basically, when mvn site converts markdown to HTML via Doxia, it gets tripped 
up on the extra metachars.  Some of these problem chars will even send Doxia 
into an endless loop.  Fun!
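One common mitigation is to backslash-escape markdown metacharacters before the text reaches the renderer. The character set below is an illustrative assumption, not the set RDM escapes, and whether Doxia honors every one of these escapes has not been verified here.

```python
import re

# ASSUMPTION: illustrative set of markdown metacharacters to escape; the exact
# characters that trip up Doxia are not taken from RDM.
DOXIA_METACHARS = r"\`*_{}[]()#+-.!<>|"

# re.escape makes every member safe to place inside a character class.
_METACHAR_RE = re.compile("([" + re.escape(DOXIA_METACHARS) + "])")


def escape_metachars(text: str) -> str:
    """Prefix every metacharacter with a backslash so it is rendered literally."""
    return _METACHAR_RE.sub(r"\\\1", text)


print(escape_metachars("1.0 (beta)"))  # 1\.0 \(beta\)
```

Escaping this broadly is heavy-handed; a narrower set is possible once the characters that actually break Doxia are known.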

bq. I added the apt escaping since it seems harmless and is very similar to 
markdown's slash escaping, but ideally we handle escaping for different formats 
with different methods.

Now you see the point of having different _sanitize methods.  They are for 
exactly that: different escaping routines for different formats.

apt, btw, shouldn't be going through releasedocmaker at all.  If someone is 
writing apt format in their release notes, that's a whole other problem....

bq. I want to do all this with some third-party escaping library, but we have 
some weird escaping requirements. Not sure.

Yup.  Thanks, Maven....

> RDM does not properly escape entities
> -------------------------------------
>
>                 Key: YETUS-457
>                 URL: https://issues.apache.org/jira/browse/YETUS-457
>             Project: Yetus
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Critical
>         Attachments: YETUS-457.001.patch
>
>
> Noticed while browsing the Hadoop 3.0.0-alpha1 changelog. Quotes and possibly 
> some other entities are not escaped properly, leading to malformed markdown 
> output.
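For the quotes and entities mentioned in the issue, Python's standard library already provides HTML entity escaping. This is a swapped-in technique shown for illustration, not necessarily what YETUS-457.001.patch does; the wrapper name `escape_entities` is hypothetical.

```python
import html


def escape_entities(text: str) -> str:
    """Escape &, <, > and both quote characters as HTML character references."""
    # quote=True also converts " to &quot; and ' to &#x27;
    return html.escape(text, quote=True)


print(escape_entities('Use "quotes" & <tags>'))  # Use &quot;quotes&quot; &amp; &lt;tags&gt;
```

Run it exactly once on output: escaping already-escaped text turns `&amp;` into `&amp;amp;`.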



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
