Hello Gary,

2014-02-01 Gary Gregory <[email protected]>:

> On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter <[email protected]>
> wrote:
>
> > Hi,
> >
> > right now we have the following methods in StringEscapeUtils:
> >
> > escapeXml(String)
> > escapeHtml3(String)
> > escapeHtml4(String)
> >
> > These methods only escape the basic xml/html entities, though they may
> > produce invalid XML/HTML. LANG-955 [1] proposes to add new methods that
> > only produce valid XML, they should throw an exception if a character is
> > encountered that cannot be displayed in XML (not even by escaping).
> >
>
> How does that the problem mentioned earlier on the ML of needing valid XML
> no matter what the input?
>

I don't understand that sentence, sorry :o)


>
> There are several tasks for the API(s):
>
> - Escaping (implied by the API name)
> - Dealing with non-XML chars:
>   o Strip, or
>   o Throw exception
>
> The simplest solution using today's style would be:
>
> escapeXml10(String text, boolean strip)
> escapeXml11(String text, boolean strip)
>
> strip true - strips
> strip false - throws exception
>

A boolean flag that controls whether a method throws an exception or not?
Whether a situation is exceptional should not be configurable, imho.
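
To make the objection concrete, here is a minimal sketch of the alternative: two separately named methods instead of one method with a boolean flag. The class and method names (Xml10InvalidCharHandling, stripInvalidXml10, requireValidXml10) are hypothetical and not part of Commons Lang; only the character ranges come from the Char production of the XML 1.0 spec.

```java
final class Xml10InvalidCharHandling {

    /** XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Hypothetical lenient variant: drops every code point that XML 1.0 cannot represent. */
    static String stripInvalidXml10(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(Xml10InvalidCharHandling::isXml10Char)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    /** Hypothetical strict variant: returns the text unchanged or fails on the first invalid code point. */
    static String requireValidXml10(String text) {
        text.codePoints()
            .filter(cp -> !isXml10Char(cp))
            .findFirst()
            .ifPresent(cp -> {
                throw new IllegalArgumentException(
                        String.format("U+%04X is not allowed in XML 1.0", cp));
            });
        return text;
    }
}
```

At a call site, stripInvalidXml10(text) and requireValidXml10(text) say what happens to bad input, whereas escapeXml10(text, true) requires a look at the Javadoc.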


>
> What I am not sure on is why you would want an exception or what you'd do
> with it.
>
> Are these 'bad chars' embeddable in a CDATA? If so, strip false makes sense
> because we really cannot handle the text. But what would the app then do
> with the exception? I am not sure that I want the extra logic. Presumably,
> if I am not using JAXB then I am doing my own "looser" XML IO, so I need to
> escape content... I wonder what JAXB does here...
>

As far as I know there is no way to embed the characters in XML. But I
may be wrong. I couldn't find anything about this in the spec [1]. So
maybe we should go with stripping?
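
One way to probe this is to ask a parser. XML 1.0 requires that a character reference also match the Char production, so neither a reference such as &#x1; nor a CDATA section should help. The sketch below checks that against the parser bundled with the JDK; the class and method names are hypothetical, and a single parser's behavior is only an indication, not a substitute for the spec.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class Xml10EmbeddingCheck {

    /** Returns true if the JDK's default parser accepts the document as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed according to this parser
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a>ok</a>"));                 // plain text
        System.out.println(isWellFormed("<a>&#x1;</a>"));              // U+0001 as a character reference
        System.out.println(isWellFormed("<a><![CDATA[\u0001]]></a>")); // raw U+0001 inside CDATA
    }
}
```

With the JDK's parser one would expect the first document to be accepted and the other two to be rejected, which supports stripping (or failing) rather than trying to encode such characters.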


>
>
> >
> > Since the set of valid characters differs between XML 1.0 and XML 1.1, we
> > need two methods:
> >
> > escapeXml_1_0(String)
> > escapeXml_1_1(String)
> >
>
> Yuck! Underscores are of last resort.
>
> Simple alternatives
>
> escapeXml10
> escapeXml11
> escapeXmlV10
> escapeXmlV11
>
> Until we get to XML version 10, this will be fine.
>
> Precise alternatives:
>
> escapeXml10_20081126 (the W3C REC for XML 1.0 *5th edition* is
> http://www.w3.org/TR/2008/REC-xml-20081126/)
> escapeXml10_20060816 (the W3C REC for XML 1.0 *4th edition* is
> http://www.w3.org/TR/2006/REC-xml-20060816/)
> escapeXml10_20040204 (the W3C REC for XML 1.0 *3rd edition* is
> http://www.w3.org/TR/2004/REC-xml-20040204/)
>
> Or use a "E" or "e" for Edition instead of _
> escapeXml10E20081126
> escapeXml10e20081126
>
> Each edition may have several versions BTW.
>

I guess we should keep it simple then and go with escapeXml10 and
escapeXml11.
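
For reference, the reason two methods are needed at all can be sketched as three predicates taken from the Char and RestrictedChar productions of the XML 1.0 and XML 1.1 specs. The class and method names are hypothetical; this only shows where the two versions differ, not how escapeXml10 and escapeXml11 would be implemented.

```java
final class XmlVersionChars {

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * XML 1.1 RestrictedChar: [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F].
     * These are legal in XML 1.1 only when written as character references.
     */
    static boolean isXml11RestrictedChar(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
                || (cp >= 0xB && cp <= 0xC)
                || (cp >= 0xE && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }
}
```

U+0001, for example, cannot appear in XML 1.0 at all but can be written as a character reference in XML 1.1, while U+0000 is invalid in both.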


>
>
> >
> > To clarify the behavior of the old method I've created LANG-963 [2]. The
> > idea is to rename escapeXml(String) to escapeXmlEntities(String) and
> > deprecate the old method.
> >
> > Now I'm tempted to rename the HTML counterparts as well leading to either
> > of the following:
> >
> > escapeHtml3Entities(String)
> > escapeHtml4Entities(String)
> >
> > or:
> >
> > escapeHtml_3_Entities(String)
> > escapeHtml_4_Entities(String)
> >
> > or:
> >
> > escapeHtml_3_0_Entities(String)
> > escapeHtml_4_0_Entities(String)
> >
> > I find none of the three very appealing, but for code symmetry we should
> > change this as well. Which one would you prefer?
> >
> > Benedikt
> >
> > P.S.: I'm planning to redesign great parts of the API. The "static util"
> > pattern is outdated and it is better to encode the information we're
> > trying to express here via fluent API. My proposal for lang 4.0 would be:
> >
> > StringEscaping.escape(str).with(Escaping.HTML_4_0)
> > StringEscaping.escape(str).with(Escaping.XML_ENTITIES)
> >
>
> Gross, don't force an API style on me, Java is verbose enough as it is. For
> those in love with fluent APIs, you can provide a separate code path I
> suppose. I'd rather not deal with it for low level util call sites. I am
> not building an object model here.
>
> Now that Java 8 lambdas are here, the style will change again.
>

I don't see why

SuperUtils.staticMethod(param1, param2, param3, param4)

isn't forcing an API style, but doing things differently (or should I say
"less 1998 style" ;-) is. But let's discuss this when the time for 4.0
comes.
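
For comparison, a minimal sketch of how the two styles could sit side by side. StringEscaping, Escaping and XML_ENTITIES are the names from the proposal quoted above; everything else, including the implementation and the static escapeXmlEntities helper placed on the same class, is an assumption for illustration and not an existing Commons Lang API.

```java
import java.util.function.UnaryOperator;

/** Hypothetical set of escaping strategies; only one is sketched here. */
enum Escaping implements UnaryOperator<String> {
    XML_ENTITIES;

    /** Escapes the five basic XML entities and leaves everything else untouched. */
    @Override
    public String apply(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}

/** Hypothetical entry point offering both call styles on top of the same strategy. */
final class StringEscaping {
    private final String input;

    private StringEscaping(String input) {
        this.input = input;
    }

    /** Fluent style: StringEscaping.escape(str).with(Escaping.XML_ENTITIES). */
    static StringEscaping escape(String input) {
        return new StringEscaping(input);
    }

    String with(UnaryOperator<String> strategy) {
        return strategy.apply(input);
    }

    /** Static style: a one-line delegate to the same strategy. */
    static String escapeXmlEntities(String input) {
        return Escaping.XML_ENTITIES.apply(input);
    }
}
```

Both StringEscaping.escape(s).with(Escaping.XML_ENTITIES) and StringEscaping.escapeXmlEntities(s) go through the same code, so offering the fluent path would not have to remove the static one.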

Benedikt

[1] http://www.w3.org/TR/2006/REC-xml-20060816/#charsets


>
>
> >
> > This way we don't have to encode everything into method names.
>
>
> You still can use parameters. But first we need to decide on
> strip/exception policies.
>
> Gary
>
>
>
> > I've created
> > LANG-964 [3] for this.
> >
> > [1] https://issues.apache.org/jira/browse/LANG-955
> > [2] https://issues.apache.org/jira/browse/LANG-963
> > [3] https://issues.apache.org/jira/browse/LANG-964
> >
> > --
> > http://people.apache.org/~britter/
> > http://www.systemoutprintln.de/
> > http://twitter.com/BenediktRitter
> > http://github.com/britter
> >
>
>
>
> --
> E-Mail: [email protected] | [email protected]
> Java Persistence with Hibernate, Second Edition
> <http://www.manning.com/bauer3/>
> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> Spring Batch in Action <http://www.manning.com/templier/>
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory
>



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
