2014-02-03 Adam Hooper <[email protected]>:

> On Sun, Feb 2, 2014 at 2:00 PM, Benedikt Ritter <[email protected]>
> wrote:
> >
> > 2014-02-01 Gary Gregory <[email protected]>:
> >
> >> On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter <[email protected]>
> >> wrote:
> >>
> >> >
> >> > These methods only escape the basic xml/html entities, though they may
> >> > produce invalid XML/HTML. LANG-955 [1] proposes to add new methods that
> >> > only produce valid XML, they should throw an exception if a character is
> >> > encountered that cannot be displayed in XML (not even by escaping).
> >>
> >> How does that the problem mentioned earlier on the ML of needing valid XML
> >> no matter what the input?
> >>
> >
> > I don't understand that sentence, sorry :o)
>
> As the author of that patch, my two pence:
>
> It's impossible to encode some characters in XML -- especially XML
> 1.0. That's because XML is a text-only format, so it only allows text.
> (This inspired Microsoft, when it created its XML document formats, to
> invent a new encoding scheme ("xstring", I think) that uses valid XML
> characters to encode invalid ones. Luckily, that encoding scheme never
> caught on outside of Microsoft-land.)
>
> While there's nothing _wrong_ with escapeXml as it stands right now
> (i.e., the code agrees with the docs), I argue that it doesn't solve
> the actual problem people are using it for: people want to escape
> strings for inclusion in XML documents, and escapeXml does not do
> that.
>
> I think escapeXml should not output invalid XML ever.
>
> Presumably encodeXml() is being used today for lots of XML documents,
> and it already throws a brutal exception: a valid XML parser will
> throw an exception when it reaches an invalid character. That speaks
> to the severity of the problem (it makes that data very hard to get
> at), and to the rarity of the problem (there haven't been many bug
> reports about this).
> >
> >> There are several tasks for the API(s):
> >>
> >> - Escaping (implied by the API name)
> >> - Dealing with non-XML chars:
> >>   o Strip, or
> >>   o Throw exception
> >>
> >> The simplest solution using today's style would be:
> >>
> >> escapeXml10(String text, boolean strip)
> >> escapeXml11(String text, boolean strip)
> >>
> >> strip true - strips
> >> strip false - throws exception
> >>
> >
> > A boolean flag that controls whether a method throws an exception or not?
> > An exceptional situation is nothing that is configurable, imho.
> >
> >> What I am not sure on is why you would want an exception or what you'd do
> >> with it.
> >>
> >> Are these 'bad chars' embeddable in a CDATA? If so, strip false makes sense
> >> because we really cannot handle the text. But what would the app then do
> >> with the exception?
>
> I originally thought an exception would be useful, but I changed my
> mind as I wrote the patch. Some reasons:
>
> * What kind of exception? It isn't really an IOException, and the API
>   doesn't seem keen on adding other kinds.
>
> * What would the user want to do with it? Re-run the operation in its
>   exception-free incarnation?
>
> An exception might be useful for some people, but I think it would be
> right to steer those people towards a different API -- maybe not a
> part of commons-io.
>
> Enjoy life,
> Adam
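To make the "strip" option from the quoted discussion concrete, here is a minimal sketch of what escape-and-strip could look like for XML 1.0. The names (XmlEscapeSketch, escapeAndStripXml10, isValidXml10) are made up for illustration; this is not the Commons Lang API and not the LANG-955 patch. The valid ranges are taken from the Char production of the XML 1.0 specification.

```java
// Hypothetical sketch only: names and structure are assumptions,
// not the Commons Lang implementation.
final class XmlEscapeSketch {

    private XmlEscapeSketch() {}

    /** True if the code point matches the XML 1.0 Char production. */
    public static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes the five basic XML entities and silently drops every
     * code point that XML 1.0 cannot represent (no exception is thrown).
     */
    public static String escapeAndStripXml10(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Iterate by code point so surrogate pairs are handled as one unit;
            // an unpaired surrogate comes back as 0xD800..0xDFFF and is stripped.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isValidXml10(cp)) {
                continue; // strip instead of throwing
            }
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With this shape there is no boolean flag and no exception: callers who need to know whether anything was dropped could compare input and output, or check the input with isValidXml10 first.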
Adam, thanks for sharing your thoughts. This sounds like we're reaching a
consensus here. I'd propose the following:

- deprecate escapeXml(String) (and no renaming to escapeXmlEntities or the
  like)
- add escapeXml10 and escapeXml11, which escape xml entities and strip
  invalid characters from the input.

What do we do with escapeHtml3 and escapeHtml4? Do we leave them unchanged?

Benedikt

-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
