On Feb 12, 2015, at 5:27 PM, Brian Burkhalter <brian.burkhal...@oracle.com> wrote:
>
> On Feb 12, 2015, at 8:18 AM, Paul Sandoz <paul.san...@oracle.com> wrote:
>
>>> This is a morass and I hope that someone more apt to know it well would
>>> comment. The U+0000 null control character is always illegal, though;
>>> that much I do know.
>>
>> Yes. IIRC XML 1.1 basically allows any character except U+0000.
>
> "Note that the code point U+0000, assigned to the null control character, is
> the only character encoded in Unicode and ISO/IEC 10646 that is always
> invalid in any XML 1.0 and 1.1 document."
>
> http://en.wikipedia.org/wiki/Valid_characters_in_XML#Characters_allowed_but_discouraged
>
>>> The problem is that on OSX and Windows prefs are not stored to XML
>>
>> What are they stored in? Name/value pairs?
>
> Yes. On OSX I think Property List (.plist) files; on Windows I do not know
> yet.
>
>>> whereas on Unix they are.
>>
>> Is that a specification requirement?
>
> No, I think it’s an artifact of there being no inherent DB-like facility on
> the system.
>
>>> That would make it an error to add such a value to the prefs on some
>>> platforms but not on others.
>>
>> Yes, for an interoperable format potentially read by other tools, having
>> U+0000 is a really bad idea.
>
> Yep.
>

And I think that applies to plist files too.

>> My inclination is that if properties are written out to a text file then it
>> should fail if a key/value contains U+0000 (binary data should be base64
>> encoded in such cases). Replacing it just subtly hides or defers the issue.
>
> That was my original idea in fact (webrev.00, unpublished). It would
> however require a spec update to
>
> http://docs.oracle.com/javase/8/docs/api/java/util/prefs/Preferences.html#put-java.lang.String-java.lang.String-
>
> to allow for an IAE (IllegalArgumentException) in this case. This would also
> be a backward-incompatible change for platforms which do allow storing such
> values.
>
> Note that a similar situation applies to Properties.
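The XML 1.0 vs 1.1 point can be made concrete by writing the two `Char` productions from the XML specifications as predicates. This is a minimal sketch for illustration only: `XmlChars`, `isXml10Char`, `isXml11Char` and `isValidXml` are hypothetical names, not JDK API and not code from the webrev mentioned above. It only checks membership in `Char`; it does not model other well-formedness rules.

```java
// Hypothetical helper (not JDK API): tests strings against the Char
// productions of XML 1.0 and XML 1.1.
public final class XmlChars {

    private XmlChars() {}

    // XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if every code point of s matches the Char production of the chosen
    // version. Unpaired surrogates are reported as 0xD800-0xDFFF and fail both.
    public static boolean isValidXml(String s, boolean xml11) {
        return s.codePoints().allMatch(cp -> xml11 ? isXml11Char(cp) : isXml10Char(cp));
    }

    public static void main(String[] args) {
        System.out.println(isValidXml("a\u0000b", false)); // false: NUL is never a Char
        System.out.println(isValidXml("a\u0000b", true));  // false
        System.out.println(isValidXml("a\u0007b", false)); // false: BEL excluded in 1.0
        System.out.println(isValidXml("a\u0007b", true));  // true: BEL is a Char in 1.1
    }
}
```

Note that XML 1.1 admits U+0007 in `Char` but classifies it as a `RestrictedChar`, so it may only appear as a character reference such as `&#x7;`. U+0000 has no legal form in either version.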

My recommendation is that serialization of properties to any textual format
should barf if U+0000 is encountered. Otherwise it's just hiding bugs. In such
cases I think there is a strong justification for introducing such an
incompatible change. I expect it is rare to encounter in practice.

Paul.
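The fail-fast approach (reject U+0000 rather than replace it, and base64-encode binary data) can be sketched as follows. This is a minimal illustration assuming a simple `key=value` text format; `NulCheck`, `requireNoNul` and `serialize` are hypothetical names, not part of `java.util.prefs` or `java.util.Properties`, and real escaping rules are deliberately left out.

```java
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of java.util.prefs or java.util.Properties):
// a key=value text writer that rejects U+0000 instead of replacing it.
public final class NulCheck {

    private NulCheck() {}

    // Throws IllegalArgumentException if s contains U+0000.
    public static void requireNoNul(String what, String s) {
        int i = s.indexOf('\u0000');
        if (i >= 0) {
            throw new IllegalArgumentException(what + " contains U+0000 at index " + i);
        }
    }

    // Writes entries as sorted key=value lines, failing fast on U+0000.
    // Escaping of '=', ':', backslashes and line breaks is omitted for brevity.
    public static String serialize(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(entries).entrySet()) {
            requireNoNul("key", e.getKey());
            requireNoNul("value", e.getValue());
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Binary data is base64-encoded first; the base64 alphabet has no NUL.
        byte[] binary = {0x00, 0x01, (byte) 0xFF};
        Map<String, String> ok = new TreeMap<>();
        ok.put("name", "value");
        ok.put("blob", Base64.getEncoder().encodeToString(binary));
        System.out.print(serialize(ok)); // blob=AAH/ then name=value

        Map<String, String> bad = new TreeMap<>();
        bad.put("key", "a\u0000b");
        try {
            serialize(bad);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // value contains U+0000 at index 1
        }
    }
}
```

The binary payload can be recovered with `Base64.getDecoder().decode(...)`. For preferences specifically, `Preferences.putByteArray` already stores a byte array as its Base64 encoding, which is the existing route for binary values there.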