I am not so much concerned about which list the conversation is recorded on,
as long as it's on a project mailing list. You all can do what you want, of
course, but keep in mind that excluding a mailing list removes the
opportunity for people to observe and learn. It's just simpler IMO to keep
it on a list. You never know who might chime in with an interesting tidbit
or solution that day, or a week or a month later.

My 2c,
Gary

On Tue, Jan 9, 2024, 9:16 PM Joseph Kesselman <kesh...@alum.mit.edu> wrote:

> Gary, at some level of detail it ought to at least cross over into
> xalan-dev.
>
> The user list shouldn't usually get into the implementation weeds, though
> discussion of the correct and/or desired behavior is still appropriate here.
>
> Personally I would say some private brainstorming is harmless at worst, as
> long as conclusions get reported back to the team. It's impossible to
> prevent, and it _can_ be a good thing in letting people explore something
> before making a more formal proposal. Both formal and informal have their
> uses. I mean, come on, does your team thrash out every line of code in
> meetings, or do you go off and work with others to come up with a
> proposal/prototype first?
>
> Let water-cooler chats happen, and just count on folks reporting back. It
> works.
>
> --
>    /_  Joe Kesselman (he/him/his)
> -/ _) My Alexa skill for New Music/New Sounds fans:
>    /   https://www.amazon.com/dp/B09WJ3H657/
>
> Caveat: Opinionated old geezer with overcompensated writer's block. May
> be redundant, verbose, prolix, sesquipedalian, didactic, officious, or
> redundant.
> ------------------------------
> *From:* Gary Gregory <garydgreg...@gmail.com>
> *Sent:* Tuesday, January 9, 2024 8:14:28 PM
> *To:* Eric J. Schwarzenbach <eric.schwarzenb...@wrycan.com>
> *Cc:* Joseph Kesselman <kesh...@alum.mit.edu>; j-users@xalan.apache.org <
> j-users@xalan.apache.org>
> *Subject:* Re: supplementary characters emojis, etc turned ino surrogate
> pairs
>
> There is no need for private communications IMO unless it's a
> security issue in which case, we have a private security mailing list.
>
> Gary
>
> On Tue, Jan 9, 2024, 5:36 PM Eric J. Schwarzenbach <
> eric.schwarzenb...@wrycan.com> wrote:
>
> I've managed to make Xalan do something more correct for my test case by
> merging some bits from the JDK 21 version of ToStream into Xalan 2.7.3's
> version.
>
> Note that with the jdk code, what it does is replace either &#x1F4BB; or
> the literal character it represents with the equivalent decimal entity,
> &#128187;
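
For readers following along, the difference between the two outputs discussed in this thread can be sketched in a few lines of Java. This is an illustration only, not the code in Xalan's or the JDK's ToStream; the class and method names (`CharRefSketch`, `escapePerChar`, `escapePerCodePoint`) are made up for the example, and markup characters such as `<` and `&` are deliberately not handled.

```java
public class CharRefSketch {
    /** Escapes each UTF-16 code unit on its own, so a supplementary character yields two references. */
    static String escapePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    /** Escapes whole code points, so a surrogate pair is written as one reference. */
    static String escapePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // joins a valid surrogate pair into one code point
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // advance by 1 or 2 code units
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String laptop = new String(Character.toChars(0x1F4BB));
        System.out.println(escapePerChar(laptop));      // prints &#55357;&#56507;
        System.out.println(escapePerCodePoint(laptop)); // prints &#128187;
    }
}
```

The first form is the surrogate-pair output reported in the original message; the second matches the decimal reference Eric describes above.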
>
> Joe, I'm sending you an email directly about this since I think it's
> beyond the scope of xalan-user.
>
>
> On 1/9/24 13:34, Joseph Kesselman wrote:
>
> No problem. I was around when we still had to teach people the
> distinction, and error messages still sometimes get it wrong.
>
> I'll try to look into it this week, unless someone beats me to it.
>
> ------------------------------
> *From:* Eric J. Schwarzenbach <eric.schwarzenb...@wrycan.com>
> *Sent:* Tuesday, January 9, 2024 12:39:07 PM
> *To:* j-users@xalan.apache.org <j-users@xalan.apache.org>
> *Subject:* Re: supplementary characters emojis, etc turned ino surrogate
> pairs
>
>
> Apologies for the mistaken terminology. I do usually know the difference
> between valid and well-formed and am usually careful about the distinction;
> however, the idea that a numeric character entity could break either was new
> to me, and doesn't really fit my notion of either. I repeated the
> characterization of the problem that I had read without checking into it.
> Sorry for that, and thanks for looking into it.
>
> Cheers,
>
> Eric
> On 1/8/24 18:58, Joseph Kesselman wrote:
>
> Please be careful to distinguish "Well-Formed" from "Valid". If an XML
> tool complains that a document is not valid, that means it doesn't match the
> DTD or schema that describes its expected structure, not that it isn't
> correct XML. It's better to avoid using the term valid unless you mean
> Valid in the sense XML does.
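
The distinction can be seen with the JDK's own JAXP parser. The following is a minimal sketch, assuming a DTD in the internal subset; the class name `WellFormedVsValid` and the method `countValidityErrors` are invented for this example. A fatal parser error means the text is not well-formed, while a recoverable error from a validating parse means it is well-formed but not valid.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedVsValid {
    /** Returns the number of validity errors, or -1 if the text is not well-formed. */
    static int countValidityErrors(String xml) {
        int[] errors = {0};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors[0]++; } // validity error
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // well-formedness error: parsing cannot continue
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return errors[0];
        } catch (SAXException e) {
            return -1;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
        System.out.println(countValidityErrors(dtd + "<a><b/></a>")); // well-formed and valid
        System.out.println(countValidityErrors(dtd + "<a><c/></a>")); // well-formed, not valid
        System.out.println(countValidityErrors("<a><b></a>"));        // not well-formed
    }
}
```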
>
> A high-numeric-value surrogate pair shouldn't be a well-formedness issue,
> barring a bug.
>
> I haven't looked at this in any detail in a few decades, but I'll try to
> at least sanity-check now that I'm emerging from the build caverns again.
>
>
> For the convenience of others who might want it: legal characters in XML 1.0
> are defined at https://www.w3.org/TR/xml/#charsets. I believe XML 1.1
> added the null character, originally not accepted. There are some Unicode
> ranges excluded, but those were supposed to be permanently reserved blocks.
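
As a quick reference, the XML 1.0 `Char` production at that link can be written as a range check. This is a sketch of the production only (the class and method names are hypothetical), applied to the numbers that appear in this thread. The surrogate block U+D800 to U+DFFF is one of the excluded ranges, so a character reference to a lone surrogate value does not name a legal character.

```java
public class Xml10Chars {
    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXml10Char(128187)); // U+1F4BB: prints true
        System.out.println(isXml10Char(55357));  // U+D83D, high surrogate: prints false
        System.out.println(isXml10Char(56507));  // U+DCBB, low surrogate: prints false
    }
}
```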
>
> Xalan did originally have some issues with characters above the UTF-16
> range, mostly having to do with counts and offsets since the first draft
> just used Java characters. But I thought we had addressed those. Obviously,
> if the fork shipping in javax has solved it, a solution is possible and
> probably already in our backlog...
>
>
>
> ------------------------------
> *From:* Stanimir Stamenkov via j-users <j-users@xalan.apache.org>
> *Sent:* Monday, January 8, 2024 2:51:57 PM
> *To:* j-users@xalan.apache.org <j-users@xalan.apache.org>
> *Subject:* Re: supplementary characters emojis, etc turned ino surrogate
> pairs
>
> Mon, 8 Jan 2024 16:33:39 +0100, /Martin Honnen/:
> > On 08/01/2024 16:28, Eric J. Schwarzenbach wrote:
> >
> >> Does anybody have a patch for
> >>
> >> https://issues.apache.org/jira/browse/XALANJ-2560
> >>
> >> That Xalan produces invalid XML with some utf-8 characters seems
> >> rather serious. I find putting &#x1F4BB; or the literal character it
> >> represents into an XML document and running it through any XML-to-XML
> >> transform results in it being replaced with &#55357;&#56507; in the
> >> output which evidently makes the XML invalid. I tried a change to
> >> ToStream.java from https://issues.apache.org/jira/browse/XALANJ-2419
> >> with the source of Xalan 2.7.3 but it did not help.
> >
> > Use Saxon, perhaps, or see whether
> > https://stackoverflow.com/a/74245232/252228 helps for patching Xalan.
>
> One may also use just the JDK-supplied provider (a Xalan fork):
>
> * https://lists.apache.org/thread/3hzpj1gt1ql38d17dcfxrgss872v50l6 "XML
> Entities"
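
One way to try the JDK-supplied provider regardless of what is on the classpath is `TransformerFactory.newDefaultInstance()`, available since Java 9. The sketch below runs an identity transform over the test character; the class name `JdkIdentityTransform` and the choice of US-ASCII output (so that non-ASCII characters have to be written as character references) are assumptions made for the example, and the exact output depends on the JDK version in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class JdkIdentityTransform {
    /** Copies the input to the output using the JDK's built-in transformer. */
    static String identity(String xml) {
        try {
            // newDefaultInstance() returns the system-default (JDK) implementation,
            // bypassing the provider lookup that could pick up Xalan from the classpath.
            Transformer transformer = TransformerFactory.newDefaultInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inspect the output for a single reference versus a surrogate pair.
        System.out.println(identity("<a>&#x1F4BB;</a>"));
    }
}
```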
>
> Related to the patch referenced in the Stack Overflow answer, one may
> compare with the JDK sources as well:
>
> * https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToStream.java
> * https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToHTMLStream.java
>
> --
> Stanimir
>
>
