I'm not so concerned about which list the conversation is recorded on, as long as it's on a project mailing list. You all can do what you want, of course, but keep in mind that leaving a mailing list out removes the opportunity for people to observe and learn. It's just simpler IMO to keep it on a list. You never know who might chime in with an interesting tidbit or solution, that day, a week later, or a month later.
My 2c,
Gary

On Tue, Jan 9, 2024, 9:16 PM Joseph Kesselman <kesh...@alum.mit.edu> wrote:

> Gary, at some level of detail it ought to at least cross over into xalan-dev.
>
> The user list shouldn't usually get into the implementation weeds, though discussion of the correct and/or desired behavior is still appropriate here.
>
> Personally I would say some private brainstorming is harmless at worst, as long as conclusions get reported back to the team. It's impossible to prevent, and it _can_ be a good thing in letting people explore something before making a more formal proposal. Both formal and informal have their uses. I mean, come on, does your team thrash out every line of code in meetings, or do you go off and work with others to come up with a proposal/prototype first?
>
> Let water-cooler chats happen, and just count on folks reporting back. It works.
>
> --
> /_ Joe Kesselman (he/him/his)
> -/ _) My Alexa skill for New Music/New Sounds fans:
> / https://www.amazon.com/dp/B09WJ3H657/
>
> Caveat: Opinionated old geezer with overcompensated writer's block. May be redundant, verbose, prolix, sesquipedalian, didactic, officious, or redundant.
> ------------------------------
> *From:* Gary Gregory <garydgreg...@gmail.com>
> *Sent:* Tuesday, January 9, 2024 8:14:28 PM
> *To:* Eric J. Schwarzenbach <eric.schwarzenb...@wrycan.com>
> *Cc:* Joseph Kesselman <kesh...@alum.mit.edu>; j-users@xalan.apache.org
> *Subject:* Re: supplementary characters emojis, etc turned ino surrogate pairs
>
> There is no need for private communications IMO unless it's a security issue, in which case we have a private security mailing list.
>
> Gary
>
> On Tue, Jan 9, 2024, 5:36 PM Eric J. Schwarzenbach <eric.schwarzenb...@wrycan.com> wrote:
>
> I've managed to make Xalan do something more correct for my test case by merging some bits from the JDK 21 version of ToStream into Xalan 2.7.3's version.
>
> Note that with the JDK code, what it does is replace either &#x1F4BB; or the literal character it represents (💻) with the equivalent decimal character reference, &#128187;.
>
> Joe, I'm sending you an email directly about this since I think it's beyond the scope of xalan-user.
>
> On 1/9/24 13:34, Joseph Kesselman wrote:
>
> No problem. I was around when we still had to teach people the distinction, and error messages still sometimes get it wrong.
>
> I'll try to look into it this week, unless someone beats me to it.
>
> ------------------------------
> *From:* Eric J. Schwarzenbach <eric.schwarzenb...@wrycan.com>
> *Sent:* Tuesday, January 9, 2024 12:39:07 PM
> *To:* j-users@xalan.apache.org
> *Subject:* Re: supplementary characters emojis, etc turned ino surrogate pairs
>
> Apologies for the mistaken terminology. I do usually know the difference between valid and well-formed and am usually careful about the distinction; however, the idea that a numeric character reference could break either was new to me, and doesn't really fit my notion of either. I repeated the characterization of the problem that I had read without checking into it. Sorry for that, and thanks for looking into it.
>
> Cheers,
>
> Eric
>
> On 1/8/24 18:58, Joseph Kesselman wrote:
>
> Please be careful to distinguish "Well-Formed" from "Valid". If an XML tool complains that a document is not valid, that means it doesn't match the DTD or schema that describes its expected structure, not that it isn't correct XML. It's better to avoid using the term valid unless you mean Valid in the sense XML does.
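To make the well-formed/valid distinction concrete for this bug: XML 1.0's Char production excludes the surrogate block #xD800-#xDFFF, so a reference such as &#55357; (the high surrogate 0xD83D of U+1F4BB) is rejected by a conforming parser before any DTD or schema is consulted, while &#128187; is accepted. The following is a minimal Java sketch, assuming only the JDK's javax.xml.parsers API; the class and method names (WellFormednessCheck, isWellFormed) are hypothetical illustrations, not part of Xalan.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormednessCheck {

    /** Hypothetical helper: true if the parser accepts {@code xml} as well-formed. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // No DTD/schema validation: only well-formedness is being tested.
            dbf.setValidating(false);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Throw on fatal errors instead of printing "[Fatal Error]" to stderr.
            db.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+1F4BB as a single decimal character reference.
        System.out.println(isWellFormed("<a>&#128187;</a>"));        // prints true
        // The same character as references to its two UTF-16 surrogates.
        System.out.println(isWellFormed("<a>&#55357;&#56507;</a>")); // prints false
    }
}
```

No schema is involved in either case, which is why the second string is a well-formedness failure rather than a validity failure.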
>
> A high-numeric-value surrogate pair shouldn't be a well-formedness issue, barring a bug.
>
> I haven't looked at this in any detail in a few decades, but I'll try to at least sanity-check it now that I'm emerging from the build caverns again.
>
> For the convenience of others who might want it: legal characters in XML 1.0 are defined at https://www.w3.org/TR/xml/#charsets. XML 1.1 additionally admits the control characters #x1-#x1F, but only as character references; the null character is still not allowed in either version. There are some Unicode ranges excluded, but those were supposed to be permanently reserved blocks.
>
> Xalan did originally have some issues with characters outside the Basic Multilingual Plane (those that UTF-16 encodes as surrogate pairs), mostly having to do with counts and offsets, since the first draft just used Java chars. But I thought we had addressed those. Obviously, if the fork shipping in javax has solved it, a solution is possible, and probably already in our backlog...
>
> ------------------------------
> *From:* Stanimir Stamenkov via j-users <j-users@xalan.apache.org>
> *Sent:* Monday, January 8, 2024 2:51:57 PM
> *To:* j-users@xalan.apache.org
> *Subject:* Re: supplementary characters emojis, etc turned ino surrogate pairs
>
> Mon, 8 Jan 2024 16:33:39 +0100, /Martin Honnen/:
> > On 08/01/2024 16:28, Eric J. Schwarzenbach wrote:
> >
> >> Does anybody have a patch for
> >>
> >> https://issues.apache.org/jira/browse/XALANJ-2560
> >>
> >> That Xalan produces invalid XML with some utf-8 characters seems rather serious.
> >> I find putting &#x1F4BB; or the literal character it represents into an XML document and running it through any XML-to-XML transform results in it being replaced with &#55357;&#56507; in the output, which evidently makes the XML invalid. I tried a change to ToStream.java from https://issues.apache.org/jira/browse/XALANJ-2419 with the source of Xalan 2.7.3, but it did not help.
> >
> > Use Saxon, perhaps, or see whether https://stackoverflow.com/a/74245232/252228 helps for patching Xalan.
>
> One may also use just the JDK-supplied provider (a Xalan fork):
>
> * https://lists.apache.org/thread/3hzpj1gt1ql38d17dcfxrgss872v50l6 "XML Entities"
>
> Related to the patch referenced in the Stack Overflow answer, one may compare with the JDK sources as well:
>
> * https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToStream.java
> * https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToHTMLStream.java
>
> --
> Stanimir
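The patches discussed in this thread (XALANJ-2419, the Stack Overflow answer, and the JDK 21 ToStream) are not reproduced here. As a rough sketch of the underlying idea only, assuming nothing beyond java.lang: a serializer that escapes non-ASCII text should read a whole code point with String.codePointAt before writing a character reference, rather than writing one reference per UTF-16 char. The class SupplementaryCharRefs and its methods are hypothetical, not Xalan's or the JDK's serializer API; escapePerCharUnit merely imitates the broken output described in XALANJ-2560, and isXml10Char transcribes the XML 1.0 Char production linked earlier in the thread.

```java
public class SupplementaryCharRefs {

    /** XML 1.0 Char production; note that #xD800-#xDFFF (surrogates) is excluded. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Hypothetical helper: escapes element content, writing each non-ASCII
     * character as ONE decimal reference per code point, so U+1F4BB becomes
     * &#128187; rather than &#55357;&#56507;.
     */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);  // combines a valid surrogate pair
            i += Character.charCount(cp);  // advance by 1 or 2 chars
            if (!isXml10Char(cp)) {
                // Unpaired surrogates land here: no XML 1.0 document may contain them.
                throw new IllegalArgumentException(
                    "Not an XML 1.0 character: U+" + Integer.toHexString(cp).toUpperCase());
            }
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);
                    } else {
                        out.append("&#").append(cp).append(';');
                    }
            }
        }
        return out.toString();
    }

    /** For contrast only: escaping per UTF-16 char unit emits one reference per surrogate. */
    public static String escapePerCharUnit(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c == '<') out.append("&lt;");
            else if (c == '>') out.append("&gt;");
            else if (c == '&') out.append("&amp;");
            else if (c < 0x80) out.append(c);
            else out.append("&#").append((int) c).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String laptop = new String(Character.toChars(0x1F4BB)); // the 💻 character
        System.out.println(escape("a < " + laptop));            // prints a &lt; &#128187;
        System.out.println(escapePerCharUnit("a < " + laptop)); // prints a &lt; &#55357;&#56507;
    }
}
```

Writing only ASCII literally is just one escaping policy chosen to keep the sketch short; a real serializer would normally write any character the output encoding can represent as-is, which for UTF-8 is every legal character.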