Re: Adding new IBM extended charsets

Xueming Shen Thu, 19 Jul 2018 00:28:33 -0700

Hi Nasser,

From OpenJDK's perspective, it would be preferable to direct developers to use the charset implementations provided by IBM, or by a reliable third party that has the appropriate knowledge, experience and resources to support/maintain those charsets, such as the icu4j charset project. I have been pulling the data from that huge icu-charset-data file and implementing/maintaining them based on my best knowledge, but I'm sure engineers from IBM or the icu project can probably do a much better job of implementing/maintaining/updating those charsets going forward.

As a first step we can separate those IBM charsets from jdk.charsets into a separate package somewhere and configure them to be built into java.base and jdk.charsets, for the AIX platform only. Then we can further discuss the best way to handle/distribute those charsets that are not needed for the java.base module (for VM startup). As I said, it would be ideal if we could remove them from the openjdk repo/binaries completely and direct developers/users to the icu4j charset provider for those encodings, when needed. But given the possible compatibility concern, we might want to phase this work out gradually in the next major release.

Thanks,
Sherman


On 7/17/18, 6:48 AM, Nasser Ebrahim wrote:

Hi Alan,
Thank you for your inputs. I would like to clarify that not all of the IBM charsets (IBMXXXX) in jdk.charsets are IBM platform-specific charsets. For example, only 43 of the 72 IBMXXXX charsets in jdk.charsets are EBCDIC or IBM platform-specific charsets. Similarly, many charsets in the list of 75 charsets which we would like to contribute are not EBCDIC charsets.
I feel we should have a standard guideline for the extended charsets. If we are keeping the extended charsets in the JDK, then we may want to consider all ICU/IANA approved charsets in the JDK. Otherwise, we may want to keep only the standard charsets in the JDK and remove all the extended charsets, so that all extended charsets can be taken from third party libraries like ICU4J.
If we decide to keep the extended charsets, then maybe we can classify the extended charsets as ASCII and EBCDIC, with the corresponding modules as jdk.ascii.charset and jdk.ebcdic.charset. Then, depending on the platform, we can consider including either of the charset modules or both.
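
To illustrate the ASCII/EBCDIC split, here is a rough sketch of how the IBM charsets visible to a given runtime could be sorted into the two groups. It is only a heuristic, not a proposal for how the modules would actually be defined: it assumes that EBCDIC code pages encode 'A' as the single byte 0xC1, while ASCII-based ones use 0x41. The class name CharsetFamily and the method looksLikeEbcdic are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFamily {

    // Heuristic (an assumption of this sketch): EBCDIC code pages encode
    // 'A' as the single byte 0xC1; ASCII-based code pages encode it as 0x41.
    public static boolean looksLikeEbcdic(Charset cs) {
        if (!cs.canEncode()) {
            return false; // decode-only charsets cannot be probed this way
        }
        byte[] bytes = "A".getBytes(cs);
        return bytes.length == 1 && (bytes[0] & 0xFF) == 0xC1;
    }

    public static void main(String[] args) {
        // Walk every charset the runtime knows about and report the IBM ones.
        for (Charset cs : Charset.availableCharsets().values()) {
            String name = cs.name();
            if (name.startsWith("IBM") || name.startsWith("x-IBM")) {
                System.out.println(name + " -> "
                        + (looksLikeEbcdic(cs) ? "EBCDIC" : "ASCII-based or other"));
            }
        }
        // Sanity check against a charset that is always present.
        System.out.println("US-ASCII -> "
                + (looksLikeEbcdic(StandardCharsets.US_ASCII) ? "EBCDIC" : "ASCII-based or other"));
    }
}
```

The list printed depends on the JDK build and on any charset providers deployed alongside it, so the counts will not necessarily match the numbers quoted above.
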
Please advise.

Thank you,
Nasser Ebrahim




From: Alan Bateman <[email protected]>
To: Nasser Ebrahim <[email protected]>, Xueming Shen <[email protected]>, [email protected]
Date: 07/09/2018 01:25 AM
Subject: Re: Adding new IBM extended charsets
------------------------------------------------------------------------



On 06/07/2018 14:56, Nasser Ebrahim wrote:
> :
> I understood your preferred option is 3 [Remove all extended charsets from
> JDK (keep only default charsets) and use the extended charsets from third
> party like ICU4J]. Just to confirm, so you meant we need to keep only the
> standard charsets in the JDK and remove all the extended charsets from JDK
> and use them from ICU4J OR you meant apply that only for the new extended
> charsets. I think it is better to keep the consistency - either take all
> extended charsets from ICU4J or maintain all extended charsets with JDK.
> Keeping some extended charsets within JDK and use ICU4J for other extended
> charsets may confuse the Java user.
I think the suggestion in Sherman's mail is to drop the 70 or so IBM
charsets from jdk.charsets. This will reduce the size of jdk.charsets
and eliminate the need to maintain these charsets (at least on non-AIX
builds). If developers need these charsets, say when connecting to a
database on an IBM system, then they can deploy the ICU4J provider on
the class path or module path.
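
A minimal sketch of how application code might probe for one of these charsets before using it. It uses only the standard java.nio.charset.Charset API; the class name CharsetProbe and the method find are hypothetical. The assumption is that a charset provider deployed on the class path or module path is picked up by the same lookup through the standard CharsetProvider service mechanism, so the calling code does not change.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

public class CharsetProbe {

    // Returns the charset if some installed provider supports it,
    // otherwise an empty Optional.
    public static Optional<Charset> find(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            // The name itself is syntactically invalid.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // IBM1047 is only an example name; whether it resolves depends on
        // the JDK build and on which providers are deployed.
        String name = args.length > 0 ? args[0] : "IBM1047";
        System.out.println(name + " -> "
                + find(name).map(Charset::name).orElse("not available"));
    }
}
```

Running it with and without an additional provider on the class path should show whether a given charset name resolves only when that provider is present.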

I don't think the suggestion impacts the 11 IBM charsets in java.base on
non-AIX builds or the non-IBM charsets in jdk.charsets. There may be
opportunities to drop some of these but that can be looked at separately.

Also I don't think the suggestion impacts the additional 12 IBM charsets
that are included in the AIX build of java.base at this time. From the
review threads, it seems there are supported locales on AIX that map to
these charsets so this is why they are in java.base.

-Alan.
