Re: Shall we change our file.encoding

Nathan Beyer Thu, 16 Jul 2009 18:26:56 -0700

On Thu, Jul 16, 2009 at 8:18 PM, Charles Lee<littlee1...@gmail.com> wrote:
> Hi Nathan,
>
> What I got is 936, the code page identifier. Is there an API for us to map
> 936 to gb2312?


Oh, the 'identifier' bit was missing - yeah, we'll need to translate
that into a name of some sort. I'll poke around a bit and see what I
can find.
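One way to do that translation is a lookup table with a generic fallback, along the lines of the hard-coded map Charles considers further down. This is a minimal sketch, not DRLVM code. The following are all assumptions:

- the function name `code_page_to_charset`;
- the table, a small hand-picked subset of MSDN's Code Page Identifiers list;
- the `Cp<number>` fallback, modelled on the `Cp1252` alias Nathan mentions below. Whether Harmony's Charset provider accepts every such alias would still need checking.

Code page 936 is mapped to `GBK` rather than MSDN's `gb2312` label because CP936 covers the larger GBK repertoire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical code page -> charset name table (small subset only). */
struct cp_name {
    unsigned int cp;
    const char *name;
};

static const struct cp_name known_code_pages[] = {
    {   437, "IBM437" },
    {   850, "IBM850" },
    {   936, "GBK" },          /* MSDN labels it "gb2312"; CP936 is the GBK superset */
    {  1250, "windows-1250" },
    {  1251, "windows-1251" },
    {  1252, "windows-1252" },
    { 20127, "US-ASCII" },
    { 28591, "ISO-8859-1" },
    { 65001, "UTF-8" },
};

/* Writes a charset name for code page `cp` into `buf`.
 * Unknown code pages fall back to the "Cp<number>" alias form
 * (assumption: the Charset provider recognises it, as with Cp1252).
 * Returns 1 if the name came from the table, 0 if the fallback was used. */
int code_page_to_charset(unsigned int cp, char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL && len > 0);
    for (i = 0; i < sizeof known_code_pages / sizeof known_code_pages[0]; i++) {
        if (known_code_pages[i].cp == cp) {
            snprintf(buf, len, "%s", known_code_pages[i].name);
            return 1;
        }
    }
    snprintf(buf, len, "Cp%u", cp);
    return 0;
}
```

On Windows the number would typically come from `GetACP()`, which returns the current ANSI code page identifier (936 on a Simplified Chinese system).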

> If we put 936 in file.encoding, can we successfully get the encoder and
> decoder through Charset?
>
> On Fri, Jul 17, 2009 at 9:05 AM, Nathan Beyer <ndbe...@apache.org> wrote:
>
>> On Thu, Jul 16, 2009 at 1:28 AM, Charles Lee<littlee1...@gmail.com> wrote:
>> > Hi guys,
>> >
>> > I have added the locale function to DRLVM; the patch is attached. Please
>> > try this new patch on Linux.
>> >
>> > The patch should work on Linux but fail on Windows, because on Windows
>> > setlocale returns a code page, not a charset.
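On both platforms the codeset is the part of the locale name after the '.'. One plausible approach is therefore to split the string returned by `setlocale(LC_CTYPE, "")`:

- On Linux this yields a charset name such as `UTF-8`.
- On Windows it yields a bare code page number, as in `English_United States.1252`.

The sketch assumes names of the form `language_territory.codeset[@modifier]`, and the helper name `locale_codeset` is hypothetical. It queries `LC_CTYPE` rather than `LC_ALL`, which avoids glibc's composite `LC_CTYPE=...;LC_NUMERIC=...` form when categories differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of a locale name into out.
 *   "en_US.UTF-8"                -> "UTF-8"
 *   "English_United States.1252" -> "1252"
 *   "de_DE.ISO-8859-15@euro"     -> "ISO-8859-15"  (modifier dropped)
 * Writes "" if the name has no '.' part (e.g. "C" or "POSIX"). */
void locale_codeset(const char *locale, char *out, size_t outlen)
{
    const char *dot, *end;
    size_t n;

    assert(locale != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    dot = strchr(locale, '.');
    if (dot == NULL)
        return;
    dot++;                              /* skip the '.' itself */
    end = strchr(dot, '@');             /* stop before any @modifier */
    n = end ? (size_t)(end - dot) : strlen(dot);
    if (n >= outlen)
        n = outlen - 1;                 /* truncate rather than overflow */
    memcpy(out, dot, n);
    out[n] = '\0';
}
```

A result made only of digits, as on Windows, is a code page identifier and still has to be translated into a charset name.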
>>
>> Code page and character set are the same thing. We shouldn't need to
>> convert it as the Charset APIs will have to support the values anyway.
>>
>> What's the value you're getting? If it's 'Cp1252', then we're good, as
>> that's just an alias for 'Windows-1252' (or vice-versa).
>>
>> -Nathan
>>
>>
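>> > I have tried for a long time to get the charset name from the code page,
>> > for example:
>> > #include <windows.h>
>> > #include <stdio.h>
>> >
>> > CPINFOEXA cpInfoEx;  /* ANSI struct, so CodePageName is a char array */
>> > BOOL ok = GetCPInfoExA(CP_ACP, 0, &cpInfoEx);
>> > if (ok) {
>> >     printf("FULL NAME %s\n", cpInfoEx.CodePageName);
>> > }
>> > But this only gives me the descriptive full name, not a charset name in
>> > any standard format.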
>> >
>> > There is a code page identifier map on MSDN (details here). I could
>> > hard-code this map in the file, but the note on MSDN says:
>> >      "ANSI code pages can be different on different computers, or can be
>> > changed for a single computer, leading to data corruption. For the most
>> > consistent results, applications should use Unicode, such as UTF-8 or
>> > UTF-16, instead of a specific code page."
>> > I am afraid hard-coding will fail on some machines. (By the way, this
>> > seems to suggest UTF-8 as the default again :-)
>> >
>> > There is also an Encoding class in VC++ (details here), but we cannot
>> > use it here.
>> >
>> > So, does anyone know anything about locales on Windows?
>> > Again, shall we use UTF-8 as our default?
>> >
>> > On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee <littlee1...@gmail.com> wrote:
>> >>
>> >> It seems we should add it to DRLVM.
>> >>
>> >> On Wed, Jul 15, 2009 at 1:58 PM, Regis <xu.re...@gmail.com> wrote:
>> >>>
>> >>> Nathan Beyer wrote:
>> >>>>
>> >>>> Is the IBM VME dealing with this correctly? Do we just need to fix
>> >>>> DRLVM?
>> >>>
>> >>> Yes. I only tested on Linux, and the IBM VME set the property correctly.
>> >>>
>> >>>>
>> >>>> On Wed, Jul 15, 2009 at 12:25 AM, Regis<xu.re...@gmail.com> wrote:
>> >>>>>
>> >>>>> Kevin Zhou wrote:
>> >>>>>>
>> >>>>>> Yeah, in luniglob.c the classlib (CL) attempts to read the
>> >>>>>> "file.encoding" property from the VM, but fails to get the correct
>> >>>>>> encoding.
>> >>>>>>
>> >>>>>> Regis, do you know any other specific way that CL can get the right
>> >>>>>> property?
>> >>>>>
>> >>>>> We can get it from the OS directly. Maybe just read the environment
>> >>>>> variables on Linux?
>> >>>>>
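On Linux and other POSIX systems, the C library can do most of this work. After `setlocale(LC_CTYPE, "")` adopts the locale from the environment, `nl_langinfo(CODESET)` reports its codeset, e.g. `UTF-8` under `en_US.UTF-8`. Reading the variables directly is a cruder fallback; POSIX gives `LC_ALL` precedence over `LC_CTYPE`, which in turn overrides `LANG`.

This is a minimal sketch. The function names are hypothetical. Calling `setlocale` this way changes the process-wide locale, so the sketch assumes it runs early in VM startup.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: codeset of the user's locale, e.g. "UTF-8".
 * If the environment names a locale that is not installed, setlocale()
 * fails and LC_CTYPE stays "C"; nl_langinfo() then reports the "C"
 * locale's codeset instead (on glibc, "ANSI_X3.4-1968"). */
const char *os_default_codeset(void)
{
    const char *cs;

    setlocale(LC_CTYPE, "");   /* adopt LC_CTYPE from the environment */
    cs = nl_langinfo(CODESET);
    assert(cs != NULL);        /* nl_langinfo() signals errors with "", not NULL */
    return cs;
}

/* Hypothetical fallback: the raw locale name from the environment, using
 * the POSIX precedence LC_ALL > LC_CTYPE > LANG. Returns NULL if none is
 * set. The result is a full name such as "en_US.UTF-8", so the codeset
 * still has to be split off after the '.'. */
const char *env_locale_name(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;
}
```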
>> >>>>>> On Wed, Jul 15, 2009 at 9:59 AM, Regis <xu.re...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Charles Lee wrote:
>> >>>>>>>
>> >>>>>>>> Hi Nathan,
>> >>>>>>>>
>> >>>>>>>> If the file encoding is derived from the OS, there must be some bugs
>> >>>>>>>> in it, because on my Linux machine the locale is en_US.UTF-8 but our
>> >>>>>>>> default codec is still ISO8859-1. Do you know where we can find the
>> >>>>>>>> code that does this?
>> >>>>>>>>
>> >>>>>>> Classlib expected the VM to do this and set the property, but it
>> >>>>>>> didn't, so we have to do it ourselves.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer <nbe...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Are we talking about Windows or Linux? The default file encoding
>> >>>>>>>>> should derive from the OS. I believe that's defined by the specs.
>> >>>>>>>>>
>> >>>>>>>>> Sent from my iPhone
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Jul 14, 2009, at 5:51 AM, Charles Lee <littlee1...@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> On Tue, Jul 14, 2009 at 6:12 PM, Jimmy,Jing Lv <firep...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Charles, I believe UTF-8 is the default encoding for the RI, and it
>> >>>>>>>>>>> sounds reasonable.
>> >>>>>>>>>>> BTW, it may cause some compatibility problems; maybe we need to run
>> >>>>>>>>>>> more tests to verify?
>> >>>>>>>>>>>
>> >>>>>>>>>>> 2009/7/14 Charles Lee <littlee1...@gmail.com>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi guys:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I am running some of Ant's JUnit test cases and hitting some
>> >>>>>>>>>>>> encoding problems. I think they may be caused by the different
>> >>>>>>>>>>>> default encodings of the RI and Harmony. My locale is en_US.UTF-8;
>> >>>>>>>>>>>> the RI's default is UTF-8, but Harmony's is 8859-1. Then I came
>> >>>>>>>>>>>> across HARMONY-3736 <https://issues.apache.org/jira/browse/HARMONY-3736>
>> >>>>>>>>>>>> and the two diffs attached to that issue. It seems we always get
>> >>>>>>>>>>>> 8859-1, because (correct me if I'm wrong :-)
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 1. We removed the code that sets it in the VM, so we always get null
>> >>>>>>>>>>>> when we call the VM method.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2. We set file.encoding in libglob.c; if we get null from the VM, we
>> >>>>>>>>>>>> set 8859-1.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Sorry, it should be luniglob.c
>> >>>>>>>>>>>
>> >>>>>>>>>>>> 3. We cannot set file.encoding at run time.
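Taken together, the three points above describe a startup path that falls straight through to ISO-8859-1 whenever the VM stays silent. The alternative order argued for in this thread is VM value first, then the OS locale, then a fixed default. A sketch of that order might look like the code below. The names `choose_file_encoding`, `enum enc_source` and the `ISO-8859-1` literal are illustrative assumptions, not the actual luniglob.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Where the chosen file.encoding value came from (handy for logging). */
enum enc_source { FROM_VM, FROM_OS, FROM_DEFAULT };

static int is_set(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Hypothetical sketch, not the real luniglob.c logic.
 *   vm_value: what the VM reported for file.encoding (NULL if nothing)
 *   os_value: codeset derived from the OS locale (NULL if unknown)
 *   source:   receives which of the three inputs was used */
const char *choose_file_encoding(const char *vm_value, const char *os_value,
                                 enum enc_source *source)
{
    assert(source != NULL);
    if (is_set(vm_value)) {
        *source = FROM_VM;        /* the VM set the property: keep it */
        return vm_value;
    }
    if (is_set(os_value)) {
        *source = FROM_OS;        /* derive the default from the OS locale */
        return os_value;
    }
    *source = FROM_DEFAULT;       /* last resort: today's fixed default */
    return "ISO-8859-1";
}
```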
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Ant uses UTF-8 to encode file names that contain non-ASCII
>> >>>>>>>>>>>> characters, so why do we use ISO8859-1 as our unchangeable default?
>> >>>>>>>>>>>> The Wikipedia article http://en.wikipedia.org/wiki/ISO8859-1 says:
>> >>>>>>>>>>>> "In computing applications, encodings that provide full UCS support
>> >>>>>>>>>>>> (such as UTF-8 <http://en.wikipedia.org/wiki/UTF-8> and UTF-16
>> >>>>>>>>>>>> <http://en.wikipedia.org/wiki/UTF-16>) are finding increasing favor
>> >>>>>>>>>>>> over encodings based on ISO 8859-1." Should we simply change
>> >>>>>>>>>>>> ISO8859-1 to UTF-8?
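The byte level shows why a UTF-8 file name breaks under an ISO-8859-1 default. The sketch below is plain C, independent of Ant and Harmony, and all function names in it are hypothetical. It encodes a code point as UTF-8, then decodes those bytes as ISO-8859-1, which maps every byte to the code point with the same value.

- 'é' (U+00E9) becomes the two bytes 0xC3 0xA9, which ISO-8859-1 reads back as the two characters "Ã©".
- A character such as '中' (U+4E2D) has no ISO-8859-1 encoding at all.

```c
#include <assert.h>
#include <stddef.h>

/* Encodes one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written. Surrogates are not rejected in
 * this sketch. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* ISO-8859-1 maps byte 0xNN to code point U+00NN, so decoding never fails;
 * UTF-8 input just comes out as the wrong characters (mojibake). */
size_t latin1_decode(const unsigned char *in, size_t n, unsigned long *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/* Encodes cp in ISO-8859-1. Returns 0 if cp is above U+00FF, i.e. the
 * character cannot be represented at all. */
int latin1_encode(unsigned long cp, unsigned char *out)
{
    if (cp > 0xFF)
        return 0;
    *out = (unsigned char)cp;
    return 1;
}
```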
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Yours sincerely,
>> >>>>>>>>>>>> Charles Lee
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best Regards!
>> >>>>>>>>>>>
>> >>>>>>>>>>> Jimmy, Jing Lv
>> >>>>>>>>>>> China Software Development Lab, IBM
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> Yours sincerely,
>> >>>>>>>>>> Charles Lee
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>> --
>> >>>>>>> Best Regards,
>> >>>>>>> Regis.
>> >>>>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Best Regards,
>> >>>>> Regis.
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards,
>> >>> Regis.
>> >>
>> >>
>> >>
>> >> --
>> >> Yours sincerely,
>> >> Charles Lee
>> >>
>> >
>> >
>> >
>> > --
>> > Yours sincerely,
>> > Charles Lee
>> >
>> >
>>
>
>
>
> --
> Yours sincerely,
> Charles Lee
>
