On Fri, Jul 17, 2009 at 11:17 AM, Nathan Beyer <nbe...@gmail.com> wrote:
> On Thu, Jul 16, 2009 at 9:30 PM, Charles Lee <littlee1...@gmail.com> wrote:
>> Thanks Nathan!
>>
>> I will try this :-)
>
> Where do we define the user's locale and system locale? It seems like
> all of this should be located there and associated with that process.

Sorry Nathan, I don't quite follow. Do you mean we should get the user's locale or the system locale?

>> On Fri, Jul 17, 2009 at 10:05 AM, Nathan Beyer <ndbe...@apache.org> wrote:
>>> On Thu, Jul 16, 2009 at 8:50 PM, Nathan Beyer <ndbe...@apache.org> wrote:
>>>> On Thu, Jul 16, 2009 at 8:35 PM, Nathan Beyer <ndbe...@apache.org> wrote:
>>>>> On Thu, Jul 16, 2009 at 8:26 PM, Nathan Beyer <ndbe...@apache.org> wrote:
>>>>>> On Thu, Jul 16, 2009 at 8:18 PM, Charles Lee <littlee1...@gmail.com> wrote:
>>>>>>> Hi Nathan,
>>>>>>>
>>>>>>> What I got is 936, the code page identifier. Is there an API for us
>>>>>>> to map 936 to gb2312?
>>>>>>
>>>>>> Oh, the 'identifier' bit was missing - yeah, we'll need to translate
>>>>>> that into a name of some sort. I'll poke around a bit and see what I
>>>>>> can find.
>>>>>
>>>>> We'll probably just have to put in a mapping ourselves based on the
>>>>> documentation. We'd call GetACP [1] and map that to a known alias in
>>>>> java.nio.charset that matches the definitions [2] of the identifiers.
>>>>>
>>>>> [1] http://msdn.microsoft.com/en-us/library/dd318070%28VS.85%29.aspx
>>>>> [2] http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
>>>>
>>>> This may be better - APR has a function for getting the OS default
>>>> encoding. This would work across all platforms that APR supports, and I
>>>> believe we already use APR.
>>>>
>>>> http://apr.apache.org/docs/apr/1.3/group__apr__portabile.html#g6e21845a4a5f3b7dd107b2beea50c91e
>>>
>>> However, the Windows version of this is simply
>>>
>>>     return apr_psprintf(pool, "CP%u", (unsigned) GetACP());
>>>
>>> which is essentially "CP" + codePageId.
>>>
>>> And the Unix version of this method doesn't look very good for our
>>> purposes.
>>>
>>>> -Nathan
>>>>>>>
>>>>>>> If we put 936 in file.encoding, can we successfully get the encoder
>>>>>>> and decoder by charset?
>>>>>>>
>>>>>>> On Fri, Jul 17, 2009 at 9:05 AM, Nathan Beyer <ndbe...@apache.org> wrote:
>>>>>>>> On Thu, Jul 16, 2009 at 1:28 AM, Charles Lee <littlee1...@gmail.com> wrote:
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I have added the locale function to DRLVM; the patch is attached.
>>>>>>>>> Please try this new patch on Linux.
>>>>>>>>>
>>>>>>>>> The patch should work on Linux but fail on Windows, because Windows
>>>>>>>>> returns a code page, not a charset, from setlocale.
>>>>>>>>
>>>>>>>> Code page and character set are the same thing. We shouldn't need to
>>>>>>>> convert it, as the Charset APIs will have to support the values anyway.
>>>>>>>>
>>>>>>>> What's the value you're getting? If it's 'Cp1252', then we're good, as
>>>>>>>> that's just an alias for 'Windows-1252' (or vice versa).
>>>>>>>>
>>>>>>>> -Nathan
>>>>>>>>
>>>>>>>>> I have tried for a long time to get the charset name from the code
>>>>>>>>> page, for example:
>>>>>>>>>
>>>>>>>>>     CPINFOEXA cpInfoEx;
>>>>>>>>>     /* CP_ACP: the system's current ANSI code page */
>>>>>>>>>     BOOL ok = GetCPInfoExA(CP_ACP, 0, &cpInfoEx);
>>>>>>>>>     if (ok) {
>>>>>>>>>         printf("FULL NAME %s\n", cpInfoEx.CodePageName);
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> But I only get the descriptive full name, not a name in any standard
>>>>>>>>> format.
>>>>>>>>>
>>>>>>>>> There is a code page identifier map on MSDN (details here). I may
>>>>>>>>> hard-code this map in the file. But the note on MSDN says:
>>>>>>>>>
>>>>>>>>>     "ANSI code pages can be different on different computers, or can
>>>>>>>>>     be changed for a single computer, leading to data corruption. For
>>>>>>>>>     the most consistent results, applications should use Unicode, such
>>>>>>>>>     as UTF-8 or UTF-16, instead of a specific code page."
>>>>>>>>>
>>>>>>>>> I am afraid hard-coding it will fail on some machines. (By the way,
>>>>>>>>> this seems to suggest UTF-8 as the default again :-)
>>>>>>>>>
>>>>>>>>> There is also an Encoding class in VC++ (details here), but we cannot
>>>>>>>>> use it here.
>>>>>>>>>
>>>>>>>>> So does anyone know anything about locales on Windows?
>>>>>>>>> Again, shall we use UTF-8 as our default?
>>>>>>>>>
>>>>>>>>> On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee <littlee1...@gmail.com> wrote:
>>>>>>>>>> It seems we should add it to DRLVM.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 15, 2009 at 1:58 PM, Regis <xu.re...@gmail.com> wrote:
>>>>>>>>>>> Nathan Beyer wrote:
>>>>>>>>>>>> Is the IBM VME dealing with this correctly? Do we just need to fix
>>>>>>>>>>>> DRLVM?
>>>>>>>>>>>
>>>>>>>>>>> Yes, IBM VME sets the property correctly (I only tested on Linux).
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 15, 2009 at 12:25 AM, Regis <xu.re...@gmail.com> wrote:
>>>>>>>>>>>>> Kevin Zhou wrote:
>>>>>>>>>>>>>> Yeah, from luniglob.c, classlib attempts to read the
>>>>>>>>>>>>>> "file.encoding" property from the VM but fails to get the
>>>>>>>>>>>>>> correct encoding.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regis, do you know any other specific ways that classlib can
>>>>>>>>>>>>>> get the right property?
>>>>>>>>>>>>>
>>>>>>>>>>>>> We can get it from the OS directly. Maybe just read env variables
>>>>>>>>>>>>> on Linux?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jul 15, 2009 at 9:59 AM, Regis <xu.re...@gmail.com> wrote:
>>>>>>>>>>>>>>> Charles Lee wrote:
>>>>>>>>>>>>>>>> Hi Nathan,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If the file encoding derives from the OS, there must be a bug
>>>>>>>>>>>>>>>> somewhere, because on my Linux machine the locale is en_US.UTF-8
>>>>>>>>>>>>>>>> but our default codec is still ISO-8859-1. Do you know where we
>>>>>>>>>>>>>>>> can find that code?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Classlib expected the VM to do this and set the property, but it
>>>>>>>>>>>>>>> didn't, so we have to do it ourselves.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer <nbe...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Are we talking about Windows or Linux? The default file encoding
>>>>>>>>>>>>>>>>> should derive from the OS. I believe that's defined by the specs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jul 14, 2009, at 5:51 AM, Charles Lee <littlee1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> On Tue, Jul 14, 2009 at 6:12 PM, Jimmy,Jing Lv <firep...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Charles, I believe UTF-8 is the default encoding for the RI, and
>>>>>>>>>>>>>>>>>>> it sounds reasonable.
>>>>>>>>>>>>>>>>>>> BTW, we may run into some compatibility problems. Maybe we need to
>>>>>>>>>>>>>>>>>>> run more tests to verify?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2009/7/14 Charles Lee <littlee1...@gmail.com>
>>>>>>>>>>>>>>>>>>>> Hi guys:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am running some of the Ant JUnit test cases and hitting some encoding
>>>>>>>>>>>>>>>>>>>> problems. I think they may be caused by the different default encodings of
>>>>>>>>>>>>>>>>>>>> the RI and Harmony: my locale is en_US.UTF-8, the RI default is UTF-8, but
>>>>>>>>>>>>>>>>>>>> Harmony's is ISO-8859-1. Then I came across HARMONY-3736
>>>>>>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HARMONY-3736> and the two diffs
>>>>>>>>>>>>>>>>>>>> attached to that issue. It seems we always get ISO-8859-1, because (correct
>>>>>>>>>>>>>>>>>>>> me if I'm wrong :-):
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. We removed the code that sets it in the VM, so we always get null when
>>>>>>>>>>>>>>>>>>>>    we call the VM method.
>>>>>>>>>>>>>>>>>>>> 2. We set file.encoding in libglob.c; if we get null from the VM, we set
>>>>>>>>>>>>>>>>>>>>    ISO-8859-1.
>>>>>>>>>>>>>>>>>>> Sorry, it should be luniglob.c
>>>>>>>>>>>>>>>>>>>> 3. We cannot set file.encoding at run time.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ant uses UTF-8 to encode file names that contain non-ASCII characters, so
>>>>>>>>>>>>>>>>>>>> why do we use ISO-8859-1 as our unchangeable default? The Wikipedia article
>>>>>>>>>>>>>>>>>>>> http://en.wikipedia.org/wiki/ISO8859-1 says: "In computing applications,
>>>>>>>>>>>>>>>>>>>> encodings that provide full UCS support (such as UTF-8
>>>>>>>>>>>>>>>>>>>> <http://en.wikipedia.org/wiki/UTF-8> and UTF-16
>>>>>>>>>>>>>>>>>>>> <http://en.wikipedia.org/wiki/UTF-16>) are finding increasing favor over
>>>>>>>>>>>>>>>>>>>> encodings based on ISO 8859-1." Should we simply change ISO-8859-1 to UTF-8?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>>>>>>>> Charles Lee
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Best Regards!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jimmy, Jing Lv
>>>>>>>>>>>>>>>>>>> China Software Development Lab, IBM
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>>>>>> Charles Lee
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Regis.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Regis.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Regis.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Yours sincerely,
>>>>>>>>>> Charles Lee
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Yours sincerely,
>>>>>>>>> Charles Lee
>>>>>>>
>>>>>>> --
>>>>>>> Yours sincerely,
>>>>>>> Charles Lee
>>
>> --
>> Yours sincerely,
>> Charles Lee

--
Yours sincerely,
Charles Lee
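Nathan's proposal above is to call GetACP and map the identifier to a known charset alias. A rough C sketch of that idea follows. The table is a small hand-picked subset of Microsoft's code page identifier list and uses the names that list gives. Unknown identifiers fall back to the "CP<id>" string that APR's Windows apr_os_default_encoding builds. The helper names and the fallback policy are hypothetical. It would also still need checking whether Harmony's java.nio.charset providers accept each name as an alias. For example, code page 936 is GBK-based, and GBK is a superset of GB2312.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry of an illustrative code page table (not the full MSDN list). */
struct cp_name {
    unsigned id;
    const char *name;
};

/* Names as listed in Microsoft's code page identifier table (subset). */
static const struct cp_name CP_NAMES[] = {
    {437,   "IBM437"},
    {932,   "shift_jis"},
    {936,   "gb2312"},
    {949,   "ks_c_5601-1987"},
    {950,   "big5"},
    {1250,  "windows-1250"},
    {1251,  "windows-1251"},
    {1252,  "windows-1252"},
    {65001, "utf-8"},
};

/* Hypothetical helper: writes a charset name for code page `cp` into `buf`.
 * Returns 1 if `cp` is in the table, 0 if the APR-style "CP<id>" fallback
 * was written instead. */
static int codepage_to_charset(unsigned cp, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    for (size_t i = 0; i < sizeof CP_NAMES / sizeof CP_NAMES[0]; i++) {
        if (CP_NAMES[i].id == cp) {
            snprintf(buf, len, "%s", CP_NAMES[i].name);
            return 1;
        }
    }
    snprintf(buf, len, "CP%u", cp);   /* same shape as APR's Windows result */
    return 0;
}

#ifdef _WIN32
#include <windows.h>
/* Hypothetical Windows entry point: GetACP() returns the ANSI code page id. */
static int windows_default_charset(char *buf, size_t len)
{
    return codepage_to_charset((unsigned) GetACP(), buf, len);
}
#endif
```

With this table, codepage_to_charset(936, buf, sizeof buf) stores "gb2312" and returns 1. An unlisted identifier such as 12345 yields "CP12345" and returns 0, so a caller can tell a real match from the APR-style fallback.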
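Regis suggested reading environment variables on Linux. That approach could be sketched in C as below. The sketch assumes the POSIX precedence for the character-type category: LC_ALL overrides LC_CTYPE, which overrides LANG. It also assumes a locale name of the form language_TERRITORY.codeset@modifier. Both helper names are hypothetical. A locale with no codeset part (such as "C") is reported as a failure, so the caller can apply whatever default the project settles on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the codeset part of a POSIX locale name
 * (e.g. "en_US.UTF-8" -> "UTF-8") into `out`. Returns 1 on success,
 * 0 if the name has no codeset or `out` is too small. */
static int codeset_from_locale_name(const char *name, char *out, size_t len)
{
    assert(out != NULL && len > 0);
    const char *dot = (name != NULL) ? strchr(name, '.') : NULL;
    if (dot == NULL)
        return 0;                   /* e.g. "C" or "en_US": no codeset given */
    dot++;                          /* skip the '.' itself */
    size_t n = strcspn(dot, "@");   /* stop before an optional @modifier */
    if (n == 0 || n >= len)
        return 0;                   /* empty codeset or buffer too small */
    memcpy(out, dot, n);
    out[n] = '\0';
    return 1;
}

/* Hypothetical helper: returns the locale name governing LC_CTYPE, checking
 * LC_ALL, then LC_CTYPE, then LANG. Returns NULL if none is set. */
static const char *ctype_locale_from_env(void)
{
    static const char *const vars[] = {"LC_ALL", "LC_CTYPE", "LANG"};
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;
}
```

For example, if only LANG=en_US.UTF-8 is set, ctype_locale_from_env() returns "en_US.UTF-8" and codeset_from_locale_name() extracts "UTF-8" from it.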
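A POSIX C library can also report the codeset directly, without parsing the variables by hand. setlocale(LC_CTYPE, "") adopts the locale named by the environment. nl_langinfo(CODESET) then returns that locale's codeset name, for example "UTF-8" under en_US.UTF-8. This sketch assumes a POSIX system, so it does not cover the Windows code page case. The function name and the caller-supplied fallback are hypothetical. setlocale changes process-wide state, so a VM would presumably do this once during startup, before setting file.encoding.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: returns the codeset of the environment's LC_CTYPE
 * locale, or `fallback` if that locale cannot be adopted or reports no
 * codeset. Only the LC_CTYPE category is changed. */
static const char *os_default_codeset(const char *fallback)
{
    assert(fallback != NULL);
    /* "" means: take the locale from LC_ALL / LC_CTYPE / LANG. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return fallback;            /* e.g. the named locale is not installed */
    const char *codeset = nl_langinfo(CODESET);
    return (codeset != NULL && codeset[0] != '\0') ? codeset : fallback;
}
```

Calling os_default_codeset("ISO-8859-1") keeps Harmony's current default as the last resort. Passing "UTF-8" instead would be the change Charles proposed. The returned string belongs to the C library and may be overwritten by later nl_langinfo or setlocale calls, so a caller would copy it before storing it.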