On Thu, Jul 16, 2009 at 9:30 PM, Charles Lee <littlee1...@gmail.com> wrote:
> Thanks Nathan!
>
> I will try this :-)

Where do we define the user's locale and system locale? It seems like
all of this should be located there and associated with that process.

>
> On Fri, Jul 17, 2009 at 10:05 AM, Nathan Beyer <ndbe...@apache.org> wrote:
>
>> On Thu, Jul 16, 2009 at 8:50 PM, Nathan Beyer <ndbe...@apache.org> wrote:
>> > On Thu, Jul 16, 2009 at 8:35 PM, Nathan Beyer <ndbe...@apache.org> wrote:
>> >> On Thu, Jul 16, 2009 at 8:26 PM, Nathan Beyer <ndbe...@apache.org> wrote:
>> >>> On Thu, Jul 16, 2009 at 8:18 PM, Charles Lee <littlee1...@gmail.com> wrote:
>> >>>> Hi Nathan,
>> >>>>
>> >>>> What I got is 936, the code page identifier. Is there an API for us to
>> >>>> map 936 to gb2312?
>> >>>
>> >>> Oh, the 'identifier' bit was missing - yeah, we'll need to translate
>> >>> that into a name of some sort. I'll poke around a bit and see what I
>> >>> can find.
>> >>
>> >> We'll probably just have to put in a mapping ourselves based on the
>> >> documentation. We'd call GetACP [1] and map that to a known alias in
>> >> java.nio.charset that matches the definitions [2] of the identifiers.
>> >>
>> >> [1] http://msdn.microsoft.com/en-us/library/dd318070%28VS.85%29.aspx
>> >> [2] http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
>> >
>> > This may be better - APR has a function for getting the OS default
>> > encoding. This would work across all platforms that APR supports and I
>> > believe we already use APR.
>> >
>> > http://apr.apache.org/docs/apr/1.3/group__apr__portabile.html#g6e21845a4a5f3b7dd107b2beea50c91e
>>
>> However, the Windows version of this is simply - return
>> apr_psprintf(pool, "CP%u", (unsigned) GetACP());. Which is essentially
>> "CP" + codePageId.
>>
>> And the Unix version of this method doesn't look very good for our
>> purposes.
>> >
>> > -Nathan
>> >>
>> >>>
>> >>>> If we put 936 in file.encoding, can we successfully get the encoder and
>> >>>> decoder by charset?
>> >>>>
>> >>>> On Fri, Jul 17, 2009 at 9:05 AM, Nathan Beyer <ndbe...@apache.org> wrote:
>> >>>>
>> >>>>> On Thu, Jul 16, 2009 at 1:28 AM, Charles Lee <littlee1...@gmail.com> wrote:
>> >>>>> > Hi guys,
>> >>>>> >
>> >>>>> > I have added the locale function to DRLVM; the patch is attached.
>> >>>>> > Please try this new patch on Linux.
>> >>>>> >
>> >>>>> > The patch should work on Linux but fail on Windows, because Windows
>> >>>>> > returns a code page, not a charset, from setlocale.
>> >>>>>
>> >>>>> Code page and character set are the same thing. We shouldn't need to
>> >>>>> convert it as the Charset APIs will have to support the values anyway.
>> >>>>>
>> >>>>> What's the value you're getting? If it's 'Cp1252', then we're good, as
>> >>>>> that's just an alias for 'Windows-1252' (or vice-versa).
>> >>>>>
>> >>>>> -Nathan
>> >>>>>
>> >>>>> > I have tried for a long time to get the charset name from the code
>> >>>>> > page, for example:
>> >>>>> >     CPINFOEX cpInfoEx;
>> >>>>> >     BOOL iReturn = GetCPInfoEx(CP_ACP, 0, &cpInfoEx);
>> >>>>> >     if (iReturn > 0) {
>> >>>>> >         printf("FULL NAME %s\n", cpInfoEx.CodePageName);
>> >>>>> >     }
>> >>>>> > But I only get the full name without any format.
>> >>>>> >
>> >>>>> > There is a code page identifiers map in the msdn, detail here. I may
>> >>>>> > hard code this map in the file. But the note on the msdn says:
>> >>>>> >     "ANSI code pages can be different on different computers, or can be
>> >>>>> >     changed for a single computer, leading to data corruption. For the most
>> >>>>> >     consistent results, applications should use Unicode, such as UTF-8 or
>> >>>>> >     UTF-16, instead of a specific code page."
>> >>>>> > I am afraid a hard-coded map will fail on some machines. (By the way, it
>> >>>>> > seems UTF-8 is being suggested as the default again :-)
>> >>>>> >
>> >>>>> > There is also a class Encoding in the VC++, detail here.
>> >>>>> > But we cannot use it here.
>> >>>>> >
>> >>>>> > So, does anyone know something about locales on Windows?
>> >>>>> > Again, shall we use UTF-8 as our default?
>> >>>>> >
>> >>>>> > On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee <littlee1...@gmail.com> wrote:
>> >>>>> >>
>> >>>>> >> It seems we should add it in DRLVM.
>> >>>>> >>
>> >>>>> >> On Wed, Jul 15, 2009 at 1:58 PM, Regis <xu.re...@gmail.com> wrote:
>> >>>>> >>>
>> >>>>> >>> Nathan Beyer wrote:
>> >>>>> >>>>
>> >>>>> >>>> Is the IBM VME dealing with this correctly? Do we just need to fix
>> >>>>> >>>> DRLVM?
>> >>>>> >>>
>> >>>>> >>> Yes, I only tested on Linux; the IBM VME set the property correctly.
>> >>>>> >>>
>> >>>>> >>>>
>> >>>>> >>>> On Wed, Jul 15, 2009 at 12:25 AM, Regis <xu.re...@gmail.com> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> Kevin Zhou wrote:
>> >>>>> >>>>>>
>> >>>>> >>>>>> Yea, from luniglob.c, CL attempts to read the "file.encoding" property
>> >>>>> >>>>>> from the VM but fails to get the correct encoding.
>> >>>>> >>>>>>
>> >>>>> >>>>>> Regis, do you know any other specific ways that CL can get the right
>> >>>>> >>>>>> property?
>> >>>>> >>>>>
>> >>>>> >>>>> We can get it from the OS directly. Maybe just read env variables on Linux?
>> >>>>> >>>>>
>> >>>>> >>>>>> Wed, Jul 15, 2009 at 9:59 AM, Regis <xu.re...@gmail.com> wrote:
>> >>>>> >>>>>>
>> >>>>> >>>>>>> Charles Lee wrote:
>> >>>>> >>>>>>>
>> >>>>> >>>>>>>> Hi Nathan,
>> >>>>> >>>>>>>>
>> >>>>> >>>>>>>> If the file encoding is derived from the OS, there must be some bugs
>> >>>>> >>>>>>>> in it, because on my Linux machine the locale is en_US.UTF-8 and our
>> >>>>> >>>>>>>> default codec is still ISO8859-1. Do you know where we can find that
>> >>>>> >>>>>>>> code?
>> >>>>> >>>>>>>>
>> >>>>> >>>>>>> Classlib expected the VM to do this and set the property, but it didn't,
>> >>>>> >>>>>>> so we have to do this ourselves.
>> >>>>> >>>>>>>
>> >>>>> >>>>>>>> On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer <nbe...@gmail.com> wrote:
>> >>>>> >>>>>>>>
>> >>>>> >>>>>>>>> Are we talking about Windows or Linux? The default file encoding
>> >>>>> >>>>>>>>> should derive from the OS. I believe that's defined by the specs.
>> >>>>> >>>>>>>>>
>> >>>>> >>>>>>>>> Sent from my iPhone
>> >>>>> >>>>>>>>>
>> >>>>> >>>>>>>>> On Jul 14, 2009, at 5:51 AM, Charles Lee <littlee1...@gmail.com> wrote:
>> >>>>> >>>>>>>>>
>> >>>>> >>>>>>>>>> On Tue, Jul 14, 2009 at 6:12 PM, Jimmy,Jing Lv <firep...@gmail.com> wrote:
>> >>>>> >>>>>>>>>>
>> >>>>> >>>>>>>>>>> Hi,
>> >>>>> >>>>>>>>>>>
>> >>>>> >>>>>>>>>>> Charles, I believe UTF-8 is the default encoding for RI, and it
>> >>>>> >>>>>>>>>>> sounds reasonable.
>> >>>>> >>>>>>>>>>> BTW, it may encounter some compatibility problems, maybe we need to
>> >>>>> >>>>>>>>>>> run more tests to verify?
>> >>>>> >>>>>>>>>>>
>> >>>>> >>>>>>>>>>> 2009/7/14 Charles Lee <littlee1...@gmail.com>
>> >>>>> >>>>>>>>>>>
>> >>>>> >>>>>>>>>>>> Hi guys:
>> >>>>> >>>>>>>>>>>>
>> >>>>> >>>>>>>>>>>> I am running some of the ant junit test cases and meeting some
>> >>>>> >>>>>>>>>>>> encoding problems. I find they may be caused by the different
>> >>>>> >>>>>>>>>>>> default encodings of RI and harmony.
>> >>>>> >>>>>>>>>>>> My locale is en_US.UTF-8; the RI default is UTF-8 but harmony's is
>> >>>>> >>>>>>>>>>>> 8859-1. And then I encountered HARMONY-3736
>> >>>>> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/HARMONY-3736>
>> >>>>> >>>>>>>>>>>> and the two diffs attached to that issue. It seems we always get
>> >>>>> >>>>>>>>>>>> 8859-1, because (correct me if I'm wrong :-)
>> >>>>> >>>>>>>>>>>>
>> >>>>> >>>>>>>>>>>> 1. we removed the set code in the vm, so we will always get null if
>> >>>>> >>>>>>>>>>>>    we call the vm method
>> >>>>> >>>>>>>>>>>> 2. we set file.encoding in libglob.c; if we got null from the vm, we
>> >>>>> >>>>>>>>>>>>    set 8859-1.
>> >>>>> >>>>>>>>>>
>> >>>>> >>>>>>>>>> Sorry, it should be luniglob.c
>> >>>>> >>>>>>>>>>
>> >>>>> >>>>>>>>>>>> 3. we cannot set file.encoding at run time.
>> >>>>> >>>>>>>>>>>>
>> >>>>> >>>>>>>>>>>> ant uses UTF-8 to encode filenames which contain non-ASCII
>> >>>>> >>>>>>>>>>>> characters. So why do we use iso8859-1 as our unchangeable default?
>> >>>>> >>>>>>>>>>>> From the wiki http://en.wikipedia.org/wiki/ISO8859-1, it says "In
>> >>>>> >>>>>>>>>>>> computing applications, encodings that provide full UCS support (such
>> >>>>> >>>>>>>>>>>> as UTF-8 <http://en.wikipedia.org/wiki/UTF-8> and
>> >>>>> >>>>>>>>>>>> UTF-16 <http://en.wikipedia.org/wiki/UTF-16>) are finding increasing
>> >>>>> >>>>>>>>>>>> favor over encodings based on ISO 8859-1."
>> >>>>> >>>>>>>>>>>> Should we simply change iso8859-1 to utf-8?
>> >>>>> >>>>>>>>>>>>
>> >>>>> >>>>>>>>>>>> --
>> >>>>> >>>>>>>>>>>> Yours sincerely,
>> >>>>> >>>>>>>>>>>> Charles Lee
>> >>>>> >>>>>>>>>>>>
>> >>>>> >>>>>>>>>>> --
>> >>>>> >>>>>>>>>>> Best Regards!
>> >>>>> >>>>>>>>>>>
>> >>>>> >>>>>>>>>>> Jimmy, Jing Lv
>> >>>>> >>>>>>>>>>> China Software Development Lab, IBM
>> >>>>> >>>>>>>>>>>
>> >>>>> >>>>>>>>>> --
>> >>>>> >>>>>>>>>> Yours sincerely,
>> >>>>> >>>>>>>>>> Charles Lee
>> >>>>> >>>>>>>>>>
>> >>>>> >>>>>>> --
>> >>>>> >>>>>>> Best Regards,
>> >>>>> >>>>>>> Regis.
>> >>>>> >>>>>>>
>> >>>>> >>>>> --
>> >>>>> >>>>> Best Regards,
>> >>>>> >>>>> Regis.
>> >>>>> >>>>>
>> >>>>> >>> --
>> >>>>> >>> Best Regards,
>> >>>>> >>> Regis.
>> >>>>> >>
>> >>>>> >> --
>> >>>>> >> Yours sincerely,
>> >>>>> >> Charles Lee
>> >>>>> >>
>> >>>>> > --
>> >>>>> > Yours sincerely,
>> >>>>> > Charles Lee
>> >>>>> >
>> >>>> --
>> >>>> Yours sincerely,
>> >>>> Charles Lee
>> >>>>
> --
> Yours sincerely,
> Charles Lee
>
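
For reference, the mapping idea discussed in this thread (take the numeric identifier that GetACP returns on Windows and turn it into a name java.nio.charset can resolve) might look roughly like the sketch below. This is only an illustration, not Harmony code: the class name CodePageNames, the method toCharsetName, the handful of table entries, and the ISO-8859-1 fallback are all assumptions made for the example. In particular, mapping 936 to "GBK" assumes the running JRE ships that charset; if it does not, the sketch falls back.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: Windows code page identifier -> java.nio.charset name. */
final class CodePageNames {
    // A few identifiers taken from the MSDN code page list; not exhaustive.
    private static final Map<Integer, String> NAMES = new HashMap<Integer, String>();
    static {
        NAMES.put(936, "GBK");            // listed by MSDN as gb2312 (assumption: GBK is available)
        NAMES.put(1252, "windows-1252");
        NAMES.put(20127, "US-ASCII");
        NAMES.put(28591, "ISO-8859-1");
        NAMES.put(65001, "UTF-8");
    }

    // Assumed fallback, mirroring the 8859-1 default mentioned in the thread.
    private static final String FALLBACK = "ISO-8859-1";

    static String toCharsetName(int codePage) {
        String name = NAMES.get(codePage);
        if (name == null) {
            // Same shape as the "CP%u" string APR builds on Windows.
            name = "Cp" + codePage;
        }
        // Only hand back a name the running JRE can actually resolve.
        return Charset.isSupported(name) ? name : FALLBACK;
    }

    public static void main(String[] args) {
        int[] samples = {936, 1252, 65001, 99999};
        for (int cp : samples) {
            System.out.println(cp + " -> " + toCharsetName(cp));
        }
    }
}
```

An identifier missing from the table is first tried as "Cp" + id, so code pages the JRE already knows by that alias still resolve without a table entry; anything else ends up on the fallback instead of failing at startup.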