Martin Sebor
Tue, 21 Aug 2007 08:31:45 -0700
Travis Vitek wrote:
Martin Sebor wrote:
Yes. But notice the text doesn't say anything about time_put_byname or time_get_byname ;-)
Well, the standard doesn't say much at all about the *_byname<> facets. All it really says about them is
[21.1.1.2 p4] For some standard facets a standard "..._byname" class,
[...] The _byname requirements are extremely vague. Sometimes they are also implied by the requirements on the base facets, which makes them difficult to find. It's a mess.
So, if I'm reading that right, the *_byname<> facet classes are just there to prevent the user from having to instantiate a std::locale directly.
I'm not sure what you mean by this. The _byname facets are really just an implementation detail that's exposed in the interface of the locale library. They should never have been specified.
The C++ standard (or even the C standard for that matter) isn't going to be of help here.
Wait. Say what now? I'm not sure what you're trying to tell me here. If the C++ Standard says that these facets read or write years as roman numerals, then they should probably do so, regardless of what any other standard document requires. I think this will actually get cleared up in a few seconds...
The C and C++ standards only specify the requirements on the "C" locale and leave the localized behavior unspecified. So pretty much anything goes. There are some ground rules but I suspect you won't be able to tease the requirement on swallowing leading space for the %e directive out of them.
Of course that isn't what I'm seeing.
Test case?
Yeah. See attachment. Only tested on Win32/VC8 and Linux/GCC.
Thanks. Here are the results with stdcxx and with g++ 3.4.6:

$ ./t.stdcxx | grep fail
string=07/06/08 result=fail locale=thai
string= 7.06.1908 result=fail locale=bg_BG
string=07/06/08 result=fail locale=lo_LA
string=07/06/08 result=fail locale=th_TH

$ ./t.gcc | grep fail
string=��� %.1d ��� 1908 result=fail locale=ar_SA
string=۰۸/۰۶/۰۷ result=fail locale=fa_IR
string=ಗುರುವಾರ 07 ಜೂ 1908 result=fail locale=kn_IN

Looks like g++ is failing on multibyte character sequences but not on the spaces. We seem to somehow manage to process the multibyte sequences (I wonder how, or if it's a weakness in the test) but have issues with the leading space in bg_BG. I don't know what the problem is with the other locales...
It's hard to say from just looking at the code (and I haven't looked very carefully). In general, we [try to] implement the POSIX semantics, so if it works with strptime()/strftime() it should work with our time_put_byname/time_get_byname.
Well, there's the problem right there. The standard requires that the time_put<> facet format its output according to the POSIX function strftime(), with the option for supporting extensions. It gives no indication that the time_get<> facet should read data in such a way as to be compatible with strptime(). The only thing I see that says anything about the format expected by time_get<> is here...
[...]
Right. Pretty vague.
This paragraph says that time_get<>::get_date() is supposed to process the output of time_put<>::put(..., 'x'). [22.2.5.1.2 p4] Effects: Reads characters starting at s until it has extracted those struct tm members, and remaining format characters, used by time_put<>::put to produce the format specified by 'x' or until it encounters an error.
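To make the quoted requirement concrete, here is a minimal sketch of the round trip it describes: format a date with time_put<>::put(..., 'x'), then hand the result to time_get<>::get_date(). The helper names put_date and parse_date are made up for this illustration and are not part of stdcxx or the attached test.

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format `when` using the 'x' conversion of the
// time_put<char> facet installed in `loc`.
std::string put_date(const std::locale& loc, const std::tm& when)
{
    std::ostringstream out;
    out.imbue(loc);
    const std::time_put<char>& tp = std::use_facet<std::time_put<char> >(loc);
    tp.put(std::ostreambuf_iterator<char>(out), out, ' ', &when, 'x');
    return out.str();
}

// Hypothetical helper: parse `text` with time_get<char>::get_date().
// Returns true if the facet did not set failbit.
bool parse_date(const std::locale& loc, const std::string& text, std::tm& result)
{
    std::istringstream in(text);
    in.imbue(loc);
    std::ios_base::iostate err = std::ios_base::goodbit;
    const std::time_get<char>& tg = std::use_facet<std::time_get<char> >(loc);
    tg.get_date(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>(), in, err, &result);
    return !(err & std::ios_base::failbit);
}
```

Calling put_date() and then parse_date() with the same named locale (for example bg_BG, if it is installed) is one way to reproduce the kind of failure reported above outside of the multi-threaded test.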
Yes. The problem with the C++ standard in this area is that the requirements are vague and not always implementable (e.g., the multibyte sequences -- all the narrow specializations of the _get facets operate on single characters).
If we test this behavior it's gotta be right ;-) Where does POSIX say leading spaces must be skipped? I see this under %e: Equivalent to %d. And under %d: The day of the month [01,31]; leading zeros are permitted but not required. Nothing about ignoring spaces.
Absolutely. The docs for POSIX strftime()...
[...]
So strftime() isn't even compatible with strptime() when it comes to '%e'.
Hmm. That seems like a bug in POSIX then, unless we're missing something. You might want to create a POSIX-only test case to verify this and if I'm right open a discussion on the Austin Group list (http://www.opengroup.org/austin/lists.html).
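A POSIX-only test case along those lines could look something like the sketch below. It assumes a POSIX system where <time.h> declares strptime() (g++ on Linux does this by default); format_day and parse_day are hypothetical names chosen for the example.

```cpp
#include <string.h>
#include <time.h>    // strftime(), strptime() (POSIX)
#include <string>

// Hypothetical helper: format a day of the month with strftime("%e").
// For single-digit days %e pads with a leading space, not a zero.
std::string format_day(int mday)
{
    struct tm when;
    memset(&when, 0, sizeof when);
    when.tm_mday = mday;

    char buf[16];
    size_t n = strftime(buf, sizeof buf, "%e", &when);
    return std::string(buf, n);
}

// Hypothetical helper: feed text to strptime("%e").
// Returns the parsed day, or -1 if strptime() rejects the input.
int parse_day(const std::string& text)
{
    struct tm parsed;
    memset(&parsed, 0, sizeof parsed);
    const char* end = strptime(text.c_str(), "%e", &parsed);
    return end ? parsed.tm_mday : -1;
}
```

Comparing parse_day(format_day(7)) against 7 then shows whether a given C library accepts the space-padded form that its own strftime() produces. The result is deliberately not stated here, since that is exactly the open question.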
[...]
Unfortunately, without consistent input/output it is going to be difficult for this multi-threading test to verify that no data corruption is occurring with arbitrary locales. Hopefully there is some system in place that allows us to explicitly specify which locales are to be used for a test.
Not really. My approach would be to detect locales with this problem and avoid using them. The test also doesn't need to be exhaustive, at least not in this iteration. I think exercising just the most common patterns should be good enough (although %X is pretty common :)
Martin
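One possible shape for the "detect and avoid" approach, as a rough sketch only: try each candidate locale once up front and keep those whose 'x' output get_date() can read back. The names date_round_trips and usable_locales are invented for this example, and comparing only the day and month is a simplifying assumption, not what the stdcxx test does.

```cpp
#include <cstddef>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical check: true if get_date() in `loc` reads back the day and
// month that put(..., 'x') wrote for `when`.
bool date_round_trips(const std::locale& loc, const std::tm& when)
{
    std::stringstream io;
    io.imbue(loc);
    std::use_facet<std::time_put<char> >(loc).put(
        std::ostreambuf_iterator<char>(io), io, ' ', &when, 'x');

    std::tm parsed = std::tm();
    std::ios_base::iostate err = std::ios_base::goodbit;
    std::use_facet<std::time_get<char> >(loc).get_date(
        std::istreambuf_iterator<char>(io),
        std::istreambuf_iterator<char>(), io, err, &parsed);

    return !(err & std::ios_base::failbit)
        && parsed.tm_mday == when.tm_mday
        && parsed.tm_mon == when.tm_mon;
}

// Hypothetical filter: keep only the locale names that can be constructed
// and that pass the round-trip check above.
std::vector<std::string> usable_locales(const std::vector<std::string>& names,
                                        const std::tm& when)
{
    std::vector<std::string> good;
    for (std::size_t i = 0; i != names.size(); ++i) {
        try {
            std::locale loc(names[i].c_str());
            if (date_round_trips(loc, when))
                good.push_back(names[i]);
        }
        catch (const std::runtime_error&) {
            // locale is not installed on this system: skip it
        }
    }
    return good;
}
```

The threads in the test would then only be handed locales from the filtered list, so a mismatch seen during the run points at a threading problem rather than at a locale whose formats don't round-trip in the first place.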