Travis Vitek wrote:

Okay, I think I've finally got something that will be useful to someone. I'm
attaching the patch to STDCXX-608
[https://issues.apache.org/jira/browse/STDCXX-608] for review.

There is a lot of code, and the feature is not 100% complete yet. I need to
create a test, deprecate the old rw_locales() function, and come up with a
way to locate the input files without actually requiring that the
environment variable TOPDIR be defined.

The system is fairly simple from the public interface. A new public type
rw_locale_entry_t

What's the advantage of returning a list instead of just a character
string like rw_locales() does? (Wouldn't it be simpler to just stick
with the same interface?)

I assume the rw_locale_entry_t members language, country, and encodings
are populated with our canonical names for each, correct?

Is the rw_locale_entry_t::name member populated with a string specific
to each operating system? If so, which of the two forms does it use:
the one returned by locale -a or the one returned by setlocale()?

FYI, the setlocale() names can be really long on some platforms (e.g.,
on HP-UX, they always take the form:
/<category>/<category>/<category>/<category>/<category>/<category>
so 64 characters may not be enough for all locales).

has been added. It represents a link in a linked list
of installed locales. The new function rw_all_locales() gives you a pointer
to the first item in a sorted list of installed locales. Another new
function rw_locale_query() takes a query string and a count, and it returns
a pointer to the first entry in a linked list of locale entries that match
the provided query string. The count parameter is used to limit the number
of locales in the linked list.

Is the user responsible for freeing the linked list? If so, how?


The query string allows you to specify what attributes you want to query,
what values you want those attributes to have, and what priority to give
those attributes relative to others. The supported attributes are language
[L], country [C], encoding [E], and mb_cur_len [M]. Multiple values for the
same attribute can be specified by separating them with a | character. You
can use a * as a match anything wildcard expression. I just realized that it
might be useful to omit certain attribute values. If someone thinks this
might be useful, we could do that with a ! or ^.

I suppose it could potentially be a useful extension, although off
hand I can't come up with a use case.


As an example, imagine that I want to find up to 10 locales for Japan or
China that have MB_CUR_LEN of 4 or 3. You could get that list of locales
with the following query...

  const rw_locale_entry_t* e = rw_locale_query ("C=JP|CN M=4|3", 10);

I assume the AND operator is implicit between the subexpressions?
I.e., the query is equivalent to

  (C == "JP" || C == "CN") && (MB_CUR_LEN == 3 || MB_CUR_LEN == 4)

I'd like to make a simplifying suggestion regarding the query syntax
[ducks] ;-) First, I'd like to suggest to drop the attributes C, E,
and L, and instead assume the standard canonical locale name in the
form <language>_<country>.<encoding>. Second, since we already have
simple pattern matching in the form of rw_fnmatch() and since at
some point we'll need to add shell brace expansion (e.g., for the
expected failures project), I'd like to propose that rather than
using our own special syntax here and pattern matching and brace
expansion elsewhere, we start with both here as well.

With that, the first part of the query string above would look like
this: "*_{JP,CN}.*"

The shell brace expansion syntax looks something like this:

  string     ::= <brace-expr> | [ <chars> ]
  brace-expr ::= <string> '{' <brace-list> '}' <string> | <string>
  brace-list ::= <string> ',' <brace-list> | <string>
  chars      ::= <pcs-char> <string> | <pcs-char>
  pcs-char   ::= character in the Portable Character Set
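
To make the grammar above concrete, here is a minimal sketch of how such
an expansion might be implemented. The function name brace_expand() is
hypothetical (it is not part of the patch or of the test driver), and
unlike a real shell the sketch also expands a group that contains no
comma and treats unbalanced braces as literal text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands every {a,b,...} group in `pattern`
// and returns the resulting strings in left-to-right order.
std::vector<std::string> brace_expand (const std::string &pattern)
{
    const std::size_t open = pattern.find ('{');
    if (open == std::string::npos)
        return std::vector<std::string>(1, pattern);   // nothing to expand

    // Find the brace that closes the first group, honoring nesting.
    std::size_t close = std::string::npos;
    int depth = 0;
    for (std::size_t i = open; i < pattern.size (); ++i) {
        if (pattern [i] == '{')
            ++depth;
        else if (pattern [i] == '}' && --depth == 0) {
            close = i;
            break;
        }
    }
    if (close == std::string::npos)
        return std::vector<std::string>(1, pattern);   // unbalanced: literal

    const std::string prefix = pattern.substr (0, open);
    const std::string suffix = pattern.substr (close + 1);

    // Split the group at its top-level commas and expand each
    // alternative recursively together with the prefix and suffix.
    std::vector<std::string> result;
    std::size_t start = open + 1;
    depth = 0;
    for (std::size_t i = open + 1; i <= close; ++i) {
        const char c = pattern [i];
        if (c == '{')
            ++depth;
        else if (c == '}' && i != close)
            --depth;

        if (i == close || (c == ',' && depth == 0)) {
            const std::string alt = pattern.substr (start, i - start);
            const std::vector<std::string> expanded =
                brace_expand (prefix + alt + suffix);
            result.insert (result.end (), expanded.begin (), expanded.end ());
            start = i + 1;
        }
    }
    return result;
}
```

With this sketch, brace_expand ("*_{JP,CN}.*") yields the two patterns
"*_JP.*" and "*_CN.*", each of which could then be handed to a pattern
matcher such as rw_fnmatch().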

For the rest of the query I wonder if we could come up with a more
conventional (and possibly more expressive) syntax that we could use
in other areas as well. I'm thinking something loosely based on
grep might work, with multiple lines representing a disjunction of
the expressions on each line, and with subexpressions on the same
line representing a conjunction of the subexpressions.

So the query string from your example above would look like this:

  *_{JP,CN}.* {3,4}

Internally it would translate into multiple grep-like expressions
(i.e., arguments to the -e grep option) looking like this:

  *_JP.* 3\n
  *_JP.* 4\n
  *_CN.* 3\n
  *_CN.* 4\n

with the whole thing basically being a simplified grep pattern that
could be used to search in a plain text file in this format:

  <locale> <mb-cur-max> <alias-list>
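
As a rough illustration of the search described above, the sketch below
matches a list of already expanded patterns against lines in that file
format. Everything in it is an assumption made for illustration:
glob_match() is a tiny stand-in for rw_fnmatch() (whose signature isn't
shown here) that handles only '*' and '?', find_locales() is a made-up
name, and each pattern is assumed to match the whole "<locale>
<mb-cur-max>" pair rather than an arbitrary substring as grep would.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for rw_fnmatch(): '*' matches any run of characters,
// '?' matches exactly one character, anything else matches itself.
bool glob_match (const char *pat, const char *str)
{
    for (; *pat; ++pat, ++str) {
        if (*pat == '*') {
            while (*pat == '*')
                ++pat;                    // collapse consecutive stars
            if (!*pat)
                return true;              // trailing star matches the rest
            for (; *str; ++str)
                if (glob_match (pat, str))
                    return true;
            return false;
        }
        if (!*str || (*pat != '?' && *pat != *str))
            return false;
    }
    return !*str;                         // both must be exhausted
}

// Returns the names of all locales in `db` (one
// "<locale> <mb-cur-max> <alias-list>" record per line) for which
// at least one pattern matches "<locale> <mb-cur-max>".
std::vector<std::string>
find_locales (const std::vector<std::string> &patterns,
              const std::string              &db)
{
    std::vector<std::string> found;
    std::istringstream lines (db);
    std::string line;

    while (std::getline (lines, line)) {
        std::istringstream fields (line);
        std::string locale, mb_cur_max;
        if (!(fields >> locale >> mb_cur_max))
            continue;                     // skip blank or malformed lines

        const std::string key = locale + ' ' + mb_cur_max;

        // The patterns form a disjunction: the first hit is enough.
        for (std::size_t i = 0; i != patterns.size (); ++i) {
            if (glob_match (patterns [i].c_str (), key.c_str ())) {
                found.push_back (locale);
                break;
            }
        }
    }
    return found;
}
```

Given the four expanded patterns above and a file containing (made-up)
records for ja_JP.eucJP with a value of 3, ja_JP.SJIS with 2,
zh_CN.GB18030 with 4, and en_US.UTF-8 with 4, the sketch would select
ja_JP.eucJP and zh_CN.GB18030.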

If we also wanted to include, say, an English locale in UTF, we
would write:

  *_{JP,CN}.*  *{3,4}\n
  en_*.UTF-8

I realize this is a little different from what I outlined earlier
but after prototyping the "expected failures" solution I think the
brace expansion will be a very handy tool to add to the driver,
and since we already have pattern matching in rw_fnmatch() we might
as well put it to good use.

Martin
