Tomasz Wegrzanowski wrote:
> $ cat X
> a a
> a c
> ab c
> $ cat X | LC_COLLATE=C sort

Ordered based upon the underlying ASCII data encoding of the
characters.

> a a
> a c
> ab c
> $ cat X | LC_COLLATE=pl_PL.UTF-8 sort
> a a
> ab c
> a c

Ordered based upon the pl_PL locale's ordering of the characters.

Try the -k option.  I think you will be happier with the result.

  cat X | LC_COLLATE=pl_PL.UTF-8 sort -k1,1
  a a
  a c
  ab c

> This is not lexicographic.

It is as defined by the pl_PL locale.  In that locale setting (and
other locales besides the standard C/POSIX one) punctuation and
whitespace are ignored and case is folded.  (And no, I don't like it
either.)

> There's no consistent ordering between b and space.

In the pl_PL locale the space is ignored.

  cat X | LC_COLLATE=pl_PL.UTF-8 sort | tr -d ' '
  aa
  abc
  ac

Once the spaces are removed it is easier to see that pl_PL has ordered
the lines as if the spaces were not there.
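
The same ordering can be reproduced without the pl_PL locale being
installed.  This is only a rough sketch of the "spaces are ignored"
rule, not how libc actually collates: it builds a key from each line
with the spaces removed, sorts on that key bytewise in the C locale,
and then drops the key again.  The name "emulated" and the
decorate/sort/undecorate pipeline are my own illustration.

```shell
# Emulate "spaces are ignored" with plain byte comparison:
# 1. awk prefixes each line with a copy of itself minus the spaces,
# 2. sort compares only that prefix (field 1, tab separated) in the C locale,
# 3. cut throws the prefix away again.
tab=$(printf '\t')
emulated=$(printf 'a a\na c\nab c\n' |
  awk '{ key = $0; gsub(/ /, "", key); print key "\t" $0 }' |
  LC_ALL=C sort -t "$tab" -k1,1 |
  cut -f2-)
printf '%s\n' "$emulated"
```

The keys are aa, ac and abc, so "ab c" lands between "a a" and "a c",
which is the same order the pl_PL sort produced above.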

You are asking for spaces and punctuation to have a declared ordering
in the pl_PL (and other) locales.  But that is out of our hands.
There is nothing we can do about it here.  It is now a standards
conformance issue.  The locale data tables that drive this for sort,
grep, awk and other commands are part of libc.

I suggest that you set your locale to C and avoid all of these locale
dependent problems.
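
A minimal sketch of that, assuming a POSIX shell.  LC_ALL takes
precedence over LC_COLLATE and LANG, so setting it for the one command
is enough and leaves the rest of the session alone.  The printf simply
recreates the three-line file X from the report, and the variable name
"sorted" is only for illustration.

```shell
# Sort the sample data with the C locale forced for this one command.
# LC_ALL overrides LC_COLLATE and LANG, so no other setting can interfere.
sorted=$(printf 'a a\na c\nab c\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

To make it the default for a whole session, export LC_ALL=C in the
shell instead of prefixing each command.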

> Characters may be ordered in any locale-dependent way, but the lines
> should be sorted consistently or bad things will happen.

Your statement is in conflict with itself.

Bob

