On Fri, 22 Feb 2002, Markus Kuhn wrote:
> Gaspar Sinai wrote on 2002-02-22 00:31 UTC:
> > When looking at these as strings (according to Unicode) these
> > strings are the same. In case these strings are used as
> > filenames they will be considered different. Can we resolve
> > this issue without normalization?
>
> Perhaps an extension of the globbing/regexp API is where you want to
> start.

I was thinking about this: maybe the NFS server could enforce
normalization form 'C' so that only the precomposed variant:

U+00F6 ö

could create a file. A huge number of scripts could be supported,
without duplicate filenames. Hangul would immediately be OK
without the need for jamo decomposition. And we are also very
lucky that CJK cannot be decomposed to radicals :)
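
To make the idea concrete, here is a minimal sketch of such a
check in Python. It is only an illustration: accept_filename is
a made-up name, not part of any NFS implementation, and it
assumes the server already has the name as a decoded string.

```python
import unicodedata

def accept_filename(name: str) -> bool:
    """Hypothetical server-side check: accept only names already in NFC."""
    return unicodedata.normalize("NFC", name) == name

precomposed = "\u00f6"   # ö as a single code point
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS
hangul = "\ud55c"        # a precomposed Hangul syllable
# The same syllable written as conjoining jamo.
jamo = unicodedata.normalize("NFD", hangul)

print(accept_filename(precomposed))  # True
print(accept_filename(decomposed))   # False
print(accept_filename(hangul))       # True
print(accept_filename(jamo))         # False
```

With a check like this only one spelling of each name can ever
reach the disk, so the duplicates cannot be created at all.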

I admit this would create some problems...

The unsupported scripts would fall into two categories:

1. Scripts that could be supported if a precomposed character
 could go into the standard. For instance:

Guarani U+0067 U+0303 g̃ (g with a tilde above)
Guarani U+0047 U+0303 G̃ (G with a tilde above)

2. Scripts that need further thinking:
- Indic: contrary to common belief, some scripts, like Tamil,
  have a finite number of combinations.
- Arabic: if we don't use presentation forms we should be fine.
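
The first category can be checked mechanically. The sketch below
is just an illustration (has_precomposed_form is a name I made
up); it asks Python's unicodedata module whether NFC collapses a
sequence into a single code point. The answer depends on the
Unicode data shipped with the Python in use.

```python
import unicodedata

def has_precomposed_form(sequence: str) -> bool:
    """True if NFC collapses the sequence into a single code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1

print(has_precomposed_form("o\u0308"))  # o + diaeresis: True (U+00F6)
print(has_precomposed_form("g\u0303"))  # Guarani g + tilde: False
print(has_precomposed_form("G\u0303"))  # Guarani G + tilde: False
```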

All of the above is, of course, just food for thought.

My fear is that if we let the NFS server take any legal UTF-8
sequence, in some cases we could end up with two or more files
whose names look the same. If certain utilities are not
trained to treat filenames differently from normal text,
we might not be able to perform certain operations,
like a find operation.
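
A small sketch of the find problem, assuming a directory that
already holds both spellings of one name. The function names and
the sample filenames are made up for the illustration.

```python
import unicodedata

def find_exact(names, wanted):
    """Match by raw code points, the way a filesystem lookup would."""
    return [n for n in names if n == wanted]

def find_equivalent(names, wanted):
    """Match after NFC, treating canonically equivalent names as equal."""
    key = unicodedata.normalize("NFC", wanted)
    return [n for n in names if unicodedata.normalize("NFC", n) == key]

# Two names that look identical when displayed.
directory = ["sch\u00f6n.txt", "scho\u0308n.txt"]
typed = "sch\u00f6n.txt"

print(len(find_exact(directory, typed)))       # 1
print(len(find_equivalent(directory, typed)))  # 2
```

The exact match silently misses one of the two files, and the
equivalence match cannot tell the user which of the two was
meant.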

Maybe someone on this list has some idea how to train these
utilities. The only reason I don't like making the client
side responsible is that there are a lot of
utilities out there. Some of them sort and expunge
duplicates, and they will all use Unicode to do that.

Some commands may not even know that the strings are
filenames, and strings that are equivalent as Unicode text
may not be equivalent as filenames: two filenames could
appear as one after the uniq(1) command, for instance.
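
Here is a sketch of what I mean, with two uniq-like helpers
written for this mail. Neither is a claim about how any real
uniq(1) implementation compares lines; the second one simply
assumes a tool that compares by Unicode canonical equivalence.

```python
import unicodedata

def uniq_by_code_points(lines):
    """Drop adjacent duplicates, comparing raw code points."""
    out = []
    for line in lines:
        if not out or out[-1] != line:
            out.append(line)
    return out

def uniq_by_unicode_equivalence(lines):
    """Drop adjacent duplicates, comparing NFC-normalized forms."""
    out = []
    for line in lines:
        key = unicodedata.normalize("NFC", line)
        if not out or unicodedata.normalize("NFC", out[-1]) != key:
            out.append(line)
    return out

# Two distinct files on disk whose names look the same.
listing = ["sch\u00f6n.txt", "scho\u0308n.txt"]

print(len(uniq_by_code_points(listing)))          # 2
print(len(uniq_by_unicode_equivalence(listing)))  # 1
```

In the second case one of the two real files disappears from the
output.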


Thanks
gaspar


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
