Linux-Development-Sys Digest #706, Volume #6 Wed, 12 May 99 21:14:12 EDT
Contents:
Re: glibc-2.0.7 to glibc-2.1.1 (Brent Corbin)
Re: Glibc rant ("G. Sumner Hayes")
Re: Translation of linux to minor languages ("Kalaznikov")
Re: glibc-2.0.7 to glibc-2.1.1 (Andreas Jaeger)
Re: Translation of linux to minor languages (Guest)
Re: Linking under SuSe 6.0 (J.H.M. Dassen (Ray))
Re: pthreads at kernel level? (J.H.M. Dassen (Ray))
Re: Translation of linux to minor languages (Jonathan A. Buzzard)
Re: Linux disk defragmenter (bill davidsen)
Insmod doesn't work ("Marco Aurelio S. Mendes")
Re: glibc-2.0.7 to glibc-2.1.1 (Paul Kimoto)
Re: glibc-2.0.7 to glibc-2.1.1 (Paul D. Smith)
Re: Translation of linux to minor languages (Johan Kullstam)
Re: glibc-2.0.7 to glibc-2.1.1 (Brent Corbin)
Re: Translation of linux to minor languages ("Stefan Monnier" <[EMAIL PROTECTED]>)
----------------------------------------------------------------------------
From: [EMAIL PROTECTED] (Brent Corbin)
Subject: Re: glibc-2.0.7 to glibc-2.1.1
Date: 12 May 1999 21:12:31 GMT
I think you can compile bash to use libtermcap instead, but don't quote
me on that...
On 12 May 1999 16:28:08 -0500, Paul Kimoto <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>, G. Sumner Hayes wrote:
>> Read the README -- ncurses will break, as will a few other
>> things. This will make less break
> [...]
>> bash, telnet, ftp, make, etc. all ought
>> to work fine.
>
>Wait, your /bin/bash isn't dynamically linked to libncurses?!
>
>--
>Paul Kimoto <[EMAIL PROTECTED]>
------------------------------
From: "G. Sumner Hayes" <[EMAIL PROTECTED]>
Crossposted-To: gnu.misc.discuss
Subject: Re: Glibc rant
Date: Wed, 12 May 1999 17:16:47 -0400
Let me try again to see if I can get the general point across rather
than getting bogged down in glibc2.0/2.1 specifics.
It seems like the general consensus on c.o.l.d.s is that a binary
compatibility guarantee similar to the following is desirable:
"A binary built against one stable release of a C library which uses
only external APIs of that library will continue to work with future
stable releases of that library which have the same soname".
For background (you can check Dejanews if you want details), I've been
arguing that glibc2.0->2.1 doesn't violate that principle since
glibc2.0 was never an official stable release. As such, it would be
the fault of the distribution packagers when the move to glibc 2.1
broke the user's old libraries. (Someone else pointed out to me that
one of the glibc developers actively recommended using glibc 2.0 as the
basis for stable distributions, but we'll leave that issue aside right
now).
That argument obscures the main point, though. Suppose that I
make the SLS linux distribution. SLS version 14 uses glibc stable
release 9.1. SLS version 15 comes out, based on glibc stable release
9.2. How can I ensure that users of SLS 14 can upgrade to SLS 15
without having them rebuild any of their personal libraries and
binaries?
Is this level of binary compatibility a goal of the glibc
developers? If so, what should the Linux community be doing to help?
If not, then there appears to be a real split between the glibc camp
and the Linux camp; what's the best way to go about mending it?
Thanks for your time,
Sumner
------------------------------
From: "Kalaznikov" <[EMAIL PROTECTED]>
Subject: Re: Translation of linux to minor languages
Date: Wed, 12 May 1999 17:47:46 +0200
Reply-To: "Kalaznikov" <[EMAIL PROTECTED]>
So, can anyone give examples of well-written, useful and popular programs
that are actually translatable without having to redesign the whole
program? I'm thinking especially of browsers, word processors and things like
that. Thanks
Yours sincerely Kal
------------------------------
From: Andreas Jaeger <[EMAIL PROTECTED]>
Subject: Re: glibc-2.0.7 to glibc-2.1.1
Date: 12 May 1999 19:48:36 +0200
>>>>> Brent Corbin writes:
> Well... I confess, some time ago I managed to convert my entire
> system over to glibc-2.0.7 without paying (enough) attention to
> all the various warnings to consider it unstable... (hey - it was
> stable enough for most of the distributions 8*)
> Now it's time to move towards glibc-2.1.1 ---- let's say I was
> to install the precompiled rpm's from RedHat - can anyone tell
> me what would break?
Read the whole FAQ - you need to recompile at least ncurses and
libstdc++.
> I guess I'm asking whether glibc-2.1.1 is compatible enough that
> I maintain the most important aspects of functionality on my
> system (bash, telnet, ftp, make, ls...) --- if it is, then there's
> hope of fixing other compatibility problems piece-by-piece -
> if not... <mumble mumble>
glibc 2.1.1 is not yet released, there's only a prerelease available
(pre2). I would advise waiting for the final version or for the next
prerelease (pre3).
All *known* and reported compatibility problems are AFAIK either
documented (read the FAQ, especially questions 1.17, 2.19, 2.21, 2.27,
2.30, 3.18, 3.20) or fixed. glibc 2.1 had some compatibility
problems.
The basic functionality should work.
Andreas
--
Andreas Jaeger [EMAIL PROTECTED] [EMAIL PROTECTED]
for pgp-key finger [EMAIL PROTECTED]
------------------------------
Date: Wed, 12 May 1999 18:21:56 +0200
From: [EMAIL PROTECTED] (Guest)
Subject: Re: Translation of linux to minor languages
In article <[EMAIL PROTECTED]> Johan Kullstam <[EMAIL PROTECTED]>
writes:
> you probably want *both* UTF-8 *and* UTF-16 with a suite of convenient
> translators from one to the other. UTF-8 has the advantage of being
> fairly compact in memory. UTF-16 has the advantage that each char is
> the same size (here it is 16 bits). my feeling is that, as a general
> rule, programs will want to use UTF-16 internally and files stored on a
> disk will be in UTF-8. for example, the getchar and putchar functions
> should be configurable to translate automatically.
Well, this is basically what we did, except that we kept the data as 16-bit
strings in memory and only translated to 8-bit (in our case it used the default
system encoding though) when going to file, X Windows or other stuff.
> at this point, i feel it is essential to have some char encoding
> metadata with the files so that you know if a particular file is in
> us-ascii, iso8859-1, UTF-8, UTF-16 or what have you. the lack of
> metadata is imho a failing of the unix filesystem. it would also
> allow dispensing with the crock of #!/bin/sh as the first line in a
> file since the metadata would contain that information too.
I've never missed meta-data. Looking back at my Amiga days, the meta-data
there was simply stored in a separate file with the .info extension. That was
good enough for me. I prefer the simplicity (for the user that is) of a
normal Unix file-system over stuff like NTFS's multiple streams and extended
attributes. And I definitely want to be able to specify that some script has
to be started with this or that program with a normal editor of my choice.
To get back to localization and the filesystem, the only thing you need - I
think - is a general convention that all filenames are in UTF-8. If everyone
in the whole wide world (and every application) follows that rule, you don't
need other stuff. And for foreign filesystems (MS-DOS, Mac, ...) the
filesystem should translate the names.
> > the file-system. The only problem with ls would be that it would
> > probably get the column-width wrong when using multi-columns or ls
> > -l. The reason being that without modifications ls would assume a
> > multibyte UTF-8 encoded string containing 1 character of 3 bytes to
> > be 3 characters wide.
>
> isn't this why you would use UTF-16 inside a program?
Yes, but it would be a real pain adapting all programs to use UTF-16. If the
program really needs to know about characters (see below) it indeed should be
converted to either really recognize UTF-8 or (and this is definitely simpler
and more efficient) to use 16-bit (or 32 if you really want to be safe
for the future) characters. However, modifying all utilities to handle that
will mean a lot of work. The introduction of UTF-8 would allow this move to
be progressive, i.e. tools that don't need to know about actual string-lengths
in characters will already work fine with Japanese and other stuff, whereas
those that do need to know will still run, but will get some things wrong.
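For the tools that do need to know, the helper is simple enough. A rough,
untested sketch (the name is made up) that counts characters rather than
bytes by skipping UTF-8 continuation bytes:

/* Hypothetical utf8_strlen(): count characters, not bytes, by skipping
   UTF-8 continuation bytes (which always look like 10xxxxxx).  Assumes
   the input is well-formed UTF-8. */
#include <stddef.h>

size_t utf8_strlen(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)  /* not a continuation byte */
            n++;
    return n;
}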
> > many bytes should be written to disk. So, I guess some kind of
> > UTF8strlen() function should be available and ls and other tools
> > that really need to know the width of a string in characters should
> > use that instead.
>
> all this machinery could probably be used to bring proportional fonts
> into the mix.
Euuhh ... I don't think proportional fonts have anything to do with strlen().
That's GUI stuff and I definitely wouldn't like to get that kind of stuff into
the core/kernel of Linux. You did hit another point though: how would one
display Kanji characters on a VT/console?
> > its own implementation. Components without direct user-interaction
> > (so no visualization and no user entering data directly into it)
> > will not need to know that UTF-8 is used, since they couldn't care
> > less whether those 3 bytes represent 3 Latin-1 characters or 1
> > foreign character.
>
> yes they would. when the shell expands ? - how many chars is that?
> how would sed operate? anything that processes text as text needs to
> be able to grok the format.
Ok, you're right here. Again though, the programs won't break because of the
introduction of UTF-8, they would simply assign a strange meaning to stuff
like the '?' in your example. It would allow a progressive move from a
completely unlocalized environment to one that is fully localized.
My guess is that things like bash, sed, awk, etc. will be amongst the first to
get modified to handle 16-bit characters. The C library should supply them
with an easy way of translating a UTF-8 char * into a wchar_t *, and all other
C library functions that can handle char * should also be available in
wchar_t * versions.
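In fact plain ISO C already has mbstowcs()/wcstombs() for this; whether it
does the right thing depends on the locale. A minimal sketch, assuming a
UTF-8 locale is installed (the locale name below is only an example):

/* Minimal sketch: convert a UTF-8 string to wchar_t with the standard
   mbstowcs().  Assumes a UTF-8 locale is available; "en_US.UTF-8" is
   just an example name. */
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *utf8 = "na\xc3\xafve";   /* "naive" with i-diaeresis, in UTF-8 */
    wchar_t wide[32];
    size_t n;

    if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL) {
        fprintf(stderr, "no UTF-8 locale available\n");
        return 1;
    }
    n = mbstowcs(wide, utf8, sizeof wide / sizeof wide[0]);
    if (n == (size_t)-1) {
        fprintf(stderr, "invalid multibyte sequence\n");
        return 1;
    }
    printf("%lu bytes, %lu wide characters\n",
           (unsigned long)strlen(utf8), (unsigned long)n);
    return 0;
}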
Anyhow, I'm not really an active developer on Linux, so I should probably stay
out of these discussions.
------------------------------
From: [EMAIL PROTECTED] (J.H.M. Dassen (Ray))
Subject: Re: Linking under SuSe 6.0
Date: 12 May 1999 18:57:27 GMT
Daniel Sawitzki <[EMAIL PROTECTED]> wrote:
>I found out that the Shared Libs in the /usr/X11R6/lib and /usr/X386/lib
>(and some other dir) were not found. Of course I checked it out with
>ldconfig -v and ldconfig -p.
ldconfig influences where the dynamic loader looks for shared libraries. It
does not influence where the linker looks for them.
Add -L/usr/X11R6/lib to your link line, and the linker will find them.
HTH,
Ray
--
J.H.M. Dassen | RUMOUR Believe all you hear. Your world may
[EMAIL PROTECTED] | not be a better one than the one the blocks
| live in but it'll be a sight more vivid.
| - The Hipcrime Vocab by Chad C. Mulligan
------------------------------
From: [EMAIL PROTECTED] (J.H.M. Dassen (Ray))
Subject: Re: pthreads at kernel level?
Date: 12 May 1999 19:01:48 GMT
luis malheiro <[EMAIL PROTECTED]> wrote:
>Does Linux have a pthreads implementation at kernel level
That question is a bit deceptive. pthreads (POSIX threads) is a C API;
something at the same level as the C library, i.e. on top of the kernel.
>and able to use SMP (threads in different processors)?
The LinuxThreads add-on to the GNU C library, version 2, implements the
POSIX threads API and uses the clone() system call, which allows for
different threads belonging to the same process to run on different
processors.
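A minimal sketch (compile with -lpthread): under LinuxThreads each of the
two workers below becomes a separate clone()d task, so on an SMP box the
kernel is free to schedule them on different processors.

/* Minimal pthreads sketch.  Under LinuxThreads each thread is a clone()d
   kernel task, so the two workers may run on different processors. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    printf("thread %ld running\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}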
HTH,
Ray
--
J.H.M. Dassen | RUMOUR Believe all you hear. Your world may
[EMAIL PROTECTED] | not be a better one than the one the blocks
| live in but it'll be a sight more vivid.
| - The Hipcrime Vocab by Chad C. Mulligan
------------------------------
From: [EMAIL PROTECTED] (Jonathan A. Buzzard)
Subject: Re: Translation of linux to minor languages
Date: Wed, 12 May 1999 21:57:40 +0000
In article <[EMAIL PROTECTED]>,
Olav Wölfelschneider <[EMAIL PROTECTED]> writes:
> Jonathan A. Buzzard <[EMAIL PROTECTED]> wrote:
> JAB> As your user name does not have to represent your real name this is not
> JAB> important. Why should passwd need to accept Japanese or Cyrillic characters?
>
> And next time, when Sony sells you a VCR or such, why should they care about
> your letters? You will have to read English using Japanese characters.
>
> Ok, bad example, hope you got my point: My language has special characters
> and I damn well want to be able to use them wherever I like. It's MY language
> after all.
>
> Oh those English people...
>
The point being that you should not be using words for a password, so what
the characters are makes no difference. I would quite happily type my
password in Egyptian hieroglyphics, and I type this on a German keyboard.
JAB.
--
Jonathan A. Buzzard Email: [EMAIL PROTECTED]
Northumberland, United Kingdom. Tel: +44(0)1661-832195
------------------------------
From: [EMAIL PROTECTED] (bill davidsen)
Crossposted-To: comp.os.linux.advocacy
Subject: Re: Linux disk defragmenter
Date: 12 May 1999 19:21:48 GMT
In article <[EMAIL PROTECTED]>,
The Ghost In The Machine <[EMAIL PROTECTED]> wrote:
| On 10 May 1999 03:48:59 -0700, Tim Smith <[EMAIL PROTECTED]> wrote:
| >Mark Hahn <[EMAIL PROTECTED]> wrote:
| >Seek time is monotonic in linear address, but access time is not. On modern
| >disks, rotational latency is much larger than seek latency for typical
| >requests.
| It gets even goofier when one realizes that a lot of disks
| are variable-geometry, meaning that there are a different
| number of sectors depending on which cylinder is being
| discussed. How does one optimize disk access for such
| a beast without detailed knowledge of where each
| sector is?
The answer is that in absolute terms you can't. And more to the point,
I'm not sure if you want to optimize disk access time to maximize
io/sec, or if you want to treat a set of pending requests as a single
problem and optimize for sets/sec. If you always do the fastest io next,
you can leave requests unsatisfied for a very long time.
I believe that bidirectional would be a win for most cases, and a small
loss for the exceptions. However, your comment on rotational speed is
the key one: you have to know all the details, including settling time
vs. position and seek length, before you could do a good job of finding
the optimal solution.
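Just to make "bidirectional" concrete, here is a toy sketch that orders one
batch of requests as a sweep up from the current head position and then back
down. The names are invented for illustration, and it ignores rotational
position and settling entirely, which as noted is where the real work lies.

/* Toy sketch only: order a batch of requests elevator-style -- ascending
   from the current head position, then the remainder descending.  A real
   scheduler would also weigh rotational position, settle time and request
   age (fairness). */
#include <stdlib.h>

struct req { long block; };                /* invented request record */

static int by_block(const void *a, const void *b)
{
    long x = ((const struct req *)a)->block;
    long y = ((const struct req *)b)->block;
    return (x > y) - (x < y);
}

void elevator_order(struct req *r, size_t n, long head, struct req *out)
{
    size_t i, split = 0, k = 0;

    qsort(r, n, sizeof *r, by_block);
    while (split < n && r[split].block < head)
        split++;                           /* first request at/after head */
    for (i = split; i < n; i++)            /* sweep upward */
        out[k++] = r[i];
    for (i = split; i-- > 0; )             /* then back down */
        out[k++] = r[i];
}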
--
bill davidsen <[EMAIL PROTECTED]> CTO, TMR Associates, Inc
One common problem is mistyping an email address and creating another
valid, though unintended, recipient. Always check the recipient's
address carefully when sending personal information, such as credit
card numbers, death threats or offers of sexual services.
------------------------------
From: "Marco Aurelio S. Mendes" <[EMAIL PROTECTED]>
Subject: Insmod doesn't work
Date: Wed, 12 May 1999 19:14:38 -0300
I'm trying to load some modules dynamically with insmod but it doesn't work.
The modules were compiled using the same compiler I used to compile the
kernel. The kernel version was 2.0.34 and the options to allow loadable
module support were all enabled before the kernel was compiled.
The error message is: couldn't find the kernel version the module was
compiled for.
I tried the same procedure under Linux 2.2 but it doesn't work either.
Any tips?
Thanks in advance
Marco Aurelio S. Mendes
------------------------------
From: [EMAIL PROTECTED] (Paul Kimoto)
Subject: Re: glibc-2.0.7 to glibc-2.1.1
Date: 12 May 1999 16:28:08 -0500
Reply-To: [EMAIL PROTECTED]
In article <[EMAIL PROTECTED]>, G. Sumner Hayes wrote:
> Read the README -- ncurses will break, as will a few other
> things. This will make less break
[...]
> bash, telnet, ftp, make, etc. all ought
> to work fine.
Wait, your /bin/bash isn't dynamically linked to libncurses?!
--
Paul Kimoto <[EMAIL PROTECTED]>
------------------------------
From: [EMAIL PROTECTED] (Paul D. Smith)
Subject: Re: glibc-2.0.7 to glibc-2.1.1
Date: 12 May 1999 20:16:26 -0400
Reply-To: [EMAIL PROTECTED]
%% [EMAIL PROTECTED] (Brent Corbin) writes:
bc> I think you can compile bash to use libtermcap instead, but don't
bc> quote me on that...
Yes. Bash uses libreadline for command-line editing (the part that
needs these libs) and libreadline can be compiled to use either one.
--
===============================================================================
Paul D. Smith <[EMAIL PROTECTED]> Network Management Development
"Please remain calm...I may be mad, but I am a professional." --Mad Scientist
===============================================================================
These are my opinions---Nortel Networks takes no responsibility for them.
------------------------------
Subject: Re: Translation of linux to minor languages
From: Johan Kullstam <[EMAIL PROTECTED]>
Date: 12 May 1999 18:56:00 -0400
[EMAIL PROTECTED] (Guest) writes:
> In article <[EMAIL PROTECTED]> Johan Kullstam
><[EMAIL PROTECTED]> writes:
> > you probably want *both* UTF-8 *and* UTF-16 with a suite of convenient
> > translators from one to the other. UTF-8 has the advantage of being
> > fairly compact in memory. UTF-16 has the advantage that each char is
> > the same size (here it is 16 bits). my feeling is that, as a general
> > rule, programs will want to use UTF-16 internally and files stored on a
> > disk will be in UTF-8. for example, the getchar and putchar functions
> > should be configurable to translate automatically.
>
> Well, this is basically what we did, except that we kept the data as 16-bit
> strings in memory and only translated to 8-bit (in our case it used the default
> system encoding though) when going to file, X Windows or other stuff.
>
> > at this point, i feel it is essential to have some char encoding
> > metadata with the files so that you know if a particular file is in
> > us-ascii, iso8859-1, UTF-8, UTF-16 or what have you. the lack of
> > metadata is imho a failing of the unix filesystem. it would also
> > allow dispensing with the crock of #!/bin/sh as the first line in a
> > file since the metadata would contain that information too.
>
> I've never missed meta-data. Looking back at my Amiga days, the meta-data
> there was simply stored in a separate file with the .info extension. That was
> good enough for me. I prefer the simplicity (for the user that is) of a
> normal Unix file-system over stuff like NTFS's multiple streams and extended
> attributes. And I definitely want to be able to specify that some script has
> to be started with this or that program with a normal editor of my
> choice.
yes. except that the #! mechanism isn't very good.
1) what if your interpreter doesn't use # as a comment marker?
2) what if two applications want first line specials? how can you
have both #!/bin/sh and -*- sh -*- on the same first line?
3) there's more than just this i'd like to affix to the file without
it being part of the file. documentation strings, preferred editor,
where you were when you last saved in that editor &c.
i think having an out-of-band signaling mechanism would be
beneficial. for example in serial lines, hardware flow control is
much better than ^S/^Q (xon/xoff) signaling.
> To get back to localization and the filesystem, the only thing you need - I
> think - is a general convention that all filenames are in UTF-8. If everyone
> in the whole wide world (and every application) follows that rule, you don't
> need other stuff. And for foreign filesystems (MS-DOS, Mac, ...) the
> filesystem should translate the names.
i think filenames stored in a directory would do better in UTF-16.
filenames do not themselves take that much space. and operations such
as shell globbing would be quicker without any translation from UTF-8
directory string to UTF-16 for regexp handling.
the files could be in either UTF-8 or UTF-16 depending on what was
appropriate (speed vs space should be a userland tradeoff). that is
where some kind of header to mark the contents would be useful.
> > > the file-system. The only problem with ls would be that it would
> > > probably get the column-width wrong when using multi-columns or ls
> > > -l. The reason being that without modifications ls would assume a
> > > multibyte UTF-8 encoded string containing 1 character of 3 bytes to
> > > be 3 characters wide.
> >
> > isn't this why you would use UTF-16 inside a program?
> Yes, but it would be a real pain adapting all programs to use
> UTF-16. If the program really needs to know about characters (see
> below) it indeed should be converted to either really recognize
> UTF-8 or (and this is definitely simpler and more efficient) to
> use 16-bit (or 32 if you really want to be safe for the future)
> characters. However, modifying all utilities to handle that will
> mean a lot of work. The introduction of UTF-8 would allow this move
> to be progressive, i.e. tools that don't need to know about actual
> string-lengths in characters will already work fine with Japanese
> and other stuff, whereas those that do need to know will still run,
> but will get some things wrong.
i think you are dead wrong in this assessment. moving to UTF-16 would
basically entail doing a global search and replace from `char' to
`short' on all your string oriented functions. internal processing of
UTF-8 would involve all sorts of stunts to keep track of how many
bytes you have for each char.
look at the gnu-emacs implementation of MULE (which uses a UTF-8-like
variable byte length char encoding). this is an example of how *not*
to do things.
> > > many bytes should be written to disk. So, I guess some kind of
> > > UTF8strlen() function should be available and ls and other tools
> > > that really need to know the width of a string in characters should
> > > use that instead.
> >
> > all this machinery could probably be used to bring proportional fonts
> > into the mix.
> Euuhh ... I don't think proportional fonts have anything to do with
> strlen().
it doesn't really.
> That's GUI stuff and I definitely wouldn't like to get
> that kind of stuff into the core/kernel of Linux.
and it doesn't belong there either. however, things like xterms could
benefit from it.
> You did hit
> another point though: how would one display Kanji characters on a
> VT/console?
however, one will need to deal with strings which grow and shrink and
by the time you are done, perhaps proportional fonts would follow.
> > > its own implementation. Components without direct user-interaction
> > > (so no visualization and no user entering data directly into it)
> > > will not need to know that UTF-8 is used, since they couldn't care
> > > less whether those 3 bytes represent 3 Latin-1 characters or 1
> > > foreign character.
> >
> > yes they would. when the shell expands ? - how many chars is that?
> > how would sed operate? anything that processes text as text needs to
> > be able to grok the format.
>
> Ok, you're right here. Again though, the programs won't break because of the
> introduction of UTF-8, they would simply assign a strange meaning to stuff
> like the '?' in your example. It would allow a progressive move from a
> completely unlocalized environment to one that is fully localized.
> My guess is that things like bash, sed, awk, etc. will be amongst
> the first to get modified to handle 16-bit characters. The C
> library should supply them with an easy way of translating a UTF-8
> char * into a wchar_t *, and all other C library functions that can
> handle char * should also be available in wchar_t * versions.
exactly.
make a libUTF.so if nothing else. have translation functions from 8
to 16 and back again. there is a one-to-one mapping between UTF-8 and
UTF-16. it shouldn't be too difficult or slow.
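a rough sketch of one direction of such a library (names invented, BMP only;
4-byte sequences would need surrogate pairs on top of this, and real code
should reject malformed input rather than skip it):

/* rough sketch: decode UTF-8 into 16-bit units.  handles only 1-, 2- and
   3-byte sequences (the BMP). */
#include <stddef.h>

typedef unsigned short utf16;              /* invented name for a 16-bit unit */

size_t utf8_to_utf16(const unsigned char *in, size_t inlen,
                     utf16 *out, size_t outmax)
{
    size_t i = 0, o = 0;

    while (i < inlen && o < outmax) {
        unsigned char c = in[i];
        if (c < 0x80) {                            /* 0xxxxxxx: ASCII */
            out[o++] = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < inlen) {
            out[o++] = ((c & 0x1F) << 6) | (in[i + 1] & 0x3F);
            i += 2;                                /* 110xxxxx 10xxxxxx */
        } else if ((c & 0xF0) == 0xE0 && i + 2 < inlen) {
            out[o++] = ((c & 0x0F) << 12) | ((in[i + 1] & 0x3F) << 6)
                     | (in[i + 2] & 0x3F);
            i += 3;                                /* 1110xxxx 10xxxxxx 10xxxxxx */
        } else {
            i += 1;                                /* malformed or beyond BMP: skip */
        }
    }
    return o;                                      /* 16-bit units written */
}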
my basic feeling is that UTF-16 is the natural way to work, since one
short is one glyph, and UTF-8 is more of a simple compression format
better suited to mass storage.
--
J o h a n K u l l s t a m
[[EMAIL PROTECTED]]
Don't Fear the Penguin!
------------------------------
From: [EMAIL PROTECTED] (Brent Corbin)
Subject: Re: glibc-2.0.7 to glibc-2.1.1
Date: 13 May 1999 00:34:30 GMT
As I recall, I discovered that while trying to install libtermcap
compiled for glibc using a version of bash compiled for libc5...
It took me a while to figure out how/why the install script bombed 8*(
Fortunately, the wisdom I gained is in large part why I'm approaching
this next upgrade at a less-than-breakneck pace 8*)
On 12 May 1999 20:16:26 -0400, Paul D. Smith <[EMAIL PROTECTED]> wrote:
>%% [EMAIL PROTECTED] (Brent Corbin) writes:
>
> bc> I think you can compile bash to use libtermcap instead, but don't
> bc> quote me on that...
>
>Yes. Bash uses libreadline for command-line editing (the part that
>needs these libs) and libreadline can be compiled to use either one.
>
>--
>-------------------------------------------------------------------------------
> Paul D. Smith <[EMAIL PROTECTED]> Network Management Development
> "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
>-------------------------------------------------------------------------------
> These are my opinions---Nortel Networks takes no responsibility for them.
------------------------------
From: "Stefan Monnier <[EMAIL PROTECTED]>"
<[EMAIL PROTECTED]>
Subject: Re: Translation of linux to minor languages
Date: 12 May 1999 20:08:43 -0400
>>>>> "Johan" == Johan Kullstam <[EMAIL PROTECTED]> writes:
> yes. except that the #! mechanism isn't very good.
> 1) what if your interpreter doesn't use # as a comment marker?
Many interpreters that suffer from this problem have an exception for
the first line for this very purpose.
Of course, on Linux you can also use binfmt_misc to work around that problem
by using some other magic cookie.
> 2) what if two applications want first line specials? how can you
> have both #!/bin/sh and -*- sh -*- on the same first line?
You put the -*- sh -*- on the second line (as Emacs does) ?
Or you add a parameter to /bin/sh since the kernel ignores anything
past the first parameter.
Now, don't get me wrong: #! is far from perfect, but it's a really
cheap hack that covers 95% of what you need and the remaining 5%
can usually/always be recovered without too much trouble.
> 3) there's more than just this i'd like to affix to the file without
> it being part of the file. documentation strings, preferred editor,
> where you were when you last saved in that editor &c.
Be extra careful here:
1 - what do you mean exactly by "documentation string" ?
2 - preferred editor is not really a per-file info but a per-type-of-file
and per-user info. This requires a type-info attached to the file
(available in Unix in a very ad-hoc manner via file(1) using magic
cookies, or via /etc/mime.types using filename extensions).
3 - "where you were when you saved in that editor" is a per-file and per-user
info. The usefulness of such info looks rather dubious to me.
So 2&3 belong at least as much to the user as to the file.
Attaching such meta-data to the file is stupid. To solve this right,
it seems that you'd need a fancier mechanism that allows attaching info
between several objects. Like a relational database, for example.
> i think having an out-of-band signaling mechanism would be
> beneficial. for example in serial lines, hardware flow control is
> much better than ^S/^Q (xon/xoff) signaling.
Indeed, an RDB would be quite out-of-band.
> i think filenames stored in a directory would do better in UTF-16.
> filenames do not themselves take that much space. and operations such
> as shell globbing would be quicker without any translation from UTF-8
> directory string to UTF-16 for regexp handling.
Regexp matching can be done on UTF-8 just fine (no conversion needed).
Better yet: regexp code usually needs zero changes to work on UTF-8.
Regexp engines often use tables indexed by chars. To deal with 16-bit
chars, they often switch to a two-step (twice 8-bit) lookup.
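For instance, a plain byte-wise strstr() already finds a multibyte UTF-8
substring correctly, because no character's encoding can appear inside
another character's encoding (quick illustration, with the accented letter
written as explicit bytes):

/* Byte-wise search works unchanged on UTF-8: continuation bytes can never
   be confused with ASCII, and no encoded character is a substring of
   another one's encoding. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *text = "un caf\xc3\xa9 au lait";   /* "cafe" with e-acute */
    const char *pat  = "caf\xc3\xa9";
    const char *hit  = strstr(text, pat);

    if (hit != NULL)
        printf("found at byte offset %ld\n", (long)(hit - text));
    return 0;
}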
> the files could be in either UTF-8 or UTF-16 depending on what was
> appropriate (speed vs space should be a userland tradeoff). that is
> where some kind of header to mark the contents would be useful.
This discussion is moot anyway: Linux's filenames are already UTF-8.
> i think you are dead wrong in this assessment. moving to UTF-16 would
> basically entail doing a global search and replace from `char' to
> `short' on all your string oriented functions. internal processing of
> UTF-8 would involve all sorts of stunts to keep track of how many
> bytes you have for each char.
Don't forget that in UTF-16 chars are not necessarily 16 bits (most are,
but not all).
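For instance, anything above U+FFFF gets split into a pair of 16-bit units;
a quick sketch of the arithmetic:

/* Sketch: encode a code point above U+FFFF as a UTF-16 surrogate pair. */
#include <stdio.h>

int main(void)
{
    unsigned long cp = 0x1D11EUL;                 /* example code point > 0xFFFF */
    unsigned long v  = cp - 0x10000UL;
    unsigned hi = 0xD800 + (unsigned)(v >> 10);   /* high surrogate */
    unsigned lo = 0xDC00 + (unsigned)(v & 0x3FF); /* low surrogate  */

    printf("U+%05lX -> 0x%04X 0x%04X\n", cp, hi, lo);
    return 0;
}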
Stefan
------------------------------
** FOR YOUR REFERENCE **
The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:
Internet: [EMAIL PROTECTED]
You can send mail to the entire list (and comp.os.linux.development.system) via:
Internet: [EMAIL PROTECTED]
Linux may be obtained via one of these FTP sites:
ftp.funet.fi pub/Linux
tsx-11.mit.edu pub/linux
sunsite.unc.edu pub/Linux
End of Linux-Development-System Digest
******************************