Re: Files, Directories, Resources, Operating Systems

2008-12-10 Thread Aristotle Pagaltzis
* Charles Bailey [EMAIL PROTECTED] [2008-12-10 03:15]:
 It may well be that a fine-grained interface isn't practical,
 but perhaps there are some basics that we could implement, such
 as

 - set owner of this thing
 - (maybe) set group of this thing
 - give owner|everyone|?some-group the ability to read
   from|write to|remove|run this thing
 - tell me whether any of these is possible
 - make the metadata for this thing the same as the metadata for
   that thing
 - tell me when this thing was created|last updated

There are many problematic suggestions here. Some examples:

• Unix does not track file creation datetime at all.

• The concept of making a file runnable doesn’t even exist on
  Windows: that property is derived from the filename extension.

• Delete permission on a file is a concept that doesn’t exist on
  Unix. To be able to delete a file, you instead need write
  permission on the directory it resides in.
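
A minimal Python sketch of the first and third points, under stated assumptions: the helper names `creation_time` and `can_delete` are hypothetical, and the delete check deliberately ignores sticky-bit directories and ACLs, so it is an approximation rather than a definitive test.

```python
import os

def creation_time(path):
    """Return a creation timestamp if this platform's stat() reports one, else None."""
    st = os.stat(path)
    # st_birthtime is only present on some platforms; absence is the normal case elsewhere.
    return getattr(st, "st_birthtime", None)

def can_delete(path):
    """Approximate 'may I delete this?' the Unix way: check the containing directory."""
    parent = os.path.dirname(os.path.abspath(path))
    # Unlinking needs write and search permission on the directory, not on the file.
    return os.access(parent, os.W_OK | os.X_OK)
```

A portable API would have to decide what `creation_time` means when the answer is `None`, which is the kind of leak under discussion.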

Furthermore, in Win32, files and directories can inherit
permissions, so the fact that a file has certain effective
permissions does not mean that these permissions are set on
the file itself. But if you set them on the file itself, you
dissociate it from the inheritance chain. So reading permissions
and then setting them the same, without changing anything, can
still have unwanted side effects. Or if you try to make the API
smart, and so make it set permissions only when they constitute
a change from the effective permissions, then conversely the user
no longer has a way to dissociate the file from inheritance if
that *is* what they wanted. So the concept of inheritance must
be exposed explicitly.
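
The round-trip problem can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not the Win32 ACL API: the `Node` class and its `explicit` attribute are hypothetical, and permissions are reduced to a set of strings.

```python
class Node:
    """Toy model: `explicit` stays None while the node inherits from its parent."""

    def __init__(self, parent=None, explicit=None):
        self.parent = parent
        self.explicit = explicit

    def effective(self):
        # Explicit permissions win; otherwise walk up the inheritance chain.
        if self.explicit is not None:
            return self.explicit
        return self.parent.effective() if self.parent else frozenset()

    def inherits(self):
        return self.explicit is None


folder = Node(explicit=frozenset({"read"}))
doc = Node(parent=folder)

# A naive "read the permissions, then set them unchanged" round trip:
doc.explicit = doc.effective()

# The folder is later opened up, but doc no longer follows it.
folder.explicit = frozenset({"read", "write"})
```

After the round trip `doc.inherits()` is false and `doc.effective()` still contains only "read", even though nothing was meant to change.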

This is the primary issue I was thinking of when I said that some
differences between Win32 and Unix have such pervasive effects
that it seems impossible to provide even a rudimentary abstract
interface.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Files, Directories, Resources, Operating Systems

2008-12-10 Thread Timothy S. Nelson
	I've been playing with similar sorts of problems when creating an OO 
model for packaging metadata, that could supposedly represent the data from a 
.rpm or a .deb or whatever.


	The first thing I did was set up a method where if we're outputting 
eg. an RPM, it will mark every piece of metadata it uses, and then afterwards, 
the core system will emit warnings about all the things it didn't use. 
Something similar could possibly be done; we'd simply need to give the user 
control as to where the warnings end up.
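
	The marking scheme described above might look roughly like this in Python. It is only a sketch: the class name `TrackedMetadata` and the `sink` parameter are made up for illustration, with `sink` standing in for "where the warnings end up".

```python
import warnings

class TrackedMetadata:
    """Wrap a metadata mapping and remember which fields a backend actually read."""

    def __init__(self, data):
        self._data = dict(data)
        self._used = set()

    def get(self, key, default=None):
        # Mark the field as used only if it really exists.
        if key in self._data:
            self._used.add(key)
        return self._data.get(key, default)

    def unused(self):
        return sorted(set(self._data) - self._used)

    def warn_unused(self, sink=warnings.warn):
        # The caller chooses the sink: warnings.warn, a logger method, list.append, ...
        for key in self.unused():
            sink(f"metadata field {key!r} was not used by this backend")
```

A backend that cannot represent, say, a creation time would simply never call `get("created")`, and the core would report that afterwards.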


	Note that I also agree with the guy who said that we need 
system-specific calls, and then an abstraction layer on top of that.


On Wed, 10 Dec 2008, Aristotle Pagaltzis wrote:


* Charles Bailey [EMAIL PROTECTED] [2008-12-10 03:15]:

It may well be that a fine-grained interface isn't practical,
but perhaps there are some basics that we could implement, such
as

- set owner of this thing
- (maybe) set group of this thing
- give owner|everyone|?some-group the ability to read
  from|write to|remove|run this thing
- tell me whether any of these is possible
- make the metadata for this thing the same as the metadata for
  that thing
- tell me when this thing was created|last updated


There are many problematic suggestions here. Some examples:

• Unix does not track file creation datetime at all.


Emit a warning.


• The concept of making a file runnable doesn’t even exist on
 Windows: that property is derived from the filename extension.


	So when they read it, make a guess based on the extension, and when 
they write it, emit an error.



• Delete permission on a file is a concept that doesn’t exist on
 Unix. To be able to delete a file, you instead need write
 permission on the directory it resides in.


	So when they read it, figure it out, and when they write it, emit an 
error.



Furthermore, in Win32, files and directories can inherit
permissions, so the fact that a file has certain effective
permissions does not mean that these permissions are set on
the file itself. But if you set them on the file itself, you
dissociate it from the inheritance chain. So reading permissions
and then setting them the same, without changing anything, can
still have unwanted side effects. Or if you try to make the API
smart, and so make it set permissions only when they constitute
a change from the effective permissions, then conversely the user
no longer has a way to dissociate the file from inheritance if
that *is* what they wanted. So the concept of inheritance must
be exposed explicitly.


	Or, you could pick a consistent model, and then let the user use the 
lower-level interface if they want to be more specific.



This is the primary issue I was thinking of when I said that some
differences between Win32 and Unix have such pervasive effects
that it seems impossible to provide even a rudimentary abstract
interface.


	Try rudimentary *optional* abstract interface, where the other 
option is system-specific.


:)


-
| Name: Tim Nelson | Because the Creator is,|
| E-mail: [EMAIL PROTECTED]| I am   |
-

BEGIN GEEK CODE BLOCK
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- 
PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y-

-END GEEK CODE BLOCK-



Re: Files, Directories, Resources, Operating Systems

2008-12-09 Thread Aristotle Pagaltzis
* Mark Overmeer [EMAIL PROTECTED] [2008-12-08 21:20]:
 A pity that we do not focus on the general concept of OS
 abstraction (knowing that some problems are only partially
 solvable (at the moment)).

Well go on. Explain how you would, f.ex., provide an abstract
API over file ownership and access permissions between Win32
and Unix? I don’t see such a thing being possible at all: there
are too many differences with pervasive consequences. The most
you can reasonably do (AFAICT) is map Win32-style owner/access
info to a Unix-style API for reading only.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Files, Directories, Resources, Operating Systems

2008-12-09 Thread Aristotle Pagaltzis
* Aristotle Pagaltzis [EMAIL PROTECTED] [2008-12-10 01:10]:
 Well go on.

Btw, I just realised that it can be read as sarcastic, which I
didn’t intend. I am honestly curious, even if skeptical. I am
biased, but I am open to be convinced.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Files, Directories, Resources, Operating Systems

2008-12-09 Thread Brandon S. Allbery KF8NH

On 2008 Dec 9, at 19:56, Aristotle Pagaltzis wrote:

* Aristotle Pagaltzis [EMAIL PROTECTED] [2008-12-10 01:10]:

Well go on.


Btw, I just realised that it can be read as sarcastic, which I
didn’t intend. I am honestly curious, even if skeptical. I am
biased, but I am open to be convinced.



BTW you can run into this issue even only considering Unix/POSIX:   
POSIX ACLs, AFS, NFSv4.


I can see the point of a very simple base API with system-dependent  
extensions, but am likewise skeptical that one can be designed that  
isn't useless.


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH




Re: Files, Directories, Resources, Operating Systems

2008-12-09 Thread Charles Bailey
It may well be that a fine-grained interface isn't practical, but
perhaps there are some basics that we could implement, such as

- set owner of this thing
- (maybe) set group of this thing
- give owner|everyone|?some-group the ability to read from|write
to|remove|run this thing
- tell me whether any of these is possible
- make the metadata for this thing the same as the metadata for that thing
- tell me when this thing was created|last updated

in addition to the usual CRUD operations.  More detailed views of
metadata might be the province of OS-specific modules, as might
different semantics for content (and even stringy metadata).  But
having this sort of simplified works-everywhere layer interposed
should handle common tasks like reading, writing, and copying without
making everyone replicate OS-specific variants.

The basic operations above have a POSIXy flavor, but the underlying
details shouldn't.  For instance, "allow me to read and write this
thing" != "chmod 6xx, thing".  I'm not saying this is an easy solution,
just that it's worth the effort.

Then again, I think File::Copy is a better choice than system("cp")
for publicly distributed code, so I'm already biased.

--
Regards,
Charles Bailey



On 12/9/08, Brandon S. Allbery KF8NH [EMAIL PROTECTED] wrote:
 On 2008 Dec 9, at 19:56, Aristotle Pagaltzis wrote:
 * Aristotle Pagaltzis [EMAIL PROTECTED] [2008-12-10 01:10]:
 Well go on.

 Btw, I just realised that it can be read as sarcastic, which I
 didn't intend. I am honestly curious, even if skeptical. I am
 biased, but I am open to be convinced.


 BTW you can run into this issue even only considering Unix/POSIX:
 POSIX ACLs, AFS, NFSv4.

 I can see the point of a very simple base API with system-dependent
 extensions, but am likewise skeptical that one can be designed that
 isn't useless.

 --
 brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
 system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
 electrical and computer engineering, carnegie mellon university    KF8NH





-- 
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu


Re: Files, Directories, Resources, Operating Systems

2008-12-09 Thread Brandon S. Allbery KF8NH

On 2008 Dec 9, at 21:11, Charles Bailey wrote:

It may well be that a fine-grained interface isn't practical, but
perhaps there are some basics that we could implement, such as

- set owner of this thing
- (maybe) set group of this thing


Group is problematic; I don't recall Windows having group ownership  
(as distinct from group ACLs), and AFS PTS groups are very different  
from Unix groups.


As I said, I'm all in favor of such an API, just skeptical that a  
useful one can be devised.


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH




Re: Files, Directories, Resources, Operating Systems

2008-12-09 Thread Mark Overmeer
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081210 00:06]:
 * Mark Overmeer [EMAIL PROTECTED] [2008-12-08 21:20]:
  A pity that we do not focus on the general concept of OS
  abstraction (knowing that some problems are only partially
  solvable (at the moment)).
 
 Well go on. Explain how you would, f.ex., provide an abstract
 API over file ownership and access permissions between Win32
 and Unix? I don’t see such a thing being possible at all: there
 are too many differences with pervasive consequences. The most
 you can reasonably do (AFAICT) is map Win32-style owner/access
 info to a Unix-style API for reading only.

(I do not have time today for long emails... paying work to do :-(
The short answer:

Just like Path::Class or IO::File, I suggest an OO interface.  That
means that you may share methods between different OSes but it also
may not be possible.

Within this OO interface, you could design two abstraction levels:
one which maps directly to the OS calls, such as supporting chown via some
POSIX mix-in.  On another level, we attempt to unify environments.  For
the latter, you can think of methods like an owner getter and setter,
os_family or size.

Even more to my liking is an additional super-level.  In this case,
the actual platform-dependent implementation does its best... Maybe
something like:  (still Perl5 style)

   $file->change_attributes(owner => $user, group => $group,
                            readable => 1, ...);

The core implementation tries, as well as it can, to map
various kinds of attributes onto OS-specific features, taking care
of nastiness like change-order limitations.  Typically becoming smarter
over time.  Real DWIMming, exploiting our joint knowledge and sharing it.
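
One possible shape for such a best-effort call, sketched in Python rather than Perl. Everything here is an assumption made for illustration: the attribute names, the choice to return the skipped names instead of raising, and the restriction to the owner's permission bits.

```python
import os
import stat

# Hypothetical attribute names mapped to the owner's permission bits.
_MODE_BITS = {
    "readable": stat.S_IRUSR,
    "writable": stat.S_IWUSR,
    "runnable": stat.S_IXUSR,
}

def change_attributes(path, **wanted):
    """Apply whatever this platform supports; return the names that were skipped."""
    skipped = []
    for name, value in wanted.items():
        if name in _MODE_BITS:
            mode = stat.S_IMODE(os.stat(path).st_mode)
            bit = _MODE_BITS[name]
            os.chmod(path, mode | bit if value else mode & ~bit)
        elif name == "owner" and hasattr(os, "chown"):
            os.chown(path, value, -1)  # value is a numeric uid; -1 keeps the group
        else:
            skipped.append(name)       # unknown here, or unsupported on this OS
    return skipped
```

For example, `change_attributes("notes.txt", writable=False, colour="red")` would clear the owner write bit and return `["colour"]`, leaving it to the caller to warn, fail, or fall back to an OS-specific interface.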
-- 
Regards,
   MarkOv


   Mark Overmeer MSc                    MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: Files, Directories, Resources, Operating Systems

2008-12-08 Thread Aristotle Pagaltzis
* Mark Overmeer [EMAIL PROTECTED] [2008-12-07 14:20]:
 So why are you all so hesitant to make each other's life
 easier? There is no 100% solution, but 0% is even worse!

It looks like Python 3000 just tried that.

People are not happy about it:
http://utcc.utoronto.ca/~cks/space/blog/python/OsListdirProblem

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Files, Directories, Resources, Operating Systems

2008-12-08 Thread Leon Timmermans
On Mon, Dec 8, 2008 at 8:16 PM, Aristotle Pagaltzis [EMAIL PROTECTED] wrote:
 It looks like Python 3000 just tried that.

 People are not happy about it:
 http://utcc.utoronto.ca/~cks/space/blog/python/OsListdirProblem


Yeeh, I also noted exactly that problem when reading the What's New
In Python 3.0. What were they thinking?!

Leon


Re: Files, Directories, Resources, Operating Systems

2008-12-08 Thread Mark Overmeer
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081208 19:16]:
 * Mark Overmeer [EMAIL PROTECTED] [2008-12-07 14:20]:
  So why are you all so hesitant to make each other's life
  easier? There is no 100% solution, but 0% is even worse!
 
 It looks like Python 3000 just tried that.
 People are not happy about it:
 http://utcc.utoronto.ca/~cks/space/blog/python/OsListdirProblem

I thought we were having a serious discussion.  We all know that
considering all names as Unicode is a stupid presumption.

It seems that some bright minds got stuck in a deep recursion about
codesets in file- and directory names.  A pity that we do not
focus on the general concept of OS abstraction (knowing that some
problems are only partially solvable (at the moment)).
-- 
   MarkOv


   Mark Overmeer MSc                    MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: Files, Directories, Resources, Operating Systems

2008-12-07 Thread Mark Overmeer
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 16:57]:
 * Mark Overmeer [EMAIL PROTECTED] [2008-12-04 16:50]:
  * Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 14:38]:
   Furthermore, from the point of view of the OS, even treating file
   names as opaque binary blobs is actually fine! Programs don’t
   care after all. In fact, no problem shows up until the point
   where you try to show filenames to a user; that is when the
   headaches start, not any sooner.
 
  So, they start when
- you have users pick filenames (with Tk) for a graphical
  applications. You have to know the right codeset to be able
  to display them correctly.
 
 Yes, but you can afford imperfection because presumably you know
 which displayed filename corresponds to which stored octet
 sequence, so even if the name displays incorrectly, you still
 operate on the right file if the user picks it.

With all these different encodings, it is easy to show filenames
which are not a little-bit incorrect, but which are unrecognizably
corrupted.

In the whole debate, it looks like there are only two groups of developers
involved: the programming language authors and the end-application
developers.
   But do not forget that there are also CPAN library authors and
maintainers (my main involvement).  When you create a good library, you
have to support multiple (unpredictable) platforms and languages.
Each time you say: oh, just let the end-user figure that out, you add
complexity and distribute implementation horrors.  Good, generally
available libraries are crucial for any language.

- you have XML-files with meta-data on files which are
  being distributed. (I have a lot of those)
 Use URI encoding unless you like a world of pain.

You are looking at it from the wrong point of view: Perl is used as
a glue language: other people determine what kind of data we have
to process.  So, also in my case, the content of these XML structures
is totally out of my hands: no influence on the definitions at all.
I think that is the more common situation.

 NTFS seems to say it’s all Unicode and comes back as either
 CP1252 or UTF-16 depending on which API you use, so I guess you
 could auto-decode those. But FAT is codepage-dependent, and I
 don’t know if Windows has a good way of distinguishing when you
 are getting what. So Windows seems marginally more consistent
 than Unix, but possibly only apparently. (What happens if you zip
 a file with random binary garbage for a name on Unix and then
 unzip it on Windows?)
 
 I have no idea what other systems do.

Well, the nice thing about File::Spec/Path::Class is that someone
did know how those systems work and everyone can benefit from it.
So why are you all so hesitant to make each other's life easier?
There is no 100% solution, but 0% is even worse!

Once upon a time, Perl people were eager for good DWIMming and powerful
programming.  Nowadays, I see so much fear in our community to attempt
simpler/better/other ways of programming.  We get a brand new language,
with a horribly outdated documentation system and very traditional OS
approach.  As if everyone prefers to stick to Perl's 22-year-old and Unix's
39-year-old choices, while the world around us saw huge development
and change in needs.  Are we just getting old, grumpy and tired?
Where is the new blood to stir us up?
-- 
   MarkOv


   Mark Overmeer MSc                    MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: Files, Directories, Resources, Operating Systems

2008-12-07 Thread Aristotle Pagaltzis
* Mark Overmeer [EMAIL PROTECTED] [2008-12-07 14:20]:
 - you have XML-files with meta-data on files which are
   being distributed. (I have a lot of those)
  Use URI encoding unless you like a world of pain.

 You are looking at it from the wrong point of view: Perl is
 used as a glue language: other people determine what kind of
 data we have to process. So, also in my case, the content of
 these XML structures is totally out of my hands: no influence
 on the definitions at all. I think that is the more common
 situation.

If you start with a broken data format, no amount of papering
over it will unbreak it. Sorry, Perl 6 won’t have magic ponies to
fix that. Ambiguous data cannot be disambiguated by smart code.

If you want to try anyway, talk to someone who didn’t get their
name on an IETF RFC out of disgust with the state of an unfixably
messy legacy data format.

  NTFS seems to say it’s all Unicode and comes back as either
  CP1252 or UTF-16 depending on which API you use, so I guess
  you could auto-decode those. But FAT is codepage-dependent,
  and I don’t know if Windows has a good way of distinguishing
  when you are getting what. So Windows seems marginally more
  consistent than Unix, but possibly only apparently. (What
  happens if you zip a file with random binary garbage for a
  name on Unix and then unzip it on Windows?)
 
  I have no idea what other systems do.

 Well, the nice thing about File::Spec/Path::Class is that
 someone did know how those systems work and everyone can
 benefit from it.

These modules are completely and utterly oblivious to encoding
issues, so I have no idea how they are relevant in the first
place.

 So why are you all so hesitant to make each other's life
 easier? There is no 100% solution, but 0% is even worse!

Because I have seen Java, and it taught me that the 90% solution
is worse than the 20% solution. Provide 20% in the language and
someone will use that and write Path::Class. And if we abstain
from putting today’s best solutions in the core library, then we
have a chance that tomorrow’s best solutions might gain traction.
(Otherwise we get 10 years of CGI.pm again.)

 Once upon a time, Perl people were eager for good DWIMming and
 powerful programming.

And yet it’s the CPAN that turned out to be Perl’s greatest
strength. If you suggested the initial concept of the CPAN today,
people would laugh at you – it would seem like an April fool’s
joke. It didn’t even have a standard package format!

 Nowadays, I see so much fear in our community to attempt
 simpler/better/other ways of programming.

Simpler in what way? All abstractions leak. Take this into
account or make users suffer.

 We get a brand new language, with a horribly outdated
 documentation system and very traditional OS approach. As if
 everyone prefers to stick to Perl's 22-year-old and Unix's
 39-year-old choices, while the world around us saw huge
 development and change in needs.

If you can show me a ubiquitous kernel that runs perl and was
designed less than 15 years ago, I’ll show you a modern OS API
approach.

If you want to see an attempt at an abstract interface layered
over crusty OS designs, I’ll show you Java.

Abstaining from the attractive nuisance of abstracting small-
seeming differences away seems to have worked out well enough for
DBI, anyway. Would you argue that DBI is not a good or relevant
example? (And if so, why?) Or are you suggesting that approach
was a failure or horrible in some way?

 Are we just getting old, grumpy and tired? Where is the new
 blood to stir us up?

Busy designing their own second system. You want to invite a
bunch of PHP kids? I’m game. :-)

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Files, Directories, Resources, Operating Systems

2008-12-04 Thread Aristotle Pagaltzis
* Tom Christiansen [EMAIL PROTECTED] [2008-11-27 11:30]:
 In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED]
of Wed, 26 Nov 2008 19:34:09 PST. [EMAIL PROTECTED]
  I believe that the most important issues here, those having
  to do with identity, can be discussed and solved without
  unduly worrying about matters of collation;

 It's funny you should say that, as I could nearly swear that I
 just showed that identity cannot be determined in the examples
 above without knowing about locales. To wit, while all of
 those sort somewhat differently, even case-insensitively, no
 matter whether you're thinking of a French or a Spanish
 ordering (and what is English's, anyway?), you have a more
 fundamental = vs != scenario which is entirely
 locale-dependent.

 If I can make a RESUME file, ought I be able to make a
 distinct r\x{E9}sum\x{E9} or re\x{301}sume\x{301} file in a
 case-ignorant filesystem?

That’s for the file system to know, not Perl 6. Trying to unify
this in any way on the side of Perl is, in my regard, a fool’s
errand. If the file system is case insensitive, then it will make
the call in whatever way it deems correct, and it’s not for us to
worry about all the possible ways in which all possible current
and future file systems might answer such questions.

Furthermore, from the point of view of the OS, even treating file
names as opaque binary blobs is actually fine! Programs don’t
care after all. In fact, no problem shows up until the point
where you try to show filenames to a user; that is when the
headaches start, not any sooner.

To that, the right solution is simply not to roundtrip filenames
through the user interface; instead, keep both the original octet
sequence as well as the decoded version, and use the decoded
version in UI but refer back to the pristine original when the
user elects, via UI, to operate on that file.
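
A small Python sketch of that idea, assuming UTF-8 as the display encoding (any other guess would do equally well, since the label is never used to address the file). The function names are hypothetical.

```python
import os

def display_label(raw_name, encoding="utf-8"):
    """Decode for display only; undecodable octets become U+FFFD."""
    return raw_name.decode(encoding, errors="replace")

def list_for_display(directory, encoding="utf-8"):
    """Return (label, raw_name) pairs; raw_name is the untouched octet string."""
    raw_dir = os.fsencode(directory)
    # Passing a bytes path makes os.listdir return the names as bytes.
    return [(display_label(raw, encoding), raw) for raw in os.listdir(raw_dir)]

def open_chosen(directory, raw_name):
    """Operate on the file via its original octets, never via the label."""
    return open(os.path.join(os.fsencode(directory), raw_name), "rb")
```

The UI shows the first element of each pair and hands the second back when the user picks an entry, so a wrongly decoded label costs readability but never selects the wrong file.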

As far as I am concerned, if Perl 6 has a distinction between
octet strings and character strings, then all that’s required is
to have filenames returned from OS APIs come back as octet
strings, keeping the programmer from forgetting to deal with
decoding issues. The higher-level problems like sorting names in
a locale-aware fashion will be solved by the CPAN collective much
better than any boil-the-ocean abstract interface design that the
Perl 6 cabal would produce – if indeed these are real problems at
all in practice.

All that’s necessary is to design the interface such that it
won’t obstruct subsequent “userland” solution approaches.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Files, Directories, Resources, Operating Systems

2008-12-04 Thread Mark Overmeer
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 14:38]:
 Furthermore, from the point of view of the OS, even treating file
 names as opaque binary blobs is actually fine! Programs don’t
 care after all. In fact, no problem shows up until the point
 where you try to show filenames to a user; that is when the
 headaches start, not any sooner.

So, they start when
  - you have users pick filenames (with Tk) for a graphical
    application. You have to know the right codeset to be able
    to display them correctly.
  - you have XML files with metadata on files which are
    being distributed.  (I have a lot of those)
  - you start doing path manipulation on (UTF-16) blobs,
and so forth.  I have been fighting these problems for a long
time, and they worry me more and more because we see Unicode being
introduced on the OS-level.  The mess is growing by the day.

 To that, the right solution is simply not to roundtrip filenames
 through the user interface; instead, keep both the original octet
 sequence as well as the decoded version, and use the decoded
 version in UI but refer back to the pristine original when the
 user elects, via UI, to operate on that file.

But now you simply say decode it.  But to be able to decode
it, you must know which charset it is in to begin with.
So: where do we start guessing?  An educated guess at OS level,
or on each user program again?

 decoding issues. The higher-level problems like sorting names in
 a locale-aware fashion will be solved by the CPAN collective much
 better than any boil-the-ocean abstract interface design that the
 Perl 6 cabal would produce – if indeed these are real problems at
 all in practice.

Why?  Are CPAN programmers smarter than Perl6 Cabal people?

What I would like to see designed is an object model for the OS, processes,
directories, and files.  We will not be able to solve all problems for
each OS.  Maybe people need to install additional CPAN modules to get
smarter behavior.  But I would really welcome it if platform-independent
coding were the default behavior, without need for File::Spec, Path::Class
and such.  We once made the step from FILEHANDLES to IO::File.
Let's take it a little further.

The discussion is stuck in filenames, which are a problematic area.
But we started with chown and friends.   I really would like to
be able to write:

   $file = File.new($filename);
   $file.owner($user);
   if $file.owner eq $user {}
   $file.open()

over
   if($has_POSIX)
   {   my $uid = getpwnam $user;         # numeric uid for the user name
       chown $uid, -1, $filename;        # -1 leaves the group unchanged
       if((stat $filename)[4] == $uid) {}
   }
   else
   {   die "Sorry, do not understand your system";
   }

-- 
Regards,

   MarkOv


   Mark Overmeer MSc                    MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: Files, Directories, Resources, Operating Systems

2008-12-04 Thread Aristotle Pagaltzis
* Mark Overmeer [EMAIL PROTECTED] [2008-12-04 16:50]:
 * Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 14:38]:
  Furthermore, from the point of view of the OS, even treating file
  names as opaque binary blobs is actually fine! Programs don’t
  care after all. In fact, no problem shows up until the point
  where you try to show filenames to a user; that is when the
  headaches start, not any sooner.

 So, they start when
   - you have users pick filenames (with Tk) for a graphical
 applications. You have to know the right codeset to be able
 to display them correctly.

Yes, but you can afford imperfection because presumably you know
which displayed filename corresponds to which stored octet
sequence, so even if the name displays incorrectly, you still
operate on the right file if the user picks it.

   - you have XML-files with meta-data on files which are
 being distributed. (I have a lot of those)

Use URI encoding unless you like a world of pain.
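
For what it's worth, percent-encoding the raw octets is cheap and loses nothing. A minimal Python sketch, with hypothetical helper names; `quote` and `unquote_to_bytes` are the standard `urllib.parse` functions:

```python
from urllib.parse import quote, unquote_to_bytes

def encode_name(raw_name: bytes) -> str:
    """Percent-encode a filename's octets into plain ASCII, safe to embed in XML."""
    # safe="" also escapes "/", so the result is a single path segment.
    return quote(raw_name, safe="")

def decode_name(text: str) -> bytes:
    """Recover the exact original octets, whatever charset they were in."""
    return unquote_to_bytes(text)
```

The encoded form says nothing about which charset the name was in, so it only solves transport, not display.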

   - when you start doing path manipulation on (UTF-16) blobs,
 and so forth. I have been fighting these problems for a long
 time, and they worry me more and more because we see Unicode being
 introduced on the OS-level. The mess is growing by the day.

And all we can do is to avoid making it even bigger. Because the
only ones in control here are the OS vendors, and they aren’t
solving it, only making it bigger. The only thing *we* can do is
not to erect obstacles that users will have to work around when
our abstractions invariably leak.

I am unconvinced that this problem actually yields to
abstraction. All the really hard problems in computing are the
ones that intersect with human culture – text in any form, and
dates and times. When computers deal with mathematical entities,
few problems are even hard, let alone insurmountable, you only
need to work at them long enough. Human concepts are not like
that, they are messy and inconsistent.

  To that, the right solution is simply not to roundtrip filenames
  through the user interface; instead, keep both the original octet
  sequence as well as the decoded version, and use the decoded
  version in UI but refer back to the pristine original when the
  user elects, via UI, to operate on that file.

 But now you simply say decode it. But to be able to decode
 it, you must know which charset it is in to begin with.
 So: where do we start guessing? An educated guess at OS level,
 or on each user program again?

I am not advocating educated guesses. The mechanism would be
whatever interfaces the system provides. Unix does not have any,
so you can indeed only ever guess, but if the system can give
you something better, that should be used.

NTFS seems to say it’s all Unicode and comes back as either
CP1252 or UTF-16 depending on which API you use, so I guess you
could auto-decode those. But FAT is codepage-dependent, and I
don’t know if Windows has a good way of distinguishing when you
are getting what. So Windows seems marginally more consistent
than Unix, but possibly only apparently. (What happens if you zip
a file with random binary garbage for a name on Unix and then
unzip it on Windows?)

I have no idea what other systems do.

But there is no common denominator, so pretending there is one is
not going to help.

  The higher-level problems like sorting names in a
  locale-aware fashion will be solved by the CPAN collective
  much better than any boil-the-ocean abstract interface design
  that the Perl 6 cabal would produce – if indeed these are
  real problems at all in practice.

 Why? Are CPAN programmers smarter than Perl6 Cabal people?

Of course! There are many more CPAN programmers than cabalists;
some of them are bound to have much greater expertise in some
relevant area of this problem than anyone in the cabal. Even
those who aren’t that smart will have direct access to and
specific knowledge of the system they are dealing with, that
the cabal may never even hear about.

 What I would like to see designed is an object model for the OS,
 processes, directories, and files. We will not be able to solve
 all problems for each OS. Maybe people need to install
 additional CPAN modules to get smarter behavior. But I would
 really welcome it if platform-independent coding were the default
 behavior, without need for File::Spec, Path::Class and such.

Ugh. I understand the desire, but it is very easy to get into
architecture astronautics. I think we should follow the DBI
approach and not try to provide a unified interface to system-
specific things like permissions and ownership: unify the most
general notions of filesystems but leave all the specifics to be
dealt with by user code in the concrete. That is the only place
where the amount of acceptable abstraction can be decided. Cf.
writing apps that run on all of PostgreSQL, MySQL and Oracle vs
those that take advantage of specific DBMS features: this is a
decision that the programmer has to make, it is not one we can
make on his behalf.

Regards,

Re: Files, Directories, Resources, Operating Systems

2008-11-27 Thread Tom Christiansen
In-Reply-To: Message from Mark Overmeer [EMAIL PROTECTED] 
   of Thu, 27 Nov 2008 08:23:50 +0100. [EMAIL PROTECTED] 

* Tom Christiansen ([EMAIL PROTECTED]) [081126 23:55]:

 On Wed, 26 Nov 2008 11:18:01 PST.--or, for backwards compatibility,
 at 7:18:01 p.m. hora Romae on a.d. VI Kal. Dec. MMDCCLXI AUC,
 Larry Wall [EMAIL PROTECTED] wrote:

 SUMMARY: I've been looking into this sort of thing lately (see p5p),
  and there may not even *be* **a** right answer.  The reasons
  why take us into an area we've traditionally avoided.

 What a long message...

It *was*?  That was approaching a medium in my epistolary (and RFC) world,
the one unrelated to PostIt notes.  I can therefore see you've never been
FMTEYEWTK'd, and thus also to all outward appearances, we've not made each
other's acquaintance.  I'm tchrist; pleased to meet you.

Read the http://www.unicode.org/reports/tr10/ treatise, as I have repeatedly
done, and you will quickly reassess your length calls.  This is not
necessarily a good thing.  Neal Stephenson can do the same, and of
far lesser utility.

--tom


Re: Files, Directories, Resources, Operating Systems

2008-11-27 Thread Richard Hainsworth
Just as a variable name in perl6 must conform to a standard and abide by 
a set of constraints, why should file or other resource names be an 
exception?


The constraints on variable names in perl6 are very flexible, but there 
are some rules that must be enforced for a program to work.


It seems to me that resource (eg. file) names too should also be 
constrained so that software portability can be ensured. A reasonably 
constructed set of constraints for the perl6 core should deal with most 
locale/OS/character set considerations, and where a particular 
environment cannot cope, then a module will be needed to eigenmunge 
the names appropriately.


Suppose for the sake of argument we state that resource names in perl6 
shall comply with the rules for variable names; and the sort sequence of 
such names is the one defined for unicode strings.


Where software in perl6 is written for a specific domain, eg. Catalan or 
Russian, the programmer will know more about the domain and how to deal 
with resource names in that locale. This would include sort sequences 
and the complexities Tom outlined. Such things would be relegated to OS 
/ domain specific modules.


Would this help?


Re: Files, Directories, Resources, Operating Systems

2008-11-27 Thread Tom Christiansen
In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED] 
   of Wed, 26 Nov 2008 19:34:09 PST. [EMAIL PROTECTED] 

 Tom Christiansen wrote:

  I believe database folks have been doing the same with character data, but
  I'm not up-to-date on the DB world, so maybe we have some metainfo about
  the locale to draw on there.  Tim?

 AFAIK, modern databases are all strongly typed at least to the point
 that the values you store in and fetch from them are each explicitly
 character data or binary data or numbers or what-have-you; and so,
 when you are dealing with a DBMS in terms of character data, it is
 explicitly specified somewhere (either locally for the data or
 globally/hardcoded for the DBMS) that each value of character data
 belongs to a particular character repertoire and text encoding, and so
 the DBMS knows what encoding etc the character data is in, or at least
 it treats it consistently based on what the user said it was when it
 input the data.

Oh, good then.  That's what I'd heard was happening, but wasn't sure since
I've steered clear of such beasties since before it was true.

I wish our filesystems worked that way.  But Andrew said something to me
last week about Ken and Dennis writing quite pointedly that while you
*could* use the f/s as a database, that you *shouldn't*.  I didn't know
the reference he was thinking of, so just nodded pensively (=thoughtfully).

  There is ABSOLUTELY NO WAY I've found to tell whether these utf-8
  strings should test equal, and when, nor how to order them, without
  knowing the locale:
  
  RESUME,
  Resume
  resume
  Resum\x{e9}
  r\x{E9}sum\x{E9}
  r\x{E9}sume\x{301}
  Re\x{301}sume\x{301}

  Case insensitively, in Spanish they should be identical in all regards.
  In French, they should be identical but for ties, in which case you
  work your way right to left on the diacriticals.

 This leads me to talk about my main point about sensitivity etc.

 I believe that the most important issues here, those having to do with
 identity, can be discussed and solved without unduly worrying about
 matters of collation;

It's funny you should say that, as I could nearly swear that I just showed
that identity cannot be determined in the examples above without knowing
about locales.  To wit, while all of those sort somewhat differently, even
case-insensitively, no matter whether you're thinking of a French or a
Spanish ordering (and what is English's, anyway?), you have a more
fundamental = vs != scenario which is entirely locale-dependent.

If I can make a RESUME file, ought I be able to make a distinct
r\x{E9}sum\x{E9} or re\x{301}sume\x{301} file in a case-ignorant
filesystem? There is no good answer, because we might think it
reasonable to test

lc(strip_marks($old_fn)) eq lc(strip_marks($new_fn))

The problem is that what is or is not a mark varies by locale:

*  Castilian doesn't think ~ is a mark; Portuguese does, and
   so if you strip marks, you in Castilian count as the same
   two letters that it deems distinct, but in Portuguese, you
   incur no lasting harm.

*  Catalan doesn't think ¸ is a mark; French does, and so if you strip
   marks, you in Catalan count as the same two letters that it deems
   distinct, but in French or Portuguese, you incur no lasting harm.

*  Modern English (usually) decomposes æ into a+e, but OE/AS and
   Icelandic do not.

*  Moreover, Icelandic deems é and e to be completely
   different letters altogether.  If you strip marks, you 
   count as the same letters that that language does not.
   Similarly with ö, which is at the end of their alphabet,
   (like ø in some), and nowhere near o or ó.  BTW, those
   are three separate letters, not variants.

*  And in OE/AS you could have a long mark on an asc (say ash for the
   atomic *letter* æ).  If split into a and e and stripped of marks, it
   wouldn't make any sense at all.
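
A minimal sketch of the folding discussed above, in Python for illustration. strip_marks is the helper named in the expression above; the implementation given here (NFD decomposition, then dropping combining marks) is an assumption, and naive_fold is a made-up name for the whole comparison key.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop every combining mark (category M*)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))

def naive_fold(name: str) -> str:
    """Locale-blind comparison key: strip marks, then lowercase."""
    return strip_marks(name).lower()

NAMES = [
    "RESUME", "Resume", "resume", "Resum\u00e9",
    "r\u00e9sum\u00e9", "r\u00e9sume\u0301", "Re\u0301sume\u0301",
]

# All seven names collapse into a single key.
folded = {naive_fold(n) for n in NAMES}

# The same folding merges two words that Castilian keeps apart (n vs. ñ).
castilian_clash = naive_fold("a\u00f1o") == naive_fold("ano")

# æ has no canonical decomposition, so it survives as a single letter.
ash_kept = naive_fold("\u00c6lene")
```

The fold is mechanical and knows nothing about which language the names are in, which is exactly why it gives the wrong answer for some locales in the list above.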

Case in point: Ælene Frisch, whom many of you doubtless know, insists her
name be spelt as I have written it.  She does not want Aelene Frisch, for
she considers her forename to have 5 letters in it, not 6.  But Unicode
doesn't give us a title case version of that (did AS?), suggesting it a
ligature not a digraph.  

But if we have a file called ÆLENE, may we assume it the same in a
case-insensitive sense to both aelene and ælene?

I can only go on code-points, because I don't want to deal with ß and SS
and Ss.  Case-folding file systems are just begging for trouble, and I just
don't know what to do.  Think of the 3 Greek sigmata.
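
To make the ß and sigma cases concrete, here is a small Python sketch using Unicode full case folding (str.casefold). The helper name same_ignoring_case is invented for the illustration; no particular file system is claimed to behave this way.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two names using Unicode full case folding (locale-blind)."""
    return a.casefold() == b.casefold()

# ß folds to "ss", so three visibly different spellings collide.
sharp_s_collides = (same_ignoring_case("stra\u00dfe", "STRASSE")
                    and same_ignoring_case("stra\u00dfe", "StraSSe"))

# Σ (capital), σ (medial) and ς (final) all fold to σ.
sigmas_collide = (same_ignoring_case("\u03a3", "\u03c3")
                  and same_ignoring_case("\u03c3", "\u03c2"))

# Æ folds to æ, never to "ae": the ligature-or-digraph question stays open.
ash_matches_ash = same_ignoring_case("\u00c6LENE", "\u00e6lene")
ash_matches_ae = same_ignoring_case("\u00c6LENE", "aelene")
```

Under this folding ÆLENE equals ælene but not aelene, so a case-folding file system built on it would still have to decide separately what to do about the digraph spelling.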

 identity is a lot more important than collation, as well as a
 precondition for collation, and collation is a lot more difficult and can
 be put off.

I agree with everything save "and can be put off".  I would like
you to be right.  I should truly wish to be mistaken.  And I don't know
what we have for prior (cough) art.

 respect to dealing with a file system, generally 

Re: Files, Directories, Resources, Operating Systems

2008-11-27 Thread Daniel Ruoso
Hi,

First of all, sorry for breaking the thread, but I had some trouble with
my mail provider, and couldn't hit the reply button.

To the point...

I think there are some things that are simply not solved by abstraction.
Some problems are concrete problems that need concrete solutions,
filesystem access is one of them, IMNSHO.

I pretty much think 

if ($*OS ~~ POSIX) { ... }
elsif ($*OS ~~ Win32) { ... }

is much saner than trying to deal with an enormous API that would be the
result of the attempt to get a sane abstraction of all the different
possible scenarios, and that would end up having backward-incompatible
changes after a while because of some use case scenario that wasn't
addressed.
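
The same dispatch-on-platform style can be sketched in Python, purely as an illustration. make_read_only is a hypothetical helper; the only assumption about the platform is what os.name reports.

```python
import os
import stat
import tempfile

def make_read_only(path: str) -> None:
    """Remove write permission, using whatever the host platform offers."""
    if os.name == "posix":
        # POSIX: clear the write bits for owner, group and others.
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    elif os.name == "nt":
        # Windows: os.chmod can only toggle the read-only attribute.
        os.chmod(path, stat.S_IREAD)
    else:
        raise NotImplementedError(f"no read-only recipe for {os.name!r}")

# Small demonstration on a scratch file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
make_read_only(demo_path)
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
os.chmod(demo_path, stat.S_IREAD | stat.S_IWRITE)  # allow deletion again
os.remove(demo_path)
```

Each branch is short and says exactly what it does on that platform, and an unsupported platform fails loudly instead of silently doing something approximate.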

On the other hand, we really could think about having chmod, chown etc in
the POSIX module, and have the POSIX module imported (where chmod would
be in the default exports) by the prelude when in a posix machine, the
same for the Win32 or whatever counterpart.

Of course it would be very much interesting to have the open
implemented by the POSIX module with the same API as the open
implemented by the Win32 module. But I'm pretty much sure that's not the
case for chown and chmod, and I don't think an abstract API is worth the
trouble for 99% of the cases.

But note that this doesn't stop the people in the 1% case to write the
abstraction API, I just think it doesn't need to be the only way to
access the features, and it certainly doesn't need to be loaded in the
prelude.

daniel



Re: Files, Directories, Resources, Operating Systems

2008-11-27 Thread Darren Duncan

Tom Christiansen wrote:
In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED] 

 There is ABSOLUTELY NO WAY I've found to tell whether these utf-8
 strings should test equal, and when, nor how to order them, without
 knowing the locale:
 
 RESUME,

 Resume
 resume
 Resum\x{e9}
 r\x{E9}sum\x{E9}
 r\x{E9}sume\x{301}
 Re\x{301}sume\x{301}



I believe that the most important issues here, those having to do with
identity, can be discussed and solved without unduly worrying about
matters of collation;


It's funny you should say that, as I could nearly swear that I just showed
that identity cannot be determined in the examples above without knowing
about locales.  To wit, while all of those sort somewhat differently, even
case-insensitively, no matter whether you're thinking of a French or a
Spanish ordering (and what is English's, anyway?), you have a more
fundamental = vs != scenario which is entirely locale-dependent.


If your current abstraction level is the Unicode codepoint level, then no 
knowledge of locale is needed at all in an everything-sensitive filesystem. 
Those 7 examples are all distinct for you, end of story.  So you can see why I 
advocate everything-sensitive as being the normal case, same as with Perl 
identifiers.
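
As a rough check of that claim, the following Python sketch counts how many of the seven names stay distinct at different abstraction levels. Using NFC normalization as a stand-in for "language-independent graphemes" is an assumption made for the example, not something the thread specifies.

```python
import unicodedata

NAMES = [
    "RESUME", "Resume", "resume", "Resum\u00e9",
    "r\u00e9sum\u00e9", "r\u00e9sume\u0301", "Re\u0301sume\u0301",
]

def distinct_at(level, names):
    """Number of different values left after mapping each name with `level`."""
    return len({level(name) for name in names})

# Raw UTF-8 bytes and raw codepoints: every name is different.
by_bytes = distinct_at(lambda s: s.encode("utf-8"), NAMES)
by_codepoint = distinct_at(lambda s: s, NAMES)

# After NFC, the precomposed and the combining spelling of "résumé" merge.
by_nfc = distinct_at(lambda s: unicodedata.normalize("NFC", s), NAMES)
```

No locale is consulted at any of these levels; the answer depends only on which level is chosen.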


Rather than thinking of locales in terms of something special, AFAIK any locale 
can be reduced to a simple (though possibly verbose but predefinable in a 
library) normalized portable definition built from everything-sensitive 
components where the components are enumerations and functions describing a 
character repertoire (what characters can exist) plus representation 
normalization rules plus where applicable collation (ordering) rules plus where 
applicable mutual exclusion rules.


When your core toolkit just works with everything-sensitive components and 
insensitive or locale issues are just defined as formulae over that, then we 
have indeed separated the locale issues into a connected but non-core problem.



So collation doesn't need to be considered in Perl's file-system
interface, while identity does; collation can be a layer on top of the
core interface that just cares about identity.


That seems a simplified version of reality.  Identity isn't what monoglots
think it is.


I'm wondering if we're talking about the same meaning of the word collation. 
The way I have been using it, or meaning to, collation simply talks about how 
you put a set of values in order such that each 2 distinct values has a 
before|after relationship.  Whereas identity is testing whether 2 things you 
hold are just the same value or not.  You don't need to have ordering rules 
defined in order to have known equality rules.



If you *know* that the 7 strings are all UTF-8, then locale doesn't have
to be considered for equality; just your unicode abstraction level
matters, such as if you're defining the values in terms of graphemes vs
codepoints vs bytes.


That's not true.  é is not the same letter as e in Icelandic.


I don't consider those to be the same character, period.  Mind you everywhere 
I've said graphemes I meant language-independent graphemes.


I grant you that if you get into a further abstraction level of 
language-dependent graphemes, then some may see those 2 characters as being 
identical, and if that's your point then I can better understand now where 
you're coming from with the problems you raise.


Practically speaking, I think that portability and other concerns would require 
us to just not go higher than the language-independent grapheme abstraction 
level when dealing with either Perl identifiers or file names or other urls with 
non-platform-specific APIs, and simply treat every language-independent grapheme 
as being distinct/non-identical from every other one, even if some locales might 
do different.  Users should be able to deal with this gracefully enough much as 
people can easily enough treat E and e as being distinct.


-- Darren Duncan


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Rafael Garcia-Suarez
Richard Hainsworth wrote in perl.perl6.language :
 The S16: chown, chmod thread seems to be too unix-focussed.

I was more or less thinking that the syscall-related primitives,
like chown or chmod, could go in a POSIX namespace. Even in UNIX
land nowadays the situation can be much more complex than traditional
ownership and modes (a situation not entirely satisfactorily addressed
by Perl 5's filetest pragma).

 Following the general perl6 philosophy, perhaps too there should be an 
 abstract definition for the language that is core and additional 
 modules that are specific to operating systems. Thus when generic 
 software is distributed, it comes with an installer that determines the 
 operating system and chooses whether to use IO::Unix, IO::Unix::Gnome, 
 IO::MS::WindowsXP, IO::MS::Vista, IO::Apple, etc.
 Maybe also IO::Internet::Http, IO::Internet::Ftp?

IO (streams) and rights are not naturally related. Maybe you're thinking
about filesystems and other content addressing schemes (like URLs). The
subject is more complex than it seems at first glance, because you can
have, for example, per-volume current working directories. It's quite
hard to design something that is abstract enough, but at the same time
not totally useless.


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Mark Overmeer
* Richard Hainsworth ([EMAIL PROTECTED]) [081126 08:21]:
 The S16: chown, chmod thread seems to be too unix-focussed.
 
 To be portable, the minimum assumptions need to be made about the 
 environment in which a program operates. Alternatively, the software 
 needs to be able to determine whether the environment it is operating in 
 meets a minimum set of conditions.
 ...
 Thus I would suggest that the perl6 specifications should be written in 
 an abstract way, one not related to a specific operating system and in a 
 way that can be adapted by an implementor to specific systems.

I fully agree with you: the way the design is going is making the same
mistakes of Perl5 again.  Where we were able to release the Perl5
syntax more and more when the design of Perl6 made more progress, so
should we do with the way we use modules.  S16 is not doing that.

Also Rafael's suggestion to focus on POSIX is not the way a nice
interface should work.  POSIX calls (and non-POSIX means) are
ways to implement the interface to the Operating System, which can
be different from the most practical interface on implementation level.
We should focus on OS abstraction.

For instance, if a file is represented in an object, then the most
friendly interface names would be like:
  $file->owner($user);
  my $user = $file->owner;
under the hood, we use chown and stat.
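
A minimal sketch of such an accessor, in Python for illustration and for POSIX systems only. FileEntry is a hypothetical class; it deals in numeric user ids instead of user objects to keep the example short.

```python
import os
import tempfile

class FileEntry:
    """Hypothetical wrapper; the name and interface are illustrative only."""

    def __init__(self, path: str) -> None:
        self.path = path

    @property
    def owner(self) -> int:
        """Numeric user id of the owner, read with stat."""
        return os.stat(self.path).st_uid

    @owner.setter
    def owner(self, uid: int) -> None:
        """Change the owner with chown; -1 leaves the group untouched."""
        os.chown(self.path, uid, -1)

# Demonstration: setting the owner to its current value needs no privileges.
with tempfile.NamedTemporaryFile() as tmp:
    entry = FileEntry(tmp.name)
    uid_before = entry.owner
    entry.owner = uid_before
    uid_after = entry.owner
```

The friendly name hides two different system calls behind one attribute; what the same attribute should mean on a system without numeric owners is left open here.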

I really would like to see a standard object oriented view on the
system, which mainly autodetects the environment.  I am really
fed-up using File::Spec and Path::Class explicitly all the time.

Also, I get data from a CD which was written case-insensitive and then
copied to my Linux box.  It would be nice to be able to say: treat this
directory case insensitive (even when the implementation is slow)
Shared with Windows default behavioral interface.
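
One way to get this in user code today is a linear scan with case folding, sketched here in Python. find_ignoring_case is an invented helper, and it is exactly the slow implementation the paragraph above is willing to accept.

```python
import os
import tempfile
from typing import List

def find_ignoring_case(directory: str, wanted: str) -> List[str]:
    """Return every entry whose name equals `wanted` after case folding.

    A linear scan, so it is slow on big directories, and on a
    case-sensitive file system it can return more than one match.
    """
    key = wanted.casefold()
    return sorted(name for name in os.listdir(directory)
                  if name.casefold() == key)

# Demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("README.TXT", "notes.txt"):
        open(os.path.join(scratch, name), "w").close()
    matches = find_ignoring_case(scratch, "readme.txt")
```

Returning a list instead of a single name keeps the ambiguous case visible to the caller.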

So, I would like a radical change... trying to be as general
(non UNIX specific) as possible:
   (sorry, my Perl6 syntax is still very sloppy)

   some global $*OS
   # may be different per parallel instance of the program
   # Maybe an OS function which returns $*OS

   my $dir = $*OS.dir($*PROGRAM.arg[0]);
   # above maybe hidden with a functional wrappers: dir $argv[0]

   $dir.case_sensitive(0);
   if $dir.entry('xyz').is_file {}

   my $f   = $dir.file('xyz');
   $f.owner($*OS.user);

   $*OS.system('ls | lpr');

   print $*OS.family;
   print $*OS.kernel_version;

   my $pid = $*OS.process.label;
   
We should also be aware that we design Perl6 for parallelism.  Do we
require all nodes to run the same OS (~version)?

Besides, I would really like to get an incremental growth path to do
things we cannot do yet.  Some things are currently difficult to realize
under UNIX/Linux because there is no kernel interface defined for it.
For instance, you do not know in which character-set the filename is;
that is file-system dependent.  So, we treat filenames as raw bytes.
This does cause dangers (a UTF-8 codepoint in the filename with a \x2F
('/') byte in it, for instance).  But as long as the OS cannot provide
the user with this information, we should still give the author a way
to specify it.

   $*OS.filesystem('/home', type => 'xfs', name_encoding => 'latin1'
     , text_content_encoding => 'utf-8,bom', illegal_chars => /\x0/
     , case_sensitive => 1, max_path => 1024);

I have been working on such a module for Perl5 (which has a much wider
field than Path::Class) but (as many other of my projects) did not
complete it to a usable/publishable level (yet).

It is all NOT too difficult to implement (we do share this knowledge),
but the design of this needs to be free from historical mistakes.  That's
a challenge.
-- 
Regards,
   MarkOv


   Mark Overmeer MSc                    MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Tim Bunce
On Wed, Nov 26, 2008 at 12:40:41PM +0100, Mark Overmeer wrote:

 We should focus on OS abstraction.

 [...] the design of this needs to be free from historical mistakes.

And avoid making too many new ones. There must be useful prior art around.

Java, for example, has a FileSystem abstraction java.nio.file.FileSystem
http://openjdk.java.net/projects/nio/javadoc/java/nio/file/FileSystem.html

which has been extended, based on lessons learnt, in the NIO.2 project
(JSR 203: More New I/O APIs for the Java(TM) Platform (NIO.2)
APIs for filesystem access, scalable asynchronous I/O operations,
socket-channel binding and configuration, and multicast datagrams.)
which enables things like being able to transparently treat a zip file
as a filesystem:
http://blogs.sun.com/rajendrag/entry/zip_file_system_provider_implementation

See http://javanio.info/filearea/nioserver/WhatsNewNIO2.pdf

Tim.

p.s. I didn't know any of that when I started to write this "look for
prior art" email, but a little searching turned up these examples.
I'm sure there are more in other realms, but NIO.2 certainly looks like a
rich source of good ideas derived from a wide range of experience.


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Leon Timmermans
On Wed, Nov 26, 2008 at 12:40 PM, Mark Overmeer [EMAIL PROTECTED] wrote:
 Also, I get data from a CD which was written case-insensitive and then
 copied to my Linux box.  It would be nice to be able to say: treat this
 directory case insensitive (even when the implementation is slow)
 Shared with Windows default behavioral interface.


That is a task for the operating system, not Perl. You're trying to
solve the problem at the wrong end here IMHO.

 For instance, you do not know in which character-set the filename is;
 that is file-system dependent.  So, we treat filenames as raw bytes.

On native file-system types (like ext3fs), character-set is not
file-system dependent but non-existent. It really is raw bytes.

 This does cause dangers (a UTF-8 codepoint in the filename with a \x2F
 ('/') byte in it, for instance)

A \x2F always means a '/'. UTF-8 was designed to be backwards
compatible like that.
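
This is easy to verify. The Python sketch below checks that no character other than '/' itself yields a 0x2F byte in UTF-8, and shows that UTF-16 gives no such guarantee. The helper name is made up for the example.

```python
SLASH = b"/"  # the single byte 0x2F

def encoded_has_stray_slash(ch: str, encoding: str) -> bool:
    """True if a character other than '/' produces a 0x2F byte when encoded."""
    return ch != "/" and SLASH in ch.encode(encoding)

# U+012F is C4 AF in UTF-8 but 2F 01 in UTF-16LE.
utf8_clash = encoded_has_stray_slash("\u012f", "utf-8")
utf16_clash = encoded_has_stray_slash("\u012f", "utf-16-le")

# Exhaustive check over the Basic Multilingual Plane, skipping surrogates,
# which cannot be encoded on their own.
bmp = (chr(cp) for cp in range(0x10000) if not 0xD800 <= cp <= 0xDFFF)
any_utf8_clash = any(encoded_has_stray_slash(ch, "utf-8") for ch in bmp)
```

Every byte of a multi-byte UTF-8 sequence has its high bit set, so splitting a UTF-8 path on the 0x2F byte is safe; doing the same to UTF-16 bytes is not.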

Regards,

Leon Timmermans


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Mark Overmeer
* Leon Timmermans ([EMAIL PROTECTED]) [081126 15:43]:
 On Wed, Nov 26, 2008 at 12:40 PM, Mark Overmeer [EMAIL PROTECTED] wrote:
 That is a task for the operating system, not Perl. You're trying to
 solve the problem at the wrong end here IMHO.

In my (and your) case, the operating system is not helping at all
and there is no chance of having that changed.  So...
My remark was just one example, and I can give many more, where I
would like to see more abstraction in the OS interface to avoid the
need for each user to re-invent the wheel of interoperability.

  For instance, you do not know in which character-set the filename is;
  that is file-system dependent.  So, we treat filenames as raw bytes.
 
 On native file-system types (like ext3fs), character-set is not
 file-system dependent but non-existent. It really is raw bytes.

Not on the presentation level to the user.  This makes it even more
horrifying.  It depends on the setting of an environment variable
of the actual user how the bytes of the filename are interpreted.

On the OS filesystem implementation you are probably correct (in
the UNIX/Linux case), but programs are used for end-user results.

  This does cause dangers (a UTF-8 codepoint in the filename with
  a \x2F ('/') byte in it, for instance)
 A \x2F always means a '/'. UTF-8 was designed to be backwards
 compatible like that.

Yes, you are right on this.  ASCII does not suffer from UTF-8, so my
example was flawed.  The second 128 does cause problems.  How can glob()
sort filenames, for instance?  UTF-16 names should not enter the Perl
program unless you are aware of it, because those can hurt badly.

Please comment on the big picture in the debate: there are all kinds
of OS dependent things I really would like to see hidden in a (large)
abstraction layer to simplify the development of portable scripts.
I don't say I know all the answers, but I do feel a lot of pain in
solving the same thing again in each module for CPAN.
-- 
Regards,
   MarkOv


   Mark Overmeer MSc                    MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Larry Wall
On Wed, Nov 26, 2008 at 11:21:58AM +0300, Richard Hainsworth wrote:
 The S16: chown, chmod thread seems to be too unix-focussed.

Indeed, what you are currently reading in S16 is mostly just lightly
edited copy-paste from P5 docs.  But the S16 draft is out in the pugs
repo for a reason--anyone and everyone on this thread should consider
it perfectly okay to take S16 in hand and refactor it mercilessly.
Any shortcuts we wish to install into the final Perl 6 can easily
be done at the last moment by the prelude aliasing common operations
into the core language.

Anyway, feel free to coordinate this here and/or on #perl6.  (Note
that Patrick is in the process of moving all the Synopses to the pugs
repo at some point soon, so the current S16 in pugs/docs/Perl6/Spec
is likely to have its name/location changed soon.)  If you need
a pugs commit bit, please ask in #perl6 on irc.freenode.net.

Larry


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Darren Duncan
I agree with the idea of making Perl 6's filesystem/etc interface more abstract, 
as previously discussed, and also that users should be able to choose between 
different levels of abstraction where that makes sense, picking either a more 
portable interface or a more platform-specific one.


Following up on Tim Bunce's comment about looking at prior art, I also recommend 
looking at the SQLite DBMS, specifically its virtual file system layer; this one 
is designed to give you deterministic behaviour and introspection over a wide 
range of storage systems and attributes, both on PCs and on embedded devices, or 
hard disks versus flash or write once vs write many etc, where a lot of 
otherwise-assumptions are spelled out.  One relevant url is 
http://sqlite.org/c3ref/vfs.html and for the moment I forget where other good 
urls are.


Mark Overmeer wrote:

   $dir.case_sensitive(0);

   $*OS.filesystem('/home', type => 'xfs', name_encoding => 'latin1'
     , text_content_encoding => 'utf-8,bom', illegal_chars => /\x0/
     , case_sensitive => 1, max_path => 1024);


I understand that the above, concerning case-sensitivity, is just meant to be an 
example, but I want to explore that in more detail for a moment, as it reflects 
a common perception that only scratches the surface and needs to be fleshed out 
more.


To summarize, what we really want is something more generic than 
case-sensitivity, which is text normalization and text folding in general, as 
well as distinctly dealing with distinctness for representation versus 
distinctness for mutual exclusivity.


For example, one file system will represent your chosen case for a filename but 
it won't allow 2 files in the same directory whose filenames are non-distinct 
when uppercased; another file system in contrast would also represent a filename 
uppercased.  For another example, one file system will not distinguish between 
accents on letters while another would, and this is orthogonal to 
case-sensitivity.  Or for another, one might treat a run of whitespace as being 
equivalent to a single whitespace character, or whitespace characters are 
ignored entirely.


Also, the paradigm that is the most distinguishing (case-sensitive, 
accent-sensitive, whitespace-sensitive, etc) should be the default, and any 
boolean option to change an aspect of this should be named so that a false value is 
more distinguishing and a true value is less distinguishing.  For example, a 
flag should be named ignores_case rather than case_sensitive; this also 
assumes that if named arguments are optional, then the common default value of a 
boolean-typed argument is false.  Naming something case_sensitive implies that 
sensitivity is special whereas sensitivity should be considered normal, and 
rather insensitivity should be considered special.
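
A sketch of how those two ideas could look together, in Python for illustration. NamePolicy and all of its field names are hypothetical. Every flag defaults to False, meaning fully distinguishing, and mutual exclusion is kept apart from the stored representation.

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class NamePolicy:
    """Hypothetical naming rules of one file system; False = most strict."""
    ignores_case: bool = False
    ignores_accents: bool = False
    collapses_whitespace: bool = False
    stores_uppercase: bool = False  # representation, not exclusivity

    def exclusion_key(self, name: str) -> str:
        """Two names may not coexist in a directory if their keys are equal."""
        key = name
        if self.ignores_accents:
            key = "".join(c for c in unicodedata.normalize("NFD", key)
                          if not unicodedata.category(c).startswith("M"))
        if self.collapses_whitespace:
            key = " ".join(key.split())
        if self.ignores_case:
            key = key.casefold()
        return key

    def stored_form(self, name: str) -> str:
        """How the name is represented once the file has been created."""
        return name.upper() if self.stores_uppercase else name

strict = NamePolicy()
preserving = NamePolicy(ignores_case=True)  # keeps the case you chose
shouting = NamePolicy(ignores_case=True, stores_uppercase=True)

strict_clash = (strict.exclusion_key("Notes.txt")
                == strict.exclusion_key("NOTES.TXT"))
preserving_clash = (preserving.exclusion_key("Notes.txt")
                    == preserving.exclusion_key("NOTES.TXT"))
```

With this naming, NamePolicy() with no arguments is the everything-sensitive case, and each flag that is switched on makes the file system less distinguishing.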


-- Darren Duncan


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Geoffrey Broadwell
On Wed, 2008-11-26 at 11:34 -0800, Darren Duncan wrote:
 I agree with the idea of making Perl 6's filesystem/etc interface more
 abstract, as previously discussed, and also that users should be able to
 choose between different levels of abstraction where that makes sense,
 either picking a more portable interface versus a more platform-specific
 one.

Agreed on both counts.

 Following up on Tim Bunce's comment about looking at prior art, I also
 recommend looking at the SQLite DBMS, specifically its virtual file system
 layer; this one is designed to give you deterministic behaviour and
 introspection over a wide range of storage systems and attributes, both on
 PCs and on embedded devices, or hard disks versus flash or write once vs
 write many etc, where a lot of otherwise-assumptions are spelled out.  One
 relevant url is http://sqlite.org/c3ref/vfs.html and for the moment I
 forget where other good urls are.

There are also higher-level VFS systems, such as Icculus.org PhysicsFS,
which goes farther than just abstracting the OS operations.  It also
abstracts away the differences between archives and real directories,
unions multiple directory trees on top of each other, and transparently
redirects writes to a different trunk than reads:

http://icculus.org/physfs/

I want to be able to support that functionality in a way that still
allows me to open and close PhysicsFS files and directories the way
I would normally.  I want to be able to layer it *under* the standard
Perl IO ops, rather than above them.

The following is all obvious, but just to keep it in people's minds and
frame the discussion:

Being able to layer IO abstractions is at least as important as the
basic OS abstraction itself -- as well as the ability to use the high
level abstraction most of the time, but reach down the stack when
needed.  This implies making best effort to minimize the ways in which
upper layers will be hopelessly confused by low-level operations, and
documenting the heck out of the problem areas.

These layers should be mix-and-match as much as possible, with
abstractions designed with common interfaces.  Certainly Perl 5's IO
layers, as well as any networking or library stack, are prior art here.
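
A rough sketch of what "layers with common interfaces" could look like, written in Python only for illustration. Every name here (Store, MemoryStore, UnionStore) is hypothetical; this is neither the PhysicsFS API nor a proposed Perl 6 interface. It assumes each layer offers the same three operations, that reads search the layers in order, and that writes are redirected to a separate layer which is searched first:

```python
from typing import Dict, List, Optional, Protocol


class Store(Protocol):
    """Common interface that every layer implements (hypothetical)."""

    def read(self, name: str) -> bytes: ...
    def write(self, name: str, data: bytes) -> None: ...
    def names(self) -> List[str]: ...


class MemoryStore:
    """Lowest layer: a dict standing in for a real directory or archive."""

    def __init__(self, files: Optional[Dict[str, bytes]] = None) -> None:
        self.files = dict(files or {})

    def read(self, name: str) -> bytes:
        return self.files[name]  # raises KeyError if the name is absent

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def names(self) -> List[str]:
        return sorted(self.files)


class UnionStore:
    """Upper layer: unions several stores and redirects writes elsewhere."""

    def __init__(self, read_layers: List[Store], write_layer: Store) -> None:
        # Assumption: written data shadows the read-only layers.
        self.search_order = [write_layer, *read_layers]
        self.write_layer = write_layer

    def read(self, name: str) -> bytes:
        for layer in self.search_order:
            try:
                return layer.read(name)
            except KeyError:
                continue
        raise KeyError(name)

    def write(self, name: str, data: bytes) -> None:
        self.write_layer.write(name, data)

    def names(self) -> List[str]:
        seen = set()
        for layer in self.search_order:
            seen.update(layer.names())
        return sorted(seen)
```

Because UnionStore satisfies the same interface as the layers beneath it, code written against Store does not need to know which one it was handed, and a caller who needs to reach down the stack can still use an individual layer directly.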

 To summarize, what we really want is something more generic than 
 case-sensitivity, which is text normalization and text folding in general, as 
 well as distinctly dealing with distinctness for representation versus 
 distinctness for mutual exclusivity.

Yes, definitely.

 [This] implies that 
 sensitivity is special whereas sensitivity should be considered normal, and 
 rather insensitivity should be considered special.

If only that were true in other areas of life.  :-)


-'f




Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Leon Timmermans
On Wed, Nov 26, 2008 at 5:15 PM, Mark Overmeer [EMAIL PROTECTED] wrote:
 Yes, you are right on this.  ASCII does not suffer from UTF-8, so my
 example was flawed.  The second 128 does cause problems.  How can glob()
 sort filenames, for instance?

That's a matter of collation, not (just) character set. TIMTOWTDI.
There is no right way to do it as it depends on the circumstances, but
a simple binary sort is not a bad default.
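
To illustrate the difference (a small Python sketch; the helper names are made up for this example), a binary sort orders names by their encoded bytes and needs no locale, while any other ordering is a choice layered on top:

```python
def binary_sorted(names):
    """Order names by their UTF-8 bytes: deterministic and locale-independent."""
    return sorted(names, key=lambda n: n.encode("utf-8"))


def case_folded_sorted(names):
    """One of many possible alternatives: fold case first, break ties by the raw name."""
    return sorted(names, key=lambda n: (n.casefold(), n))


names = ["b.txt", "B.txt", "a.txt", "\u00e9.txt", "Z.txt"]
print(binary_sorted(names))       # uppercase ASCII sorts before lowercase
print(case_folded_sorted(names))  # "a", "B", "b", "Z", then the accented name
```

Neither result is "right"; the binary one is simply the default that makes no assumptions about the user.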

Leon Timmermans


Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Timothy S. Nelson
	Can I just remind everyone that (IMO) we shouldn't just be considering 
filesystems here?  I think it would be a pretty useful feature to have a 
general tree manipulation interface, and then this could be applied to 
filesystems, or XML, or LDAP, or SQL (although this doesn't map so well), or 
whatever.


I guess the way I see it, you'd have something like this:

role Tree::Node {...}
role Filesystem::Node inherits from Tree::Node {...}
role Filesystem::Directory inherits from Filesystem::Node {...}
class Filesystem::File does Filesystem::Node { # Interface, like DBI
has $implementation handles *;

$implementation = Filesystem::File::XML.new();
}
class Filesystem::File::XML inherits from Filesystem::File::Base {...}

	In the case of Filesystem::Node, you would define some standard 
attribute names (eg. owner, is_readable), but then they would be 
accessible through the standard Tree::Node.get_attribute() interface.  And the 
standard Tree::Node.get_children() would be implemented by Filesystem::File as 
something to fetch the contents of the file; in the case of 
Filesystem::XMLFile, it would turn the contents into a tree of XML nodes.
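
A rough rendering of the same idea in Python, for illustration only. The class names mirror the roles above, but the code is an assumption about how they might fit together, not a proposed API. It covers only directories as parents; treating a file's contents as children (the XML case) is left out:

```python
import os
from abc import ABC, abstractmethod
from typing import Any, Iterator, List, Tuple


class TreeNode(ABC):
    """Generic tree interface: attributes plus children."""

    @abstractmethod
    def get_attribute(self, name: str) -> Any: ...

    @abstractmethod
    def get_children(self) -> List["TreeNode"]: ...


class FilesystemNode(TreeNode):
    """Filesystem entry exposing a few standard attribute names."""

    def __init__(self, path: str) -> None:
        self.path = path

    def get_attribute(self, name: str) -> Any:
        if name == "name":
            return os.path.basename(self.path)
        if name == "owner":
            return os.stat(self.path).st_uid  # numeric owner id
        if name == "is_readable":
            return os.access(self.path, os.R_OK)
        raise KeyError(name)

    def get_children(self) -> List["TreeNode"]:
        if not os.path.isdir(self.path):
            return []
        return [FilesystemNode(os.path.join(self.path, entry))
                for entry in sorted(os.listdir(self.path))]


def walk(node: TreeNode, depth: int = 0) -> Iterator[Tuple[int, Any]]:
    """Traversal that relies only on the generic TreeNode interface."""
    yield depth, node.get_attribute("name")
    for child in node.get_children():
        yield from walk(child, depth + 1)
```

The point of the sketch is that walk() would work unchanged on an XML or LDAP node type, as long as that type implements the same two methods.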


	I agree about the different levels of abstractions, but just wanted to 
put in a plug for this one as one that I like.


:)


-
| Name: Tim Nelson | Because the Creator is,|
| E-mail: [EMAIL PROTECTED]| I am   |
-

BEGIN GEEK CODE BLOCK
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- 
PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y-

-END GEEK CODE BLOCK-



Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Darren Duncan

Tom Christiansen wrote:

 I believe database folks have been doing the same with character data, but
 I'm not up-to-date on the DB world, so maybe we have some metainfo about
 the locale to draw on there.  Tim?


AFAIK, modern databases are all strongly typed at least to the point that the 
values you store in and fetch from them are each explicitly character data or 
binary data or numbers or what-have-you; and so, when you are dealing with a 
DBMS in terms of character data, it is explicitly specified somewhere (either 
locally for the data or globally/hardcoded for the DBMS) that each value of 
character data belongs to a particular character repertoire and text encoding, 
and so the DBMS knows what encoding etc the character data is in, or at least it 
treats it consistently based on what the user said it was when they input the 
data.  The only time this information isn't really remembered is if the data is 
supplied in terms of being binary data.


Maybe some older or unusual DBMSs aren't this way, and of course technically a 
filesystem etc *is* a database ... I think the example mentioned about filename 
storage being locale-dependent probably meant that, at the actual filesystem 
level, it was just dealing with the names as binary data.



 There is ABSOLUTELY NO WAY I've found to tell whether these utf-8
 strings should test equal, and when, nor how to order them, without
 knowing the locale:

 RESUME,
 Resume
 resume
 Resum\x{e9}
 r\x{E9}sum\x{E9}
 r\x{E9}sume\x{301}
 Re\x{301}sume\x{301}

 Case insensitively, in Spanish they should be identical in all
 regards.  In French, they should be identical but for ties,
 in which case you work your way right to left on the diacriticals.


This leads me to talk about my main point about sensitivity etc.

I believe that the most important issues here, those having to do with identity, 
can be discussed and solved without unduly worrying about matters of collation; 
identity is a lot more important than collation, as well as a precondition for 
collation, and collation is a lot more difficult and can be put off.  With 
respect to dealing with a file system, generally it is just identity that 
matters and collation is a concern that can typically be just tacked on after 
identity is solved.


That is, with a file system you need to know whether or not a file name you hold 
will or won't match a file in the system, and matching or not-matching is the 
main function of an identity.  Similarly, the file system has to make sure that 
no 2 distinct files in it have the same file name, that is the same public 
identity.  In contrast, the order that you order or sort a list of files by 
their names usually isn't so important; while all work with a file system 
requires working with identities, most work does not need to deal with 
collation.  In practice several parties can agree on a single means of 
identifying files, while still having their own favorite collations, so the same 
list can be ordered in different ways.


Collation criteria are something that can be naturally applied externally to a 
file system, such as by a user program, and only identity criteria need to be 
built-in to the file system.


So collation doesn't need to be considered in Perl's file-system interface, 
while identity does; collation can be a layer on top of the core interface that 
just cares about identity.
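
A small sketch of that layering, in Python for illustration only (the Directory class and its methods are hypothetical). The identity rule is built into the store, which refuses two names with the same identity, while ordering is supplied by the caller afterwards:

```python
from typing import Callable, Dict, List


class Directory:
    """Toy name store: identity is built in, collation is not."""

    def __init__(self, identity: Callable[[str], str] = lambda name: name) -> None:
        self._identity = identity
        self._entries: Dict[str, str] = {}  # identity key -> name as given

    def add(self, name: str) -> None:
        key = self._identity(name)
        if key in self._entries:
            # No two distinct files may share the same public identity.
            raise FileExistsError(name)
        self._entries[key] = name

    def __contains__(self, name: str) -> bool:
        return self._identity(name) in self._entries

    def names(self) -> List[str]:
        """Return the names in no meaningful order; sorting is the caller's job."""
        return list(self._entries.values())


d = Directory()
for n in ["b.txt", "A.txt", "C.txt"]:
    d.add(n)

# Two parties agree on identity but each applies their own collation on top.
by_code_point = sorted(d.names())
by_folded_case = sorted(d.names(), key=str.casefold)
```

Here by_code_point and by_folded_case differ in order, yet both parties agree on which names exist and on whether "a.txt" matches anything (it does not, under the default identity).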


One maxim I apply in my database work, and that I believe applies to this 
discussion, is that any logical difference is a big difference.  If you have 2 
distinct value literals such that you consider the difference in each literal's 
spelling to be significant, such that you can't for all use cases substitute one 
literal for the other, then the 2 literals denote 2 distinct values; in the 
other case, where you can always substitute one for the other harmlessly, then 
they denote the same value.  The concepts of 'value' and 'identity' are the same, 
and any value is its own identity.


And so, with your 7 'resume' literals, I would say that if there is a reason for 
any of the spellings to exist that couldn't be handled by one of the other 
spellings, then all 7 literals are distinct/non-identical taken as-is.


If you *know* that the 7 strings are all UTF-8, then locale doesn't have to be 
considered for equality; just your unicode abstraction level matters, such as if 
you're defining the values in terms of graphemes vs codepoints vs bytes.
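
To make the abstraction levels concrete, here is a short Python sketch using two of the spellings from the list. It assumes NFC normalization as a stand-in for comparison above the codepoint level; that is a simplification, since canonical equivalence is not the same thing as full grapheme segmentation:

```python
import unicodedata

precomposed = "r\u00e9sum\u00e9"     # r, e-acute, s, u, m, e-acute
combining = "re\u0301sume\u0301"     # plain e followed by U+0301 combining acute


def sizes(s: str):
    """Return (number of UTF-8 bytes, number of codepoints) for a string."""
    return len(s.encode("utf-8")), len(s)


def equal_as_codepoints(a: str, b: str) -> bool:
    """Codepoint-level identity: the sequences must match exactly."""
    return a == b


def equal_after_nfc(a: str, b: str) -> bool:
    """Identity one level up: compare after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two spellings differ as bytes and as codepoints but compare equal after normalization, and no locale is consulted at any point.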


When talking about identity, there is no such thing as case-insensitivity or 
accent insensitivity or whitespace insensitivity or what have you.  If you have 
any reason to not replace every E with an e or vice-versa in your character 
string, then you consider those 2 non-identical and so they wouldn't match; by 
contrast, true case-insensitivity means you can replace every e with an E 
(for example) and forget that an e ever existed; the actual equality test is 
then the same since all comparands would only have the E.
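
A brief sketch of that view, in Python for illustration (the function name identities is made up): folding is a substitution applied up front, and the equality test afterwards is the ordinary exact one.

```python
def identities(names, fold=lambda s: s):
    """Map each name to its identity under the given folding, then deduplicate."""
    return {fold(name) for name in names}


spellings = ["RESUME", "Resume", "resume", "Resum\u00e9"]

as_is = identities(spellings)                      # every spelling is distinct
case_folded = identities(spellings, str.casefold)  # case no longer exists
```

Without folding there are four identities; with case folding there are two, because the accented spelling is still a different value from the unaccented one.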


And so 

Re: Files, Directories, Resources, Operating Systems

2008-11-26 Thread Mark Overmeer
* Tom Christiansen ([EMAIL PROTECTED]) [081126 23:55]:
 On Wed, 26 Nov 2008 11:18:01 PST.--or, for backwards compatibility,
 at 7:18:01 p.m. hora Romae on a.d. VI Kal. Dec. MMDCCLXI AUC,
 Larry Wall [EMAIL PROTECTED] wrote:
 
 SUMMARY: I've been looking into this sort of thing lately (see p5p),
 and there may not even *be* **a** right answer.  The reasons
 why take us into an area we've traditionally avoided.

What a long message...

 Mark We should focus on OS abstraction.
 Mark [...] the design of this needs to be free from historical mistakes.

 ... It cannot be
 done in an automated fashion, since you can't know which *locale* each
 filename was created under, and without that, you have to
 guess--almost always wrongly.

Exactly.  This is a historical mistake, though an understandable one: it
gave at least a path of growth from the current system open() interface.
Only users who share the same locale see the same names.  If you change
your locale, say from Cyrillic to English, your filenames break!

In my suggestion, the programmer (who is often local to the system) can
at least say what the locale was when the filenames were created.  On
some OSes, the OS can tell you.  What I would like is an object model
which at least allows us to abstract these problems away... whether
they can be resolved automatically or only with help is for later.
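
A tiny sketch of what "the programmer says what the locale was" might mean in practice, in Python for illustration. It narrows "locale" down to just the character encoding, which is an assumption of this example, and decode_name is a hypothetical helper:

```python
def decode_name(raw: bytes, declared_encoding: str) -> str:
    """Interpret raw filename bytes using the encoding the programmer declares."""
    return raw.decode(declared_encoding)


# The Russian word for "test", stored as bytes by a user in a KOI8-R locale.
original = "\u0442\u0435\u0441\u0442"
raw = original.encode("koi8_r")

as_created = decode_name(raw, "koi8_r")   # the name its creator saw
as_latin1 = decode_name(raw, "latin-1")   # what a Latin-1 user would see instead
```

The bytes on disk are the same in both cases; only the declared encoding decides whether the name comes back as its creator wrote it.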

 There is ABSOLUTELY NO WAY I've found to tell whether these utf-8
 strings should test equal, and when, nor how to order them, without
 knowing the locale:
 
 RESUME,
 Resume
 resume
 Resum\x{e9}
 r\x{E9}sum\x{E9}
 r\x{E9}sume\x{301}
 Re\x{301}sume\x{301}

This is decided by the locale of the user running the script, as is usual
for ls(1).  So, I do not see your problem here.

I don't mind if problems with unicode are not solved or solvable.
Could we discuss a built-in File::Spec/Path::Class?  And we
allow ourselves the same limitations as these have, for the moment.
-- 
Regards,

   MarkOv


   Mark Overmeer MSc                  MARKOV Solutions
   [EMAIL PROTECTED]                  [EMAIL PROTECTED]
   http://Mark.Overmeer.net           http://solutions.overmeer.net