Re: Files, Directories, Resources, Operating Systems
* Charles Bailey [EMAIL PROTECTED] [2008-12-10 03:15]:

It may well be that a fine-grained interface isn't practical, but perhaps there are some basics that we could implement, such as
- set owner of this thing
- (maybe) set group of this thing
- give owner|everyone|?some-group the ability to read from|write to|remove|run this thing
- tell me whether any of these is possible
- make the metadata for this thing the same as the metadata for that thing
- tell me when this thing was created|last updated

There are many problematic suggestions here. Some examples:

• Unix does not track file creation datetime at all.
• The concept of making a file runnable doesn’t even exist on Windows: that property is derived from the filename extension.
• Delete permission on a file is a concept that doesn’t exist on Unix. To be able to delete a file, you instead need write permission on the directory it resides in.

Furthermore, in Win32, files and directories can inherit permissions, so the fact that a file has certain effective permissions does not mean that these permissions are set on the file itself. But if you set them on the file itself, you dissociate it from the inheritance chain. So reading permissions and then setting them the same, without changing anything, can still have unwanted side effects. Or if you try to make the API smart, and so make it set permissions only when they constitute a change from the effective permissions, then conversely the user no longer has a way to dissociate the file from inheritance if that *is* what they wanted. So the concept of inheritance must be exposed explicitly.

This is the primary issue I was thinking of when I said that some differences between Win32 and Unix have such pervasive effects that it seems impossible to provide even a rudimentary abstract interface.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
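The point about delete permission can be checked with a small experiment. This is only a sketch, written in Python rather than Perl and assuming a POSIX system; both function names are invented for the illustration. The first function strips all write bits from a file and removes it anyway; the second leaves the file writable but drops write permission on its directory.

```python
import os
import stat
import tempfile


def can_remove_readonly_file() -> bool:
    """Create a file the owner may only read, then try to unlink it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "victim")
        with open(path, "w") as fh:
            fh.write("data")
        os.chmod(path, stat.S_IRUSR)  # 0400: no write bit on the file
        try:
            os.remove(path)  # decided by the directory's mode, not the file's
            return True
        except PermissionError:
            return False


def can_remove_from_readonly_dir() -> bool:
    """Keep the file writable, drop write permission on its directory, try to unlink."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "victim")
        with open(path, "w") as fh:
            fh.write("data")
        os.chmod(d, stat.S_IRUSR | stat.S_IXUSR)  # 0500: directory not writable
        try:
            os.remove(path)
            return True
        except PermissionError:
            return False
        finally:
            os.chmod(d, stat.S_IRWXU)  # restore so the temp dir can be cleaned up
```

On a typical Unix system the first call succeeds. The second is expected to fail for an ordinary user and succeed for root, which bypasses the check. On Windows the outcome differs, which is the incompatibility being described.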
Re: Files, Directories, Resources, Operating Systems
I've been playing with similar sorts of problems when creating an OO model for packaging metadata, that could supposedly represent the data from a .rpm or a .deb or whatever. The first thing I did was set up a method where if we're outputting e.g. an RPM, it will mark every piece of metadata it uses, and then afterwards, the core system will emit warnings about all the things it didn't use. Something similar could possibly be done; we'd simply need to give the user control as to where the warnings end up.

Note that I also agree with the guy who said that we need system-specific calls, and then an abstraction layer on top of that.

On Wed, 10 Dec 2008, Aristotle Pagaltzis wrote:

* Charles Bailey [EMAIL PROTECTED] [2008-12-10 03:15]:

It may well be that a fine-grained interface isn't practical, but perhaps there are some basics that we could implement, such as
- set owner of this thing
- (maybe) set group of this thing
- give owner|everyone|?some-group the ability to read from|write to|remove|run this thing
- tell me whether any of these is possible
- make the metadata for this thing the same as the metadata for that thing
- tell me when this thing was created|last updated

There are many problematic suggestions here. Some examples:

• Unix does not track file creation datetime at all.

Emit a warning.

• The concept of making a file runnable doesn’t even exist on Windows: that property is derived from the filename extension.

So when they read it, make a guess based on the extension, and when they write it, emit an error.

• Delete permission on a file is a concept that doesn’t exist on Unix. To be able to delete a file, you instead need write permission on the directory it resides in.

So when they read it, figure it out, and when they write it, emit an error.

Furthermore, in Win32, files and directories can inherit permissions, so the fact that a file has certain effective permissions does not mean that these permissions are set on the file itself.
But if you set them on the file itself, you dissociate it from the inheritance chain. So reading permissions and then setting them the same, without changing anything, can still have unwanted side effects. Or if you try to make the API smart, and so make it set permissions only when they constitute a change from the effective permissions, then conversely the user no longer has a way to dissociate the file from inheritance if that *is* what they wanted. So the concept of inheritance must be exposed explicitly.

Or, you could pick a consistent model, and then let the user use the lower-level interface if they want to be more specific.

This is the primary issue I was thinking of when I said that some differences between Win32 and Unix have such pervasive effects that it seems impossible to provide even a rudimentary abstract interface.

Try rudimentary *optional* abstract interface, where the other option is system-specific. :)

| Name: Tim Nelson | Because the Creator is, |
| E-mail: [EMAIL PROTECTED] | I am |

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y-
-----END GEEK CODE BLOCK-----
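The "mark what you used, warn about the rest" idea from the packaging-metadata model can be sketched roughly as follows. This is an illustration in Python, not the actual packaging code: `TrackedMetadata` and `emit_unix` are hypothetical names, and the backend is assumed to understand only an owner and a mode.

```python
import warnings


class TrackedMetadata:
    """Metadata bag that remembers which fields a backend actually read."""

    def __init__(self, fields):
        self._fields = dict(fields)
        self._used = set()

    def get(self, name, default=None):
        if name in self._fields:
            self._used.add(name)  # mark the field as consumed
        return self._fields.get(name, default)

    def unused(self):
        return sorted(set(self._fields) - self._used)


def emit_unix(meta):
    """Hypothetical backend that only understands owner and mode."""
    record = {"owner": meta.get("owner"), "mode": meta.get("mode")}
    for name in meta.unused():
        # Everything the backend never asked for is reported, not silently dropped.
        warnings.warn(f"metadata field {name!r} was not used by this backend")
    return record
```

Routing the messages through the standard `warnings` module is one way to "give the user control as to where the warnings end up": the caller can filter them, turn them into errors, or capture them with `warnings.catch_warnings`.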
Re: Files, Directories, Resources, Operating Systems
* Mark Overmeer [EMAIL PROTECTED] [2008-12-08 21:20]:

A pity that we do not focus on the general concept of OS abstraction (knowing that some problems are only partially solvable (at the moment)).

Well go on. Explain how you would, f.ex., provide an abstract API over file ownership and access permissions between Win32 and Unix? I don’t see such a thing being possible at all: there are too many differences with pervasive consequences. The most you can reasonably do (AFAICT) is map Win32-style owner/access info to a Unix-style API for reading only.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: Files, Directories, Resources, Operating Systems
* Aristotle Pagaltzis [EMAIL PROTECTED] [2008-12-10 01:10]: Well go on. Btw, I just realised that it can be read as sarcastic, which I didn’t intend. I am honestly curious, even if skeptical. I am biased, but I am open to be convinced. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
Re: Files, Directories, Resources, Operating Systems
On 2008 Dec 9, at 19:56, Aristotle Pagaltzis wrote:

* Aristotle Pagaltzis [EMAIL PROTECTED] [2008-12-10 01:10]:

Well go on.

Btw, I just realised that it can be read as sarcastic, which I didn’t intend. I am honestly curious, even if skeptical. I am biased, but I am open to be convinced.

BTW you can run into this issue even only considering Unix/POSIX: POSIX ACLs, AFS, NFSv4. I can see the point of a very simple base API with system-dependent extensions, but am likewise skeptical that one can be designed that isn't useless.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university KF8NH
Re: Files, Directories, Resources, Operating Systems
It may well be that a fine-grained interface isn't practical, but perhaps there are some basics that we could implement, such as
- set owner of this thing
- (maybe) set group of this thing
- give owner|everyone|?some-group the ability to read from|write to|remove|run this thing
- tell me whether any of these is possible
- make the metadata for this thing the same as the metadata for that thing
- tell me when this thing was created|last updated
in addition to the usual CRUD operations.

More detailed views of metadata might be the province of OS-specific modules, as might different semantics for content (and even stringy metadata). But having this sort of simplified works-everywhere layer interposed should handle common tasks like reading, writing, and copying without making everyone replicate OS-specific variants.

The basic operations above have a POSIXy flavor, but the underlying details shouldn't. For instance, "allow me to read and write this thing" != "chmod 6xx, thing".

I'm not saying this is an easy solution, just that it's worth the effort. Then again, I think File::Copy is a better choice than system cp for publicly distributed code, so I'm already biased.

--
Regards,
Charles Bailey

On 12/9/08, Brandon S. Allbery KF8NH [EMAIL PROTECTED] wrote:

On 2008 Dec 9, at 19:56, Aristotle Pagaltzis wrote:

* Aristotle Pagaltzis [EMAIL PROTECTED] [2008-12-10 01:10]:

Well go on.

Btw, I just realised that it can be read as sarcastic, which I didn't intend. I am honestly curious, even if skeptical. I am biased, but I am open to be convinced.

BTW you can run into this issue even only considering Unix/POSIX: POSIX ACLs, AFS, NFSv4. I can see the point of a very simple base API with system-dependent extensions, but am likewise skeptical that one can be designed that isn't useless.

-- brandon s.
allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university KF8NH

--
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu
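The "tell me whether any of these is possible" item is the one piece of the proposed list that needs no write access at all, so it is easy to sketch. The following is a rough Python illustration, not a proposed API: `capabilities` is a made-up name, it assumes Unix semantics for removal, and it ignores ACLs, sticky directories and similar refinements.

```python
import os
import tempfile


def capabilities(path: str) -> dict:
    """Report what the current process may do with path, without changing anything."""
    parent = os.path.dirname(os.path.abspath(path))
    return {
        "read": os.access(path, os.R_OK),
        "write": os.access(path, os.W_OK),
        "run": os.access(path, os.X_OK),
        # Unix assumption: removal is governed by the containing directory.
        "remove": os.access(parent, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as fh:
        print(capabilities(fh.name))
```

The caller asks in terms of intent ("may I remove this?") and the implementation decides which underlying check answers that on the platform at hand, which is the sense in which the operations are POSIXy in flavor but not in detail.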
Re: Files, Directories, Resources, Operating Systems
On 2008 Dec 9, at 21:11, Charles Bailey wrote:

It may well be that a fine-grained interface isn't practical, but perhaps there are some basics that we could implement, such as
- set owner of this thing
- (maybe) set group of this thing

Group is problematic; I don't recall Windows having group ownership (as distinct from group ACLs), and AFS PTS groups are very different from Unix groups.

As I said, I'm all in favor of such an API, just skeptical that a useful one can be devised.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university KF8NH
Re: Files, Directories, Resources, Operating Systems
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081210 00:06]:

* Mark Overmeer [EMAIL PROTECTED] [2008-12-08 21:20]:

A pity that we do not focus on the general concept of OS abstraction (knowing that some problems are only partially solvable (at the moment)).

Well go on. Explain how you would, f.ex., provide an abstract API over file ownership and access permissions between Win32 and Unix? I don’t see such a thing being possible at all: there are too many differences with pervasive consequences. The most you can reasonably do (AFAICT) is map Win32-style owner/access info to a Unix-style API for reading only.

(I do not have time today for long emails... paying work to do :-( )

The short answer: just like Path::Class or IO::File, I suggest an OO interface. That means that you may be able to share methods between different OSes, but it also may not be possible.

Within this OO interface, you could design two abstraction levels: one which maps directly onto the OS calls, for instance supporting chown via some POSIX mix-in. On another level, we attempt to unify environments. For the latter, you can think of methods like an owner getter and setter, os_family or size.

Even more to my liking is an additional super-level. In this case, the actual platform-dependent implementation does its best... Maybe something like (still Perl5 style):

    $file->change_attributes(owner => $user, group => $group, readable => 1, ...);

The core implementation tries, as well as it can, to map the various kinds of attributes onto OS-specific features, taking care of nastiness like change-order limitations. Typically it becomes smarter over time. Real DWIMming, exploiting and sharing our joint knowledge.

--
Regards,
MarkOv
Mark Overmeer MSc    MARKOV Solutions
[EMAIL PROTECTED] [EMAIL PROTECTED]
http://Mark.Overmeer.net http://solutions.overmeer.net
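To make the layering concrete, here is a rough sketch of what a low-level call plus a best-effort `change_attributes` on top of it might look like. It is written in Python purely for illustration; `PortableFile` and its methods are hypothetical names. The sketch deliberately implements only `readable` and `writable`, so that everything else (including `owner`) shows up as "skipped" rather than being silently ignored.

```python
import os
import stat
import tempfile


class PortableFile:
    """Hypothetical two-level file object: raw OS calls plus a best-effort layer."""

    def __init__(self, path):
        self.path = path

    # Level 1: thin wrapper over the OS call.
    def chmod(self, mode):
        os.chmod(self.path, mode)

    # Level 2: unified attributes, applied where this implementation knows how.
    def change_attributes(self, **wanted):
        handlers = {
            "readable": lambda on: self._toggle(stat.S_IRUSR, on),
            "writable": lambda on: self._toggle(stat.S_IWUSR, on),
        }
        applied, skipped = [], []
        for name, value in wanted.items():
            handler = handlers.get(name)
            if handler is None:
                skipped.append(name)  # the caller decides whether this is fatal
            else:
                handler(value)
                applied.append(name)
        return applied, skipped

    def _toggle(self, bit, on):
        mode = stat.S_IMODE(os.stat(self.path).st_mode)
        self.chmod((mode | bit) if on else (mode & ~bit))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as fh:
        print(PortableFile(fh.name).change_attributes(writable=False, owner="nobody"))
```

Returning the list of attributes that could not be applied is one possible answer to the objection that a "does its best" layer hides failures; whether that is enough is exactly what the thread is arguing about.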
Re: Files, Directories, Resources, Operating Systems
* Mark Overmeer [EMAIL PROTECTED] [2008-12-07 14:20]:

So why are you all so hesitant about making each other's life easier? There is no 100% solution, but 0% is even worse!

It looks like Python 3000 just tried that. People are not happy about it: http://utcc.utoronto.ca/~cks/space/blog/python/OsListdirProblem

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: Files, Directories, Resources, Operating Systems
On Mon, Dec 8, 2008 at 8:16 PM, Aristotle Pagaltzis [EMAIL PROTECTED] wrote: It looks like Python 3000 just tried that. People are not happy about it: http://utcc.utoronto.ca/~cks/space/blog/python/OsListdirProblem Yeeh, I also noted exactly that problem when reading the What's New In Python 3.0. What were they thinking?! Leon
Re: Files, Directories, Resources, Operating Systems
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081208 19:16]:

* Mark Overmeer [EMAIL PROTECTED] [2008-12-07 14:20]:

So why are you all so hesitant about making each other's life easier? There is no 100% solution, but 0% is even worse!

It looks like Python 3000 just tried that. People are not happy about it: http://utcc.utoronto.ca/~cks/space/blog/python/OsListdirProblem

I thought we were having a serious discussion. We all know that considering all names as Unicode is a stupid presumption.

It seems that some bright minds got stuck in a deep recursion about codesets in file and directory names. A pity that we do not focus on the general concept of OS abstraction (knowing that some problems are only partially solvable (at the moment)).

--
MarkOv
Mark Overmeer MSc    MARKOV Solutions
[EMAIL PROTECTED] [EMAIL PROTECTED]
http://Mark.Overmeer.net http://solutions.overmeer.net
Re: Files, Directories, Resources, Operating Systems
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 16:57]:

* Mark Overmeer [EMAIL PROTECTED] [2008-12-04 16:50]:

* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 14:38]:

Furthermore, from the point of view of the OS, even treating file names as opaque binary blobs is actually fine! Programs don’t care after all. In fact, no problem shows up until the point where you try to show filenames to a user; that is when the headaches start, not any sooner.

So, they start when
- you have users pick filenames (with Tk) for a graphical application. You have to know the right codeset to be able to display them correctly.

Yes, but you can afford imperfection because presumably you know which displayed filename corresponds to which stored octet sequence, so even if the name displays incorrectly, you still operate on the right file if the user picks it.

With all these different encodings, it is easy to show filenames which are not just a little bit incorrect, but unrecognizably corrupted.

In the whole debate, it looks like there are only two groups of developers involved: the programming language authors and the end-application developers. But do not forget that there are also CPAN library authors and maintainers (my main involvement). When you create a good library, you have to support multiple (unpredictable) platforms and languages. Each time you say "oh, just let the end-user figure that out", you add complexity and distribute implementation horrors. Good, generally available libraries are crucial for any language.

- you have XML-files with meta-data on files which are being distributed. (I have a lot of those)

Use URI encoding unless you like a world of pain.

You are looking at it from the wrong point of view: Perl is used as a glue language: other people determine what kind of data we have to process. So, also in my case, the content of these XML structures is totally out of my hands: no influence on the definitions at all. I think that is the more common situation.
NTFS seems to say it’s all Unicode and comes back as either CP1252 or UTF-16 depending on which API you use, so I guess you could auto-decode those. But FAT is codepage-dependent, and I don’t know if Windows has a good way of distinguishing when you are getting what. So Windows seems marginally more consistent than Unix, but possibly only apparently. (What happens if you zip a file with random binary garbage for a name on Unix and then unzip it on Windows?) I have no idea what other systems do.

Well, the nice thing about File::Spec/Path::Class is that someone did know how those systems work and everyone can benefit from it.

So why are you all so hesitant about making each other's life easier? There is no 100% solution, but 0% is even worse!

Once upon a time, Perl people were eager for good DWIMming and powerful programming. Nowadays, I see so much fear in our community to attempt simpler/better/other ways of programming.

We get a brand new language, with a horribly outdated documentation system and a very traditional OS approach. As if everyone prefers to stick to Perl's 22-year-old and Unix's 39-year-old choices, while the world around us has seen huge development and change in needs. Are we just getting old, grumpy and tired? Where is the new blood to stir us up?

--
MarkOv
Mark Overmeer MSc    MARKOV Solutions
[EMAIL PROTECTED] [EMAIL PROTECTED]
http://Mark.Overmeer.net http://solutions.overmeer.net
Re: Files, Directories, Resources, Operating Systems
* Mark Overmeer [EMAIL PROTECTED] [2008-12-07 14:20]:

- you have XML-files with meta-data on files which are being distributed. (I have a lot of those)

Use URI encoding unless you like a world of pain.

You are looking at it from the wrong point of view: Perl is used as a glue language: other people determine what kind of data we have to process. So, also in my case, the content of these XML structures is totally out of my hands: no influence on the definitions at all. I think that is the more common situation.

If you start with a broken data format, no amount of papering over it will unbreak it. Sorry, Perl 6 won’t have magic ponies to fix that. Ambiguous data cannot be disambiguated by smart code. If you want to try anyway, talk to someone who didn’t get their name on an IETF RFC out of disgust with the state of an unfixably messy legacy data format.

NTFS seems to say it’s all Unicode and comes back as either CP1252 or UTF-16 depending on which API you use, so I guess you could auto-decode those. But FAT is codepage-dependent, and I don’t know if Windows has a good way of distinguishing when you are getting what. So Windows seems marginally more consistent than Unix, but possibly only apparently. (What happens if you zip a file with random binary garbage for a name on Unix and then unzip it on Windows?) I have no idea what other systems do.

Well, the nice thing about File::Spec/Path::Class is that someone did know how those systems work and everyone can benefit from it.

These modules are completely and utterly oblivious to encoding issues, so I have no idea how they are relevant in the first place.

So why are you all so hesitant about making each other's life easier? There is no 100% solution, but 0% is even worse!

Because I have seen Java, and it taught me that the 90% solution is worse than the 20% solution. Provide 20% in the language and someone will use that and write Path::Class.
And if we abstain from putting today’s best solutions in the core library, then we have a chance that tomorrow’s best solutions might gain traction. (Otherwise we get 10 years of CGI.pm again.)

Once upon a time, Perl people were eager for good DWIMming and powerful programming.

And yet it’s the CPAN that turned out to be Perl’s greatest strength. If you suggested the initial concept of the CPAN today, people would laugh at you – it would seem like an April fool’s joke. It didn’t even have a standard package format!

Nowadays, I see so much fear in our community to attempt simpler/better/other ways of programming.

Simpler in what way? All abstractions leak. Take this into account or make users suffer.

We get a brand new language, with a horribly outdated documentation system and a very traditional OS approach. As if everyone prefers to stick to Perl's 22-year-old and Unix's 39-year-old choices, while the world around us has seen huge development and change in needs.

If you can show me a ubiquitous kernel that runs perl and was designed less than 15 years ago, I’ll show you a modern OS API approach. If you want to see an attempt at an abstract interface layered over crusty OS designs, I’ll show you Java.

Abstaining from the attractive nuisance of abstracting small-seeming differences away seems to have worked out well enough for DBI, anyway. Would you argue that DBI is not a good or relevant example? (And if so, why?) Or are you suggesting that approach was a failure or horrible in some way?

Are we just getting old, grumpy and tired? Where is the new blood to stir us up?

Busy designing their own second system. You want to invite a bunch of PHP kids? I’m game. :-)

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: Files, Directories, Resources, Operating Systems
* Tom Christiansen [EMAIL PROTECTED] [2008-11-27 11:30]:

In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED] of Wed, 26 Nov 2008 19:34:09 PST. [EMAIL PROTECTED]

I believe that the most important issues here, those having to do with identity, can be discussed and solved without unduly worrying about matters of collation;

It's funny you should say that, as I could nearly swear that I just showed that identity cannot be determined in the examples above without knowing about locales. To wit, while all of those sort somewhat differently, even case-insensitively, no matter whether you're thinking of a French or a Spanish ordering (and what is English's, anyway?), you have a more fundamental = vs != scenario which is entirely locale-dependent. If I can make a RESUME file, ought I be able to make a distinct r\x{E9}sum\x{E9} or re\x{301}sume\x{301} file in a case-ignorant filesystem?

That’s for the file system to know, not Perl 6. Trying to unify this in any way on the side of Perl is, in my regard, a fool’s errand. If the file system is case insensitive, then it will make the call in whatever way it deems correct, and it’s not for us to worry about all the possible ways in which all possible current and future file systems might answer such questions.

Furthermore, from the point of view of the OS, even treating file names as opaque binary blobs is actually fine! Programs don’t care after all. In fact, no problem shows up until the point where you try to show filenames to a user; that is when the headaches start, not any sooner.

To that, the right solution is simply not to roundtrip filenames through the user interface; instead, keep both the original octet sequence as well as the decoded version, and use the decoded version in UI but refer back to the pristine original when the user elects, via UI, to operate on that file.
As far as I am concerned, if Perl 6 has a distinction between octet strings and character strings, then all that’s required is to have filenames returned from OS APIs come back as octet strings, keeping the programmer from forgetting to deal with decoding issues. The higher-level problems like sorting names in a locale-aware fashion will be solved by the CPAN collective much better than any boil-the-ocean abstract interface design that the Perl 6 cabal would produce – if indeed these are real problems at all in practice. All that’s necessary is to design the interface such that it won’t obstruct subsequent “userland” solution approaches. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/
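The "keep both the original octets and the decoded version" approach can be sketched in a few lines. This is an illustration in Python rather than Perl 6, relying on the documented behaviour that `os.listdir` returns `bytes` names when given a `bytes` path. The helper names are invented, and UTF-8 with replacement characters is only an assumed display policy.

```python
import os
import tempfile


def display_name(raw: bytes) -> str:
    """Decode for display only; undecodable bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def list_entries(directory: bytes):
    """Return (original_octets, display_text) pairs for a directory."""
    return [(raw, display_name(raw)) for raw in sorted(os.listdir(directory))]


def open_chosen(directory: bytes, entry):
    """Operate on the file via its original octets, never via the display text."""
    raw, _shown = entry
    return open(os.path.join(directory, raw), "rb")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "plain.txt"), "w").close()
        for raw, shown in list_entries(os.fsencode(d)):
            print(raw, shown)
```

Even if `display_name` guesses the encoding wrong and the user sees a mangled label, `open_chosen` still reaches the right file, because the lossy decoded string is never converted back into a filename.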
Re: Files, Directories, Resources, Operating Systems
* Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 14:38]:

Furthermore, from the point of view of the OS, even treating file names as opaque binary blobs is actually fine! Programs don’t care after all. In fact, no problem shows up until the point where you try to show filenames to a user; that is when the headaches start, not any sooner.

So, they start when
- you have users pick filenames (with Tk) for a graphical application. You have to know the right codeset to be able to display them correctly.
- you have XML-files with meta-data on files which are being distributed. (I have a lot of those)
- when you start doing path manipulation on (UTF-16) blobs, and so forth.

I have been fighting these problems for a long time, and they worry me more and more because we see Unicode being introduced on the OS-level. The mess is growing by the day.

To that, the right solution is simply not to roundtrip filenames through the user interface; instead, keep both the original octet sequence as well as the decoded version, and use the decoded version in UI but refer back to the pristine original when the user elects, via UI, to operate on that file.

But now you simply say "decode it". But to be able to decode it, you must know which charset it is in, in the first place. So: where do we start guessing? An educated guess at OS level, or in each user program again?

[...] decoding issues. The higher-level problems like sorting names in a locale-aware fashion will be solved by the CPAN collective much better than any boil-the-ocean abstract interface design that the Perl 6 cabal would produce – if indeed these are real problems at all in practice.

Why? Are CPAN programmers smarter than Perl6 Cabal people?

What I would like to see designed is an object model for the OS, processes, directories, and files. We will not be able to solve all problems for each OS. Maybe people need to install additional CPAN modules to get smarter behavior.
But I would really welcome it if platform independent coding is the default behavior, without need for File::Spec, Path::Class and such. Once, we made the step from FILEHANDLES to IO::File. Let's go a little further.

The discussion is stuck on filenames, which are a problematic area. But we started with chown and friends. I really would like to be able to write:

    $file = File.new($filename);
    $file.owner($user);
    if $file.owner eq $user {}
    $file.open()

over

    if ($has_POSIX) {
        my $uid = getpwnam $user;      # numeric uid for the user name
        chown $uid, -1, $filename;     # -1 leaves the group unchanged
        if ((stat $filename)[4] == $uid) {}
    }
    else {
        die "Sorry, do not understand your system";
    }

--
Regards,
MarkOv
Mark Overmeer MSc    MARKOV Solutions
[EMAIL PROTECTED] [EMAIL PROTECTED]
http://Mark.Overmeer.net http://solutions.overmeer.net
Re: Files, Directories, Resources, Operating Systems
* Mark Overmeer [EMAIL PROTECTED] [2008-12-04 16:50]: * Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 14:38]: Furthermore, from the point of view of the OS, even treating file names as opaque binary blobs is actually fine! Programs don’t care after all. In fact, no problem shows up until the point where you try to show filenames to a user; that is when the headaches start, not any sooner. So, they start when - you have users pick filenames (with Tk) for a graphical applications. You have to know the right codeset to be able to display them correctly. Yes, but you can afford imperfection because presumably you know which displayed filename corresponds to which stored octet sequence, so even if the name displays incorrectly, you still operate on the right file if the user picks it. - you have XML-files with meta-data on files which are being distributed. (I have a lot of those) Use URI encoding unless you like a world of pain. - when you start doing path manipulation on (UTF-16) blobs, and so forth. I have been fighting these problems for a long time, and they worry me more and more because we see Unicode being introduced on the OS-level. The mess is growing by the day. And all we can do is to avoid making it even bigger. Because the only ones in control here are the OS vendors, and they aren’t solving it, only making it bigger. The only thing *we* can do is not to erect obstacles that users will have to work around when our abstractions invariably leak. I am unconvinced that this problem actually yields to abstraction. All the really hard problems in computing are the ones that intersect with human culture – text in any form, and dates and times. When computers deal with mathematical entities, few problems are even hard, let alone insurmountable, you only need to work at them long enough. Human concepts are not like that, they are messy and inconsistent. 
To that, the right solution is simply not to roundtrip filenames through the user interface; instead, keep both the original octet sequence as well as the decoded version, and use the decoded version in UI but refer back to the pristine original when the user elects, via UI, to operate on that file.

But now you simply say "decode it". But to be able to decode it, you must know which charset it is in, in the first place. So: where do we start guessing? An educated guess at OS level, or in each user program again?

I am not advocating educated guesses. The mechanism would be whatever interfaces the system provides. Unix does not have any, so you can indeed only ever guess, but if the system can give you something better, that should be used.

NTFS seems to say it’s all Unicode and comes back as either CP1252 or UTF-16 depending on which API you use, so I guess you could auto-decode those. But FAT is codepage-dependent, and I don’t know if Windows has a good way of distinguishing when you are getting what. So Windows seems marginally more consistent than Unix, but possibly only apparently. (What happens if you zip a file with random binary garbage for a name on Unix and then unzip it on Windows?)

I have no idea what other systems do.

But there is no common denominator, so pretending there is one is not going to help.

The higher-level problems like sorting names in a locale-aware fashion will be solved by the CPAN collective much better than any boil-the-ocean abstract interface design that the Perl 6 cabal would produce – if indeed these are real problems at all in practice.

Why? Are CPAN programmers smarter than Perl6 Cabal people?

Of course! There are many more CPAN programmers than cabalists; some of them are bound to have much greater expertise in some relevant area of this problem than anyone in the cabal. Even those who aren’t that smart will have direct access to and specific knowledge of the system they are dealing with, that the cabal may never even hear about.
What I would like to see designed is an object model for the OS, processes, directories, and files. We will not be able to solve all problems for each OS. Maybe people need to install additional CPAN modules to get smarter behavior. But I would really welcome it if platform independent coding is the default behavior, without need for File::Spec, Path::Class and such.

Ugh. I understand the desire, but it is very easy to get into architecture astronautics. I think we should follow the DBI approach and not try to provide a unified interface to system-specific things like permissions and ownership: unify the most general notions of filesystems but leave all the specifics to be dealt with by user code in the concrete. That is the only place where the amount of acceptable abstraction can be decided.

Cf. writing apps that run on all of PostgreSQL, MySQL and Oracle vs those that take advantage of specific DBMS features: this is a decision that the programmer has to make, it is not one we can make on his behalf.

Regards,
Re: Files, Directories, Resources, Operating Systems
In-Reply-To: Message from Mark Overmeer [EMAIL PROTECTED] of Thu, 27 Nov 2008 08:23:50 +0100. [EMAIL PROTECTED]

* Tom Christiansen ([EMAIL PROTECTED]) [081126 23:55]:

On Wed, 26 Nov 2008 11:18:01 PST.--or, for backwards compatibility, at 7:18:01 p.m. hora Romae on a.d. VI Kal. Dec. MMDCCLXI AUC, Larry Wall [EMAIL PROTECTED] wrote:

SUMMARY: I've been looking into this sort of thing lately (see p5p), and there may not even *be* **a** right answer. The reasons why take us into an area we've traditionally avoided.

What a long message...

It *was*? That was approaching a medium in my epistolary (and RFC) world, the one unrelated to PostIt notes. I can therefore see you've never been FMTEYEWTK'd, and thus also to all outward appearances, we've not made each other's acquaintance. I'm tchrist; pleased to meet you.

Read the http://www.unicode.org/reports/tr10/ treatise, as I have repeatedly done, and you will quickly reassess your length calls. This is not necessarily a good thing. Neal Stephenson can do the same, and of far lesser utility.

--tom
Re: Files, Directories, Resources, Operating Systems
Just as a variable name in perl6 must conform to a standard and abide by a set of constraints, why should file or other resource names be an exception? The constraints on variable names in perl6 are very flexible, but there are some rules that must be enforced for a program to work. It seems to me that resource (eg. file) names too should also be constrained so that software portability can be ensured. A reasonably constructed set of constraints for the perl6 core should deal with most locale/OS/character set considerations, and where a particular environment cannot cope, then a module will be needed to eigenmunge the names appropriately. Suppose for the sake of argument we state that resource names in perl6 shall comply with the rules for variable names; and the sort sequence of such names is the one defined for unicode strings. Where software in perl6 is written for a specific domain, eg. Catalan or Russian, the programmer will know more about the domain and how to deal with resource names in that locale. This would include sort sequences and the complexities Tom outlined. Such things would be relegated to OS / domain specific modules. Would this help? Tom Christiansen wrote: In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED] of Wed, 26 Nov 2008 19:34:09 PST. [EMAIL PROTECTED] Tom Christiansen wrote: I believe database folks have been doing the same with character data, but I'm not up-to-date on the DB world, so maybe we have some metainfo about the locale to draw on there. Tim? 
Re: Files, Directories, Resources, Operating Systems
In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED] of Wed, 26 Nov 2008 19:34:09 PST. [EMAIL PROTECTED] Tom Christiansen wrote: I believe database folks have been doing the same with character data, but I'm not up-to-date on the DB world, so maybe we have some metainfo about the locale to draw on there. Tim? AFAIK, modern databases are all strongly typed at least to the point that the values you store in and fetch from them are each explicitly character data or binary data or numbers or what-have-you; and so, when you are dealing with a DBMS in terms of character data, it is explicitly specified somewhere (either locally for the data or globally/hardcoded for the DBMS) that each value of character data belongs to a particular character repertoire and text encoding, and so the DBMS knows what encoding etc the character data is in, or at least it treats it consistently based on what the user said it was when it input the data. Oh, good then. That's what I'd heard was happening, but wasn't sure since I've steered clear of such beasties since before it was true. I wish our filesystems worked that way. But Andrew said something to me last week about Ken and Dennis writing quite pointedly that while you *could* use the f/s as a database, that you *shouldn't*. I didn't know the reference he was thinking of, so just nodded pensively (=thoughtfully). There is ABSOLUTELY NO WAY I've found to tell whether these UTF-8 strings should test equal, and when, nor how to order them, without knowing the locale: RESUME, Resume resume Resum\x{e9} r\x{E9}sum\x{E9} r\x{E9}sume\x{301} Re\x{301}sume\x{301} Case insensitively, in Spanish they should be identical in all regards. In French, they should be identical but for ties, in which case you work your way right to left on the diacriticals. This leads me to talk about my main point about sensitivity etc. 
I believe that the most important issues here, those having to do with identity, can be discussed and solved without unduly worrying about matters of collation; It's funny you should say that, as I could nearly swear that I just showed that identity cannot be determined in the examples above without knowing about locales. To wit, while all of those sort somewhat differently, even case-insensitively, no matter whether you're thinking of a French or a Spanish ordering (and what is English's, anyway?), you have a more fundamental = vs != scenario which is entirely locale-dependent. If I can make a RESUME file, ought I be able to make a distinct r\x{E9}sum\x{E9} or re\x{301}sume\x{301} file in a case-ignorant filesystem? There is no good answer, because we might think it reasonable to
    lc(strip_marks($old_fn)) eq lc(strip_marks($new_fn))
The problem of what is or is not a mark varies by locale:
* Castilian doesn't think ~ is a mark; Portuguese does, and so if you strip marks, you in Castilian count as the same two letters that it deems distinct, but in Portuguese, you incur no lasting harm.
* Catalan doesn't think ¸ is a mark; French does, and so if you strip marks, you in Catalan count as the same two letters that it deems distinct, but in French or Portuguese, you incur no lasting harm.
* Modern English (usually) decomposes æ into a+e, but OE/AS and Icelandic do not.
* Moreover, Icelandic deems é and e to be completely different letters altogether. If you strip marks, you count as the same letters that that language does not. Similarly with ö, which is at the end of their alphabet, (like ø in some), and nowhere near o or ó. BTW, those are three separate letters, not variants.
* And in OE/AS you could have a long mark on an asc (say ash for the atomic *letter* æ). If split into a and e and stripped of marks, it wouldn't make any sense at all.
Case in point: Ælene Frisch, whom many of you doubtless know, insists her name be spelt as I have written it. 
She does not want Aelene Frisch, for she considers her forename to have 5 letters in it, not 6. But Unicode doesn't give us a title case version of that (did AS?), suggesting it a ligature not a digraph. But if we have a file called ÆLENE, may we assume it the same in a case-insensitive sense to both aelene and ælene? I can only go on code-points, because I don't want to deal with ß and SS and Ss. Case-folding file systems are just begging for trouble, and I just don't know what to do. Think of the 3 Greek sigmata. identity is a lot more important than collation, as well as a precondition for collation, and collation is a lot more difficult and can be put off. I agree with everything save "and can be put off". I would like you to be right. I should truly wish to be mistaken. And I don't know what we have for prior (cough) art. respect to dealing with a file system, generally
Re: Files, Directories, Resources, Operating Systems
Hi, First of all, sorry for breaking the thread, but I had some trouble with my mail provider, and couldn't hit the reply button. To the point... I think there are some things that are simply not solved by abstraction. Some problems are concrete problems that need concrete solutions, filesystem access is one of them, IMNSHO. I pretty much think if ($*OS ~~ POSIX) { ... } elsif ($*OS ~~ Win32) { ... } is much saner than trying to deal with an enormous API that would be the result of the attempt to get a sane abstraction of all the different possible scenarios, and that would end up having backward-incompatible changes after a while because of some use case scenario that wasn't addressed. On the other hand, we really could think about having chmod, chown etc in the POSIX module, and have the POSIX module imported (where chmod would be in the default exports) by the prelude when in a posix machine, the same for the Win32 or whatever counterpart. Of course it would be very much interesting to have the open implemented by the POSIX module with the same API as the open implemented by the Win32 module. But I'm pretty much sure that's not the case for chown and chmod, and I don't think an abstract API is worth the trouble for 99% of the cases. But note that this doesn't stop the people in the 1% case to write the abstraction API, I just think it doesn't need to be the only way to access the features, and it certainly doesn't need to be loaded in the prelude. daniel
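The explicit per-platform branch daniel describes can be sketched as follows. This is an illustration in Python rather than Perl 6 (whose $*OS interface was still undecided in this thread); the function name make_read_only is hypothetical, and the sketch only assumes Python's standard os and stat modules.

```python
import os
import stat


def make_read_only(path: str) -> str:
    """Make `path` read-only using whichever platform branch applies.

    Returns the name of the branch taken so the caller can see what happened.
    (Hypothetical helper, for illustration only.)
    """
    if os.name == "posix":
        # POSIX branch: the owner may read; nobody may write.
        os.chmod(path, stat.S_IRUSR)
        return "posix"
    elif os.name == "nt":
        # Windows branch: os.chmod can only toggle the read-only flag here.
        os.chmod(path, stat.S_IREAD)
        return "nt"
    # No attempt at a universal abstraction: unknown platforms fail loudly.
    raise NotImplementedError(f"no branch for platform {os.name!r}")
```

The caller sees exactly which platform semantics were applied, which is the trade-off being argued for: less portability by default, but no surprises from a lowest-common-denominator API.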
Re: Files, Directories, Resources, Operating Systems
Tom Christiansen wrote: In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED] There is ABSOLUTELY NO WAY I've found to tell whether these UTF-8 strings should test equal, and when, nor how to order them, without knowing the locale: RESUME, Resume resume Resum\x{e9} r\x{E9}sum\x{E9} r\x{E9}sume\x{301} Re\x{301}sume\x{301} I believe that the most important issues here, those having to do with identity, can be discussed and solved without unduly worrying about matters of collation; It's funny you should say that, as I could nearly swear that I just showed that identity cannot be determined in the examples above without knowing about locales. To wit, while all of those sort somewhat differently, even case-insensitively, no matter whether you're thinking of a French or a Spanish ordering (and what is English's, anyway?), you have a more fundamental = vs != scenario which is entirely locale-dependent. If your current abstraction level is the Unicode codepoint level, then no knowledge of locale is needed at all in an everything-sensitive filesystem. Those 7 examples are all distinct for you, end of story. So you can see why I advocate everything-sensitive as being the normal case, same as with Perl identifiers. Rather than thinking of locales in terms of something special, AFAIK any locale can be reduced to a simple (though possibly verbose but predefinable in a library) normalized portable definition built from everything-sensitive components where the components are enumerations and functions describing a character repertoire (what characters can exist) plus representation normalization rules plus where applicable collation (ordering) rules plus where applicable mutual exclusion rules. When your core toolkit just works with everything-sensitive components and insensitive or locale issues are just defined as formulae over that, then we have indeed separated the locale issues into a connected but non-core problem. 
So collation doesn't need to be considered in Perl's file-system interface, while identity does; collation can be a layer on top of the core interface that just cares about identity. That seems a simplified version of reality. Identity isn't what monoglots think it is. I'm wondering if we're talking about the same meaning of the word collation. The way I have been using it, or meaning to, collation simply talks about how you put a set of values in order such that each 2 distinct values has a before|after relationship. Whereas identity is testing whether 2 things you hold are just the same value or not. You don't need to have ordering rules defined in order to have known equality rules. If you *know* that the 7 strings are all UTF-8, then locale doesn't have to be considered for equality; just your unicode abstraction level matters, such as if you're defining the values in terms of graphemes vs codepoints vs bytes. That's not true. é is not the same letter as e in Icelandic. I don't consider those to be the same character period. Mind you everywhere I've said graphemes I meant language-independent graphemes. I grant you that if you get into a further abstraction level of language-dependent graphemes, then some may see those 2 characters as being identical, and if that's your point then I can better understand now where you're coming from with the problems you raise. Practically speaking, I think that portability and other concerns would require us to just not go higher than the language-independent grapheme abstraction level when dealing with either Perl identifiers or file names or other urls with non-platform-specific APIs, and simply treat every language-independent grapheme as being distinct/non-identical from every other one, even if some locales might do different. Users should be able to deal with this gracefully enough much as people can easily enough treat E and e as being distinct. -- Darren Duncan
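To make the abstraction levels in this exchange concrete, here is a small Python sketch (not Perl 6) that counts how many of the seven example names are distinct at two levels. Using NFC normalization as a stand-in for "language-independent graphemes" is an assumption of this sketch, not something the thread settles.

```python
import unicodedata

# The seven names from the thread, written with explicit code points.
names = [
    "RESUME",
    "Resume",
    "resume",
    "Resum\u00e9",
    "r\u00e9sum\u00e9",
    "r\u00e9sume\u0301",
    "Re\u0301sume\u0301",
]

# Code-point level: compare the strings exactly as given.
distinct_codepoints = len(set(names))

# Canonical-equivalence level: normalize to NFC before comparing, so that
# "e" followed by U+0301 is treated as the same thing as precomposed U+00E9.
distinct_nfc = len({unicodedata.normalize("NFC", n) for n in names})

print(distinct_codepoints, distinct_nfc)  # prints "7 6"
```

Neither count involves a locale. The locale-dependent questions Tom raises (is é the same letter as e?) only arise once case folding or mark stripping is layered on top of one of these levels.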
Re: Files, Directories, Resources, Operating Systems
Richard Hainsworth wrote in perl.perl6.language : The S16: chown, chmod thread seems to be too unix-focussed. I was more or less thinking that the syscall-related primitives, like chown or chmod, could go in a POSIX namespace. Even in UNIX land nowadays the situation can be much more complex than traditional ownership and modes (a situation not entirely satisfactorily addressed by Perl 5's filetest pragma). Following the general perl6 philosophy, perhaps too there should be an abstract definition for the language that is core and additional modules that are specific to operating systems. Thus when generic software is distributed, it comes with an installer that determines the operating system chooses whether to use IO::Unix, IO::Unix::Gnome, IO::MS::WindowsXP, IO::MS::Vista, IO::Apple, etc. Maybe also IO::Internet::Http, IO::Internet::Ftp? IO (streams) and rights are not naturally related. Maybe you're thinking about filesystems and other content addressing schemes (like URLs). The subject is more complex than it seems at first glance, because you can have, for example, per-volume current working directories. It's quite hard to design something that is abstract enough, but at the same time not totally useless.
Re: Files, Directories, Resources, Operating Systems
* Richard Hainsworth ([EMAIL PROTECTED]) [081126 08:21]: The S16: chown, chmod thread seems to be too unix-focussed. To be portable, the minimum assumptions need to be made about the environment in which a program operates. Alternatively, the software needs to be able to determine whether the environment it is operating in meets a minimum set of conditions. ... Thus I would suggest that the perl6 specifications should be written in an abstract way, one not related to a specific operating system and in a way that can be adapted by an implementor to specific systems. I fully agree with you: the way the design is going is making the same mistakes of Perl5 again. Where we were able to release the Perl5 syntax more and more when the design of Perl6 made more progress, so should we do with the way we use modules. S16 is not doing that. Also Rafael's suggestion to focus on POSIX is not the way a nice interface should work. POSIX calls (and non-POSIX means) are ways to implement the interface to the Operating System, which can be different from the most practical interface on implementation level. We should focus on OS abstraction. For instance, if a file is represented in an object, then the most friendly interface names would be like: $file->owner($user); my $user = $file->owner; under the hood, we use chown and stat. I really would like to see a standard object oriented view on the system, which mainly autodetects the environment. I am really fed-up using File::Spec and Path::Class explicitly all the time. Also, I get data from a CD which was written case-insensitive and then copied to my Linux box. It would be nice to be able to say: treat this directory case insensitive (even when the implementation is slow) Shared with Windows default behavioral interface. So, I would like a radical change... 
trying to be as general (non-UNIX-specific) as possible: (sorry, my Perl6 syntax is still very sloppy)

    some global $*OS   # may be different per parallel instance of the program
                       # Maybe an OS function which returns $*OS
    my $dir = $*OS.dir($*PROGRAM.arg[0]);
        # above maybe hidden with a functional wrappers: dir $argv[0]
    $dir.case_sensitive(0);
    if $dir.entry('xyz').is_file {}
    my $f = $dir.file('xyz');
    $f.owner($*OS.user);
    $*OS.system('ls | lpr');
    print $*OS.family;
    print $*OS.kernel_version;
    my $pid = $*OS.process.label;

We should also be aware that we design Perl6 for parallelism. Do we require all nodes to run the same OS (~version)? Besides, I would really like to get an incremental growth path to do things we cannot do yet. Some things are currently difficult to realize under UNIX/Linux because there is no kernel interface defined for it. For instance, you do not know in which character-set the filename is; that is file-system dependent. So, we treat filenames as raw bytes. This does cause dangers (a UTF-8 codepoint in the filename with a \x2F ('/') byte in it, for instance) But as long as the OS cannot provide the user with this information, we should still give the author a way to specify it.

    $*OS.filesystem('/home', type => 'xfs', name_encoding => 'latin1'
      , text_content_encoding => 'utf-8,bom', illegal_chars => /\x0
      , case_sensitive => 1, max_path => 1024);

I have been working on such a module for Perl5 (which has a much wider field than Path::Class) but (as many other of my projects) did not complete it to a usable/publishable level (yet). It is all NOT too difficult to implement (we do share this knowledge), but the design of this needs to be free from historical mistakes. That's a challenge. -- Regards, MarkOv Mark Overmeer MSc MARKOV Solutions [EMAIL PROTECTED] [EMAIL PROTECTED] http://Mark.Overmeer.net http://solutions.overmeer.net
Re: Files, Directories, Resources, Operating Systems
On Wed, Nov 26, 2008 at 12:40:41PM +0100, Mark Overmeer wrote: We should focus on OS abstraction. [...] the design of this needs to be free from historical mistakes. And avoid making too many new ones. There must be useful prior art around. Java, for example, has a FileSystem abstraction java.nio.file.FileSystem http://openjdk.java.net/projects/nio/javadoc/java/nio/file/FileSystem.html which has been extended, based on lessons learnt, in the NIO.2 project (JSR 203: More New I/O APIs for the Java Platform (NIO.2): APIs for filesystem access, scalable asynchronous I/O operations, socket-channel binding and configuration, and multicast datagrams.) which enables things like being able to transparently treat a zip file as a filesystem: http://blogs.sun.com/rajendrag/entry/zip_file_system_provider_implementation See http://javanio.info/filearea/nioserver/WhatsNewNIO2.pdf Tim. p.s. I didn't know any of that when I started to write this "look for prior art" email, but a little searching turned up these examples. I'm sure there are more in other realms, but NIO.2 certainly looks like a rich source of good ideas derived from a wide range of experience.
Re: Files, Directories, Resources, Operating Systems
On Wed, Nov 26, 2008 at 12:40 PM, Mark Overmeer [EMAIL PROTECTED] wrote: Also, I get data from a CD which was written case-insensitive and then copied to my Linux box. It would be nice to be able to say: treat this directory case insensitive (even when the implementation is slow) Shared with Windows default behavioral interface. That is a task for the operating system, not Perl. You're trying to solve the problem at the wrong end here IMHO. For instance, you do not know in which character-set the filename is; that is file-system dependent. So, we treat filenames as raw bytes. On native file-system types (like ext3fs), character-set is not file-system dependent but non-existent. It really is raw bytes. This does cause dangers (a UTF-8 codepoint in the filename with a \x2F ('/') byte in it, for instance) A \x2F always means a '/'. UTF-8 was designed to be backwards compatible like that. Regards, Leon Timmermans
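Leon's claim about \x2F can be checked mechanically. The following Python sketch (function name hypothetical) encodes every non-ASCII Unicode scalar value as UTF-8 and looks for a 0x2F byte; it relies only on the built-in UTF-8 codec.

```python
def multibyte_utf8_contains_slash() -> bool:
    """Return True if any non-ASCII code point has a 0x2F byte in its UTF-8 form."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values and cannot be encoded
        if 0x2F in chr(cp).encode("utf-8"):
            return True
    return False


# Splitting a UTF-8 path on the raw byte 0x2F therefore never cuts a character.
parts = "caf\u00e9/menu".encode("utf-8").split(b"/")
```

This holds because every byte of a multi-byte UTF-8 sequence has its high bit set, so it can never equal an ASCII byte such as '/'. The guarantee is specific to UTF-8 and says nothing about other encodings.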
Re: Files, Directories, Resources, Operating Systems
* Leon Timmermans ([EMAIL PROTECTED]) [081126 15:43]: On Wed, Nov 26, 2008 at 12:40 PM, Mark Overmeer [EMAIL PROTECTED] wrote: That is a task for the operating system, not Perl. You're trying to solve the problem at the wrong end here IMHO. In my (and your) case, the operating system is not helping at all and there is no chance in having that changed. So... My remark was just one example, and I can give many more, where I would like to see more abstraction in the OS interface to avoid the need for each user to re-invent the wheel of interoperability. For instance, you do not know in which character-set the filename is; that is file-system dependent. So, we treat filenames as raw bytes. On native file-system types (like ext3fs), character-set is not file-system dependent but non-existent. It really is raw bytes. Not on the presentation level to the user. This makes it even more horrifying. It depends on the setting of an environment variable of the actual user how the bytes of the filename are interpreted. On the OS filesystem implementation you are probably correct (in the UNIX/Linux case), but programs are used for end-user results. This does cause dangers (a UTF-8 codepoint in the filename with a \x2F ('/') byte in it, for instance) A \x2F always means a '/'. UTF-8 was designed to be backwards compatible like that. Yes, you are right on this. ASCII does not suffer from UTF-8, so my example was flawed. The second 128 does cause problems. How can glob() sort filenames, for instance? UTF-16 names should not enter the Perl program unless you are aware of it, because those can hurt badly. Please comment on the big picture in the debate: there are all kinds of OS dependent things I really would like to see hidden in a (large) abstraction layer to simplify the development of portable scripts. I don't say I know all the answers, but I do feel a lot of pain in each module for CPAN the same thing again. 
-- Regards, MarkOv Mark Overmeer MScMARKOV Solutions [EMAIL PROTECTED] [EMAIL PROTECTED] http://Mark.Overmeer.net http://solutions.overmeer.net
Re: Files, Directories, Resources, Operating Systems
On Wed, Nov 26, 2008 at 11:21:58AM +0300, Richard Hainsworth wrote: The S16: chown, chmod thread seems to be too unix-focussed. Indeed, what you are currently reading in S16 is mostly just lightly edited copy-paste from P5 docs. But the S16 draft is out in the pugs repo for a reason--anyone and everyone on this thread should consider it perfectly okay to take S16 in hand and refactor it mercilessly. Any shortcuts we wish to install into the final Perl 6 can easily be done at the last moment by the prelude aliasing common operations into the core language. Anyway, feel free to coordinate this here and/or on #perl6. (Note that Patrick is in the process of moving all the Synopses to the pugs repo at some point soon, so the current S16 in pugs/docs/Perl6/Spec is likely to have its name/location changed soon.) If you need a pugs commit bit, please ask in #perl6 on irc.freenode.net. Larry
Re: Files, Directories, Resources, Operating Systems
I agree with the idea of making Perl 6's filesystem/etc interface more abstract, as previously discussed, and also that users should be able to choose between different levels of abstraction where that makes sense, either picking a more portable interface versus a more platform-specific one. Following up on Tim Bunce's comment about looking at prior art, I also recommend looking at the SQLite DBMS, specifically its virtual file system layer; this one is designed to give you deterministic behaviour and introspection over a wide range of storage systems and attributes, both on PCs and on embedded devices, or hard disks versus flash or write once vs write many etc, where a lot of otherwise-assumptions are spelled out. One relevant url is http://sqlite.org/c3ref/vfs.html and for the moment I forget where other good urls are. Mark Overmeer wrote: $dir.case_sensitive(0); $*OS.filesystem('/home', type = 'xfs', name_encoding = 'latin1' , text_content_encoding = 'utf-8,bom', illegal_chars = /\x0 , case_sensitive = 1, max_path = 1024); I understand that the above, concerning case-sensitivity, is just meant to be an example, but I want to explore that in more detail for a moment, as it reflects a common perception that only scratches the surface and needs to be fleshed out more. To summarize, what we really want is something more generic than case-sensitivity, which is text normalization and text folding in general, as well as distinctly dealing with distinctness for representation versus distinctness for mutual exclusivity. For example, one file system will represent your chosen case for a filename but it won't allow 2 files in the same directory whose filenames are non-distinct when uppercased; another file system in contrast would also represent a filename uppercased. For another example, one file system will not distinguish between accents on letters while another would, and this is orthogonal to case-sensitivity. 
Or for another, one might treat a run of whitespace as being equivalent to a single whitespace character, or whitespace characters are ignored entirely. Also, the paradigm that is the most distinguishing (case-sensitive, accent-sensitive, whitespace-sensitive, etc) should be the default, and any boolean option to change an aspect of this should be named that a false value is more distinguishing and a true value is less distinguishing. For example, a flag should be named ignores_case rather than case_sensitive; this also assumes that if named arguments are optional, then the common default value of a boolean-typed argument is false. Naming something case_sensitive implies that sensitivity is special whereas sensitivity should be considered normal, and rather insensitivity should be considered special. -- Darren Duncan
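Darren's naming rule (every flag false by default, and a true value always means "less distinguishing") could look roughly like this. It is a Python sketch with hypothetical names (NameRules, ignores_marks, collapses_whitespace); only ignores_case comes from the message itself. The mark stripping shown is deliberately locale-blind, which is exactly the simplification Tom Christiansen objects to elsewhere in the thread.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRules:
    # Every flag defaults to False, i.e. the most distinguishing behaviour.
    ignores_case: bool = False
    ignores_marks: bool = False
    collapses_whitespace: bool = False

    def key(self, name: str) -> str:
        """Return the string used to decide whether two names collide."""
        if self.ignores_marks:
            # Locale-blind assumption: drop every combining mark after NFD.
            decomposed = unicodedata.normalize("NFD", name)
            name = "".join(c for c in decomposed if not unicodedata.combining(c))
        if self.ignores_case:
            name = name.casefold()
        if self.collapses_whitespace:
            name = " ".join(name.split())
        return name
```

With this shape, NameRules() describes an everything-sensitive file system, and the key is kept separate from the stored name, which matches the distinction drawn above between distinctness for representation and distinctness for mutual exclusivity.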
Re: Files, Directories, Resources, Operating Systems
On Wed, 2008-11-26 at 11:34 -0800, Darren Duncan wrote: I agree with the idea of making Perl 6's filesystem/etc interface more abstract, as previously discussed, and also that users should be able to choose between different levels of abstraction where that makes sense, either picking a more portable interface versus a more platform-specific one. Agreed on both counts. Following up on Tim Bunce's comment about looking at prior art, I also recommend looking at the SQLite DBMS, specifically its virtual file system layer; this one is designed to give you deterministic behaviour and introspection over a wide range of storage systems and attributes, both on PCs and on embedded devices, or hard disks versus flash or write once vs write many etc, where a lot of otherwise-assumptions are spelled out. One relevant url is http://sqlite.org/c3ref/vfs.html and for the moment I forget where other good urls are. There are also higher-level VFS systems, such as Icculus.org PhysicsFS, which goes farther than just abstracting the OS operations. It also abstracts away the differences between archives and real directories, unions multiple directory trees on top of each other, and transparently redirects writes to a different trunk than reads: http://icculus.org/physfs/ I want to be able to support that functionality in a way that still allows me to open and close PhysicsFS files and directories the way I would normally. I want to be able to layer it *under* the standard Perl IO ops, rather than above them. The following is all obvious, but just to keep it in people's minds and frame the discussion: Being able to layer IO abstractions is at least as important as the basic OS abstraction itself -- as well as the ability to use the high level abstraction most of the time, but reach down the stack when needed. This implies making best effort to minimize the ways in which upper layers will be hopelessly confused by low-level operations, and documenting the heck out of the problem areas. 
These layers should be mix-and-match as much as possible, with abstractions designed with common interfaces. Certainly Perl 5's IO layers, as well as any networking or library stack, are prior art here. To summarize, what we really want is something more generic than case-sensitivity, which is text normalization and text folding in general, as well as distinctly dealing with distinctness for representation versus distinctness for mutual exclusivity. Yes, definitely. [This] implies that sensitivity is special whereas sensitivity should be considered normal, and rather insensitivity should be considered special. If only that were true in other areas of life. :-) -'f
Re: Files, Directories, Resources, Operating Systems
On Wed, Nov 26, 2008 at 5:15 PM, Mark Overmeer [EMAIL PROTECTED] wrote: Yes, you are right on this. ASCII does not suffer from UTF-8, so my example was flawed. The second 128 does cause problems. How can glob() sort filenames, for instance? That's a matter of collation, not (just) character set. TIMTOWTDI. There is no right way to do it as it depends on the circumstances, but a simple binary sort is not a bad default. Leon Timmermans
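As a rough illustration of the "simple binary sort" default and of one alternative among many, here is a Python sketch; the sample names are made up for the example.

```python
names = ["zebra", "Zoo", "apple", "\u00e9clair", "Eclair"]

# Default: order by the raw UTF-8 bytes; no locale information is consulted.
binary_order = sorted(names, key=lambda n: n.encode("utf-8"))

# One alternative among many: fold case first, then break ties by bytes.
folded_order = sorted(names, key=lambda n: (n.casefold(), n.encode("utf-8")))

print(binary_order)  # ['Eclair', 'Zoo', 'apple', 'zebra', 'éclair']
print(folded_order)  # ['apple', 'Eclair', 'zebra', 'Zoo', 'éclair']
```

The binary order is predictable and identical everywhere, but it puts all uppercase ASCII before lowercase and all accented letters last. Anything friendlier needs collation rules supplied from outside, which is the TIMTOWTDI point.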
Re: Files, Directories, Resources, Operating Systems
Can I just remind everyone that (IMO) we shouldn't just be considering filesystems here? I think it would be a pretty useful feature to have a general tree manipulation interface, and then this could be applied to filesystems, or XML, or LDAP, or SQL (although this doesn't map so well), or whatever. I guess the way I see it, you'd have something like this: role Tree::Node {...} role Filesystem::Node inherits from Tree::Node {...} role Filesystem::Directory inherits from Filesystem::Node {...} class Filesystem::File does Filesystem::Node { # Interface, like DBI has $implementation handles *; $implementation = Filesystem::File::XML-new(); } class Filesystem::File::XML inherits from Filesystem::File::Base {...} In the case of Filesystem::Node, you would define some standard attribute names (eg. owner, is_readable), but then they would be accessible through the standard Tree::Node.get_attribute() interface. And the standard Tree::Node.get_children() would be implemented by Filesystem::File as something to fetch the contents of the file; in the case of Filesystem::XMLFile, it would turn the contents into a tree of XML nodes. I agree about the different levels of abstractions, but just wanted to put in a plug for this one as one that I like. :) - | Name: Tim Nelson | Because the Creator is,| | E-mail: [EMAIL PROTECTED]| I am | - BEGIN GEEK CODE BLOCK Version 3.12 GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y- -END GEEK CODE BLOCK-
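A minimal sketch of the generic tree interface described above, written in Python rather than Perl 6. The class names and the attribute names "is_dir" and "size" are assumptions of the sketch ("is_readable" is taken from the message), and unlike the proposal, a plain file here simply has no children instead of exposing its contents as nodes.

```python
import os
from abc import ABC, abstractmethod


class TreeNode(ABC):
    """Generic node: attributes by name, children as further nodes."""

    @abstractmethod
    def get_attribute(self, name: str): ...

    @abstractmethod
    def get_children(self) -> list: ...


class FilesystemNode(TreeNode):
    """Filesystem-backed node exposing a few standard attribute names."""

    def __init__(self, path: str):
        self.path = path

    def get_attribute(self, name: str):
        if name == "is_dir":
            return os.path.isdir(self.path)
        if name == "is_readable":
            return os.access(self.path, os.R_OK)
        if name == "size":
            return os.stat(self.path).st_size
        raise KeyError(name)

    def get_children(self) -> list:
        # Simplification: only directories have children in this sketch.
        if not os.path.isdir(self.path):
            return []
        return [FilesystemNode(os.path.join(self.path, entry))
                for entry in sorted(os.listdir(self.path))]
```

Code that walks a TreeNode never needs to know it is looking at a filesystem, so an XML- or LDAP-backed node could in principle be substituted behind the same two methods.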
Re: Files, Directories, Resources, Operating Systems
Tom Christiansen wrote:

> I believe database folks have been doing the same with character data, but I'm not up-to-date on the DB world, so maybe we have some metainfo about the locale to draw on there. Tim?

AFAIK, modern databases are all strongly typed at least to the point that the values you store in and fetch from them are each explicitly character data or binary data or numbers or what-have-you. So when you are dealing with a DBMS in terms of character data, it is explicitly specified somewhere (either locally for the data or globally/hardcoded for the DBMS) that each value of character data belongs to a particular character repertoire and text encoding. The DBMS therefore knows what encoding etc the character data is in, or at least it treats it consistently based on what the user said it was when they input the data. The only time this information isn't really remembered is if the data is supplied in terms of being binary data.

Maybe some older or unusual DBMSs aren't this way, and of course technically a filesystem etc *is* a database ... I think that the example mentioned, about filename storage being locale dependent, probably meant that at the actual filesystem level it was just dealing with the names as binary data.

> There is ABSOLUTELY NO WAY I've found to tell whether these utf-8 strings should test equal, and when, nor how to order them, without knowing the locale:
>
>     RESUME
>     Resume
>     resume
>     Resum\x{e9}
>     r\x{E9}sum\x{E9}
>     r\x{E9}sume\x{301}
>     Re\x{301}sume\x{301}
>
> Case insensitively, in Spanish they should be identical in all regards. In French, they should be identical but for ties, in which case you work your way right to left on the diacriticals.

This leads me to talk about my main point about sensitivity etc.
I believe that the most important issues here, those having to do with identity, can be discussed and solved without unduly worrying about matters of collation; identity is a lot more important than collation, as well as a precondition for collation, and collation is a lot more difficult and can be put off.

With respect to dealing with a file system, generally it is just identity that matters, and collation is a concern that can typically be just tacked on after identity is solved. That is, with a file system you need to know whether or not a file name you hold will or won't match a file in the system, and matching or not-matching is the main function of an identity. Similarly, the file system has to make sure that no 2 distinct files in it have the same file name, that is, the same public identity.

In contrast, the order in which you sort a list of files by their names usually isn't so important; while all work with a file system requires working with identities, most work does not need to deal with collation. In practice several parties can agree on a single means of identifying files, while still having their own favorite collations, so the same list can be ordered in different ways. Collation criteria are something that can be naturally applied externally to a file system, such as by a user program, and only identity criteria need to be built in to the file system.

So collation doesn't need to be considered in Perl's file-system interface, while identity does; collation can be a layer on top of the core interface that just cares about identity.

One maxim I apply in my database work, and that I believe applies to this discussion, is "any logical difference is a big difference".
If you have 2 distinct value literals such that you consider the difference in each literal's spelling to be significant, such that you can't for all use cases substitute one literal for the other, then the 2 literals denote 2 distinct values; in the other case, where you can always substitute one for the other harmlessly, then they denote the same value. The concepts of 'value' and 'identity' are the same, and any value is its own identity.

And so, with your 7 'resume' literals, I would say that if there is a reason for any of the spellings to exist that couldn't be handled by one of the other spellings, then all 7 literals are distinct/non-identical taken as-is. If you *know* that the 7 strings are all UTF-8, then locale doesn't have to be considered for equality; just your Unicode abstraction level matters, such as if you're defining the values in terms of graphemes vs codepoints vs bytes.

When talking about identity, there is no such thing as case-insensitivity or accent insensitivity or whitespace insensitivity or what have you. If you have any reason to not replace every E with an e or vice-versa in your character string, then you consider those 2 non-identical and so they wouldn't match; by contrast, true case-insensitivity means you can replace every e with an E (for example) and forget that an e ever existed; the actual equality test is then the same since all comparands would only have the E. And so
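As a rough illustration of the abstraction-level point, the following Python sketch compares two of the 'resume' literals from the thread at the byte, codepoint, and normalized levels. It then treats case-insensitivity as "replace first, then compare exactly". The helper name `identity_key` is hypothetical, and using NFC plus `str.casefold()` is one possible choice of canonical form, assumed here only for the example.

```python
import unicodedata

# Two of the literals from the thread, written with escapes.
precomposed = "r\u00e9sum\u00e9"    # r\x{E9}sum\x{E9}
combining = "r\u00e9sume\u0301"     # r\x{E9}sume\x{301}

# Identity at three abstraction levels.
same_bytes = precomposed.encode("utf-8") == combining.encode("utf-8")
same_codepoints = precomposed == combining
same_normalized = (unicodedata.normalize("NFC", precomposed)
                   == unicodedata.normalize("NFC", combining))


def identity_key(name, fold_case=False):
    """Map a name to the canonical value that is compared exactly."""
    key = unicodedata.normalize("NFC", name)
    if fold_case:
        # "Replace every E with an e" once, up front.
        key = key.casefold()
    return key


print(same_bytes, same_codepoints, same_normalized)
print(identity_key("RESUME", fold_case=True)
      == identity_key("resume", fold_case=True))
```

The equality test itself never changes; only the canonical value fed into it does. No locale is consulted at any point, and an accented spelling stays distinct from the unaccented one unless the key function is deliberately written to erase that difference too.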
Re: Files, Directories, Resources, Operating Systems
* Tom Christiansen ([EMAIL PROTECTED]) [081126 23:55]:

> On Wed, 26 Nov 2008 11:18:01 PST.--or, for backwards compatibility, at 7:18:01 p.m. hora Romae on a.d. VI Kal. Dec. MMDCCLXI AUC, Larry Wall [EMAIL PROTECTED] wrote:
>
> SUMMARY: I've been looking into this sort of thing lately (see p5p), and there may not even *be* **a** right answer. The reasons why take us into an area we've traditionally avoided.

What a long message...

> Mark: We should focus on OS abstraction.
> Mark: [...] the design of this needs to be free from historical mistakes.
> ...
> It cannot be done in an automated fashion, since you can't know a filesystem that knew *locale* each filename was created under, and without that, you have to guess--almost always wrongly.

Exactly. This is an historical mistake, understandable as a way to keep at least a path of growth from the current system open() interface. Only users who have the same locale see the names the same way. If you change your locale, your filenames break! Say you change from Cyrillic to English.

In my suggestion, the programmer (who is often local on the system) can at least say what the locale was when the filenames were created. On some OSes, the OS can tell you. What I would like is an object model which does allow us at least to abstract these problems away... whether it can be resolved automatically or only with help is for later.

> There is ABSOLUTELY NO WAY I've found to tell whether these utf-8 strings should test equal, and when, nor how to order them, without knowing the locale:
>
>     RESUME
>     Resume
>     resume
>     Resum\x{e9}
>     r\x{E9}sum\x{E9}
>     r\x{E9}sume\x{301}
>     Re\x{301}sume\x{301}

This is done by the locale of the user of the script, as usual for ls(1). So, I do not see your problem here.

I don't mind if problems with Unicode are not solved or solvable. Could we discuss a built-in File::Spec/Path::Class? And we allow ourselves the same limitations as these have, for the moment.
--
Regards, MarkOv

Mark Overmeer MSc, MARKOV Solutions
[EMAIL PROTECTED] [EMAIL PROTECTED]
http://Mark.Overmeer.net http://solutions.overmeer.net
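One way to read the suggestion that the programmer "can at least say what the locale was when the filenames were created" is sketched below in Python. The `FileName` class and its fields are hypothetical, invented for this example only. It assumes the raw bytes stored by the filesystem are the identity, and that the declared encoding is used only to render them as text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileName:
    """A name as the filesystem stores it, plus a declared encoding."""
    raw: bytes
    # Declared by the programmer, never detected; excluded from equality
    # so that identity depends on the stored bytes alone.
    encoding: str = field(compare=False, default="utf-8")

    def text(self):
        """Render the name for display under the declared encoding."""
        return self.raw.decode(self.encoding, errors="replace")


# A Cyrillic name created on a system that used KOI8-R.
original = "\u0440\u0435\u0437\u044e\u043c\u0435"
raw = original.encode("koi8_r")

declared = FileName(raw, "koi8_r")     # programmer states the old locale
guessed = FileName(raw, "latin-1")     # a wrong guess after a locale change

print(declared.text() == original)     # rendered as it was created
print(guessed.text() == original)      # mojibake under the wrong guess
print(declared == guessed)             # still the same file
```

The same bytes always name the same file, while the readable form depends entirely on what was declared. That keeps the unsolved part (knowing the right encoding) outside the identity test, in the spirit of accepting the same limitations as File::Spec or Path::Class for the moment.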