On Friday 13 November 2009 15:47, McKown, John wrote:
>This goes back to the person who wanted some way to emulate DD concatenation
> of multiple datasets so that they are read as if they were one. Everybody
> agrees that there isn't an easy way. Now, I don't know filesystem
> internals. But what about a new type of symlink? Normally, a symlink
> contains the real name of the file. Sometimes a symlink will point to
> another symlink, and so on (I don't know how deep). What about a
> multi-symlink. That's where a symlink points to multiple files in a
> specific order. When the symlink is opened and read, each file in the
> symlink is opened and read in order. I know this would require some changes
> to open() as well, in order to make sure that each file in the symlink
> chain is readable by the process.
>
>What think? Or is this just alien to the UNIX mindset?
An interesting idea, and yes it is weird and rather alien to UNIX minds.
You're implementing something at the filesystem level which is trivially
implemented at the process level. And all to avoid some IPC via pipes? Has
anyone calculated how much overhead there is in using cat to pipe some files
into a process instead of having the process read the files itself?
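To make "trivially implemented at the process level" concrete, here is a minimal sketch in Python of a process reading several files as if they were one stream, with no pipe and no cat involved. The function name read_concatenated and the 64 KiB buffer size are my own choices for illustration, not anything from an existing tool.

```python
def read_concatenated(paths, bufsize=64 * 1024):
    """Yield the contents of each file in `paths`, in order, as one stream.

    Hypothetical helper for illustration: each file is opened only when
    the previous one has been exhausted, much as cat would do it.
    """
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:  # end of this file, move on to the next
                    break
                yield chunk
```

A consumer would simply loop over read_concatenated([...]) and never see where one file ends and the next begins. Note that this sketch only covers sequential reads; it does nothing about seek() or locking.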
The more I think about this, the less this seems like a symlink. I'm thinking
of it as a meta-file: a file of files. This introduces the idea of a new
type of file whose contents are known to and interpreted by the system, in
the way a directory-file's contents are known. Does this really have any
value?
Regardless of its value, in thinking of how to implement this, I see a few
problems:
- What happens if one of the files is missing?
- How do you seek() in such a file?
- Similarly, how do you implement locks on byte ranges within such a file?
- What happens if another process appends to one of the files while you are
reading a later one in the sequence? Does your read position change?
You can solve those, perhaps, by requiring an open() of a meta-file to open
all of the listed files. If any file open fails, the meta-file open fails
and closes all the others. A meta-file's file descriptor would have to refer
to a new kernel data structure that is a list of the open file descriptors of
the listed files (or rather pointers to the data structures referenced by
those file descriptors). This structure would be used to map an offset
within the meta-file to an offset within one of the listed files, using the
files' lengths. This solves the seek and lock problems. I'm still not sure
about the append problem, though.
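The offset mapping described above can be sketched in user space. This is only an illustration of the arithmetic, not kernel code: the class name MetaFileMap and its locate() method are hypothetical, and I assume the lengths of the listed files are captured once, at open time.

```python
import bisect


class MetaFileMap:
    """Map an offset within a meta-file to (file index, offset in that file).

    Hypothetical illustration: `lengths` is a snapshot of the listed
    files' sizes taken when the meta-file is opened.
    """

    def __init__(self, lengths):
        self.lengths = list(lengths)
        # starts[i] is the meta-file offset at which file i begins.
        self.starts = []
        pos = 0
        for n in self.lengths:
            self.starts.append(pos)
            pos += n
        self.total = pos

    def locate(self, offset):
        """Return (index, local_offset) for a meta-file offset."""
        if not 0 <= offset < self.total:
            raise ValueError("offset outside the meta-file")
        # The last file whose start is <= offset contains it; empty
        # files share a start with their successor and are skipped.
        i = bisect.bisect_right(self.starts, offset) - 1
        return i, offset - self.starts[i]
```

With files of 10, 5 and 20 bytes, MetaFileMap([10, 5, 20]).locate(12) gives (1, 2): byte 2 of the second file. It also shows why the append problem is awkward. If the first file grows after the snapshot, every later start offset is stale, so either the map is frozen at open time or every position after the grown file shifts.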
Another possible implementation would be entirely within the filesystem, where
the meta-file would have direct access to the data-blocks of the underlying
files. I think that opens up too many cans-o-worms to be a good solution,
though.
Of course, once you have this kind of file, you have meta-files of meta-files
of meta-files of ... Isn't it better to represent such structures in
user-space instead of kernel-space?
>ln -s symlink realfile1 realfile2 /etc/fstab /tmp/somefile
This command-line syntax is already used by ln (the third form listed in the
manpage synopsis) to create several symlinks in a directory, which is the
final argument.
It's an interesting idea, but I'm not convinced of its utility. I'd like to
know what percentage of the I/O time (or CPU cycles) is used by piping files
via cat. Anyone have any measurements?
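For anyone who wants to take that measurement, here is a rough sketch of one way to do it in Python: read the same files once directly and once through a pipe from cat, and time each. The function names are mine, and it assumes a cat binary is on the PATH. It measures wall-clock time only, so treat any numbers as indicative at best.

```python
import subprocess
import time


def read_direct(paths, bufsize=64 * 1024):
    """Read every file in order, in-process; return total bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                total += len(chunk)
    return total


def read_via_cat(paths, bufsize=64 * 1024):
    """Read the same files through a pipe from cat; return total bytes read."""
    if not paths:
        return 0  # cat with no arguments would read stdin instead
    total = 0
    with subprocess.Popen(["cat", "--", *paths],
                          stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    # Leaving the with-block waits for cat, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError("cat exited with status %d" % proc.returncode)
    return total


def time_call(fn, *args):
    """Return (result, elapsed seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling time_call(read_direct, paths) and time_call(read_via_cat, paths) on the same list gives the two timings. The first pass will mostly measure disk reads and later passes the page cache, so it is worth repeating each several times and comparing like with like.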
- MacK.
-----
Edmund R. MacKenty
Software Architect
Rocket Software
275 Grove Street · Newton, MA 02466-2272 · USA
Tel: +1.617.614.4321
Email: [email protected]
Web: www.rocketsoftware.com
----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390