Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-06-04 Thread Tejun Heo
Jens Axboe wrote:
 On Sat, Jun 02 2007, Tejun Heo wrote:
 Hello,

 Jens Axboe wrote:
 Would that be very different from issuing barrier and not waiting for
 its completion?  For ATA and SCSI, we'll have to flush write back cache
 anyway, so I don't see how we can get performance advantage by
 implementing separate WRITE_ORDERED.  I think zero-length barrier
 (haven't looked at the code yet, still recovering from jet lag :-) can
 serve as genuine barrier without the extra write tho.
 As always, it depends :-)

 If you are doing pure flush barriers, then there's no difference. Unless
 you only guarantee ordering wrt previously submitted requests, in which
 case you can eliminate the post flush.

 If you are doing ordered tags, then just setting the ordered bit is
 enough. That is different from the barrier in that we don't need a flush
 or FUA bit set.
 Hmmm... I'm feeling dense.  Zero-length barrier also requires only one
 flush to separate requests before and after it (haven't looked at the
 code yet, will soon).  Can you enlighten me?
 
 Yeah, that's what the zero-length barrier implementation I posted does.
 Not sure if you have a question beyond that, if so fire away :-)

I thought you were talking about adding BIO_RW_ORDERED instead of
exposing zero length BIO_RW_BARRIER.  Sorry about the confusion.  :-)

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Design question

2007-06-04 Thread David H. Lynch Jr.

I am not sure there really is a best list for this, but this is the
closest I can think of.
I am working on host software for a series of cards
(www.picocomputing.com).
All of these cards have an FPGA; most have a processor, memory,
flash and other resources.
They have different flavors of FPGA, CPU and bus (Compact Flash,
PCI, CardBus, Express Bus, ...).
The cards can run standalone, and they can run uCLinux, Linux,
GreenHills, or ...

My question is about what happens when they are inserted into a Linux
host (there is already Windows host software).
From the host side there are several major tasks that might be
performed:
reading/writing target memory, IO space, preprogramming the FPGA,
flash,
or reading/writing to the hardware implemented in the FPGA, which
may or may not require interaction with
the target OS. The performance and bandwidth requirements of
data transfers vary greatly,
most being fairly mellow, but occasionally either the bandwidth or
latency requirements can be high.
Some of the above is vague. We are not developing a specific piece
of hardware for a specific use.
We are working on a product and development environment that has
nearly infinite uses.

We have evolved a virtual channel architecture that is meant to allow
creating IP that goes into the FPGA and can be
accessed by applications on the host. There can be multiple cards in
the same host (in some applications, large numbers).
Despite all of the above, things are not all that complex, just
incredibly flexible.
Anyway, that is the BIG picture.

Our original implementation of the driver(s) for this was a
character driver for each card, specific to its bus type,
with minor device numbers for reading/writing different regions, and
a large collection of ioctls to handle special functions.

But the vast majority of actions consist of read/write.

I have been trying to decide if it makes sense to rewrite the driver
as a VFS driver, with special file names for each channel or
functional unit.

I have also been trying to digest enough information on sysfs to
determine if that is an appropriate approach.

Basically, I am trying to decide what kind of driver provides the
best potential solution.


  





   


   


Re: [Patch 06/18] fs/logfs/compr.c

2007-06-04 Thread Jörn Engel
On Sun, 3 June 2007 23:58:43 +0200, Arnd Bergmann wrote:
 On Sunday 03 June 2007, Jörn Engel wrote:
  +#define COMPR_LEVEL 3
  +
  +static DEFINE_MUTEX(compr_mutex);
  +static struct z_stream_s stream;
 
 Is there a particular reason to choose '3' as the only compression
 level? Should this perhaps be a per-superblock option instead?

There is no particular reason.  '3' should be a reasonable value for
most people.  If actual users want to change this value, I can make it a
mount option as well.  Right now I'm just lazy and doubt the merits.

 Also, I thought I saw discussion about making the mutex and
 stream per-superblock, but don't remember if the idea was discarded.
 If it was, you might want to add it to the won't-happen list.

It was more or less discarded.  As long as the sweet spot for LogFS is
small systems, saving memory is more important than multithreaded
performance.  Will add it to the list.

Jörn

-- 
Joern's library part 2:
http://www.art.net/~hopkins/Don/unix-haters/tirix/embarrassing-memo.html


Re: [Patch 09/18] fs/logfs/gc.c

2007-06-04 Thread Jörn Engel
On Mon, 4 June 2007 00:07:36 +0200, Arnd Bergmann wrote:
 On Sunday 03 June 2007, Jörn Engel wrote:
  +static long decay(long t0, long t, long theta)
  +{
  +   long shift, fac;
  +
  +   if (t >= 32*theta)
  +           return 0;
  +
  +   shift = t/theta;
  +   fac = theta - (t%theta)/2;
  +   return (t0 >> shift) * fac / theta;
  +}
 
 I think it's confusing to work with 'long' arguments
 here. If you actually allow larger than 32 bit arguments,
 that means that the gc logic behaves differently on
 32 and 64 bit CPUs, which I don't think is what you
 intended.

Different behaviour would be fine.  This function will be used to pick
good candidates for garbage collection.  If one segment will get chosen
over another depending on BITS_PER_LONG, either one would have been a
good candidate anyway.

Hmm.  Maybe I should s/32/BITS_PER_LONG/ in the function.

 Also, can any of the arguments be negative? How about
 making them all explicit u32 and u64 variables?

That would make sense, yes.
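
For illustration, a fixed-width variant along those lines might look like
the sketch below.  The name decay_u32, the u32 arguments and the 64-bit
intermediate are assumptions made for this example (written with userspace
uint32_t/uint64_t so it compiles standalone); it is not the LogFS code.

```c
#include <stdint.h>

/*
 * Hypothetical fixed-width variant of decay(): t0 is halved once per
 * elapsed theta and scaled down linearly within the current interval.
 * The result is the same on 32-bit and 64-bit CPUs.
 */
static uint32_t decay_u32(uint32_t t0, uint32_t t, uint32_t theta)
{
	uint32_t shift, fac;

	/* theta == 0 is treated as "fully decayed" to avoid dividing by zero. */
	if (theta == 0 || t / theta >= 32)
		return 0;

	shift = t / theta;
	fac = theta - (t % theta) / 2;
	/* 64-bit intermediate so the multiplication cannot overflow. */
	return (uint32_t)(((uint64_t)(t0 >> shift) * fac) / theta);
}
```

The check t / theta >= 32 is equivalent to t >= 32*theta for nonzero
theta, but cannot overflow.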

Jörn

-- 
I've never met a human being who would want to read 17,000 pages of
documentation, and if there was, I'd kill him to get him out of the
gene pool.
-- Joseph Costello


Re: LogFS take four

2007-06-04 Thread Jörn Engel
On Mon, 4 June 2007 00:18:21 +0200, Arnd Bergmann wrote:
 On Sunday 03 June 2007, Jörn Engel wrote:
 
  Unchanged:
  o error handling
  
 ...
  Won't happen (unless I get convinced to do otherwise):
  o Change LOGFS_BUG() and LOGFS_BUG_ON() to inline functions
These are macros for very much the same reasons BUG() and BUG_ON() are.
 
 I wonder how many of your LOGFS_BUG{,_ON} still remain after the
 error handling is in place to deal with broken file system contents.
 Ideally, I'd say the current LOGFS_BUG() should be replaced with
 a function that prints about the kind of error it has hit (rate-limited),
 potentially calls logfs_crash_dump(), and remounts the medium read-only,
 but _not_ call BUG().

That sounds fairly useful, actually.  Do a WARN_ON(), call
logfs_crash_dump(), remount read-only and finished.  Rate-limiting might
be unnecessary as the read-only thing already limits the rate to 1.

I like the idea.
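
A rough userspace sketch of that flow, assuming made-up names throughout
(struct demo_fs, demo_crash_dump() and demo_fs_error() are placeholders,
and the early return is one possible way to get the "rate of 1"
behaviour, not something taken from the patch):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-filesystem state; not a LogFS type. */
struct demo_fs {
	bool read_only;
	unsigned int dumps;	/* how many crash dumps were taken */
};

/* Placeholder for logfs_crash_dump(); here it only counts invocations. */
static void demo_crash_dump(struct demo_fs *fs)
{
	fs->dumps++;
}

/*
 * Report an inconsistency without calling BUG(): warn, dump state and
 * switch to read-only.  Once read-only, later errors are ignored, so
 * the report is emitted at most once.
 */
static void demo_fs_error(struct demo_fs *fs, const char *what)
{
	if (fs->read_only)
		return;
	fprintf(stderr, "demo_fs: %s, remounting read-only\n", what);
	demo_crash_dump(fs);
	fs->read_only = true;
}
```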

Jörn

-- 
Fools ignore complexity.  Pragmatists suffer it.
Some can avoid it.  Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept.  1982


Re: [Patch 14/18] fs/logfs/segment.c

2007-06-04 Thread Jörn Engel
On Mon, 4 June 2007 00:21:41 +0200, Arnd Bergmann wrote:
 On Sunday 03 June 2007, Jörn Engel wrote:
  +static DEFINE_MUTEX(compr_mutex);
  +
 
 It seems you define a static compr_mutex in both segment.c and in compr.c,
 and always lock them both at the same time. Is that a correct observation?
 Is it intentional, or an oversight on your side?

Lame coding on my side.  Seems to have gotten lost in my notes, but this
mutex should get removed and the protected memory made per-superblock.
Unlike the zlib workspace it does not consume 300k, so there is no
excuse for it here.

Jörn

-- 
Joern's library part 9:
http://www.scl.ameslab.gov/Publications/Gus/TwelveWays.html


Re: [Patch 05/18] fs/logfs/logfs.h

2007-06-04 Thread Jörn Engel
On Sun, 3 June 2007 23:50:55 +0200, Arnd Bergmann wrote:
 On Sunday 03 June 2007, Jörn Engel wrote:
  +/**
  + * struct logfs_device_ops - device access operations
  + *
  + * @read:  read from the device
  + * @write: write to the device
  + * @erase: erase part of the device
  + */
  +struct logfs_device_ops {
  +   int (*read)(struct super_block *sb, loff_t ofs, size_t len, void *buf);
  +   int (*write)(struct super_block *sb, loff_t ofs, size_t len, void *buf);
  +   int (*erase)(struct super_block *sb, loff_t ofs, size_t len);
  +};
 
 I wonder if there is a way to document the prototypes of these function
 pointers with kerneldoc, other than having a typedef for each.
 
 What brought me to this point is that I first assumed they would return
 the number of bytes transferred, like read/write file operations, where
 your functions return zero on success.

I can just add a comment about the return code in the struct
documentation.  For the foreseeable future there will be exactly two
instances of this structure.  It's not as if every driver would
implement this.
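
Something like the following is what such a comment could look like.
This is only a standalone sketch: demo_loff_t, struct demo_device_ops,
the void *dev argument and the RAM-backed implementation are all
invented for the example and are not the LogFS definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef long long demo_loff_t;	/* stand-in for the kernel's loff_t */

/**
 * struct demo_device_ops - device access operations (illustrative)
 *
 * @read:  read @len bytes at @ofs into @buf
 * @write: write @len bytes from @buf at @ofs
 *
 * In this sketch both return 0 on success and a negative errno value
 * on failure.  They never return a byte count.
 */
struct demo_device_ops {
	int (*read)(void *dev, demo_loff_t ofs, size_t len, void *buf);
	int (*write)(void *dev, demo_loff_t ofs, size_t len, const void *buf);
};

#define DEMO_DEV_SIZE 64
static unsigned char demo_ram[DEMO_DEV_SIZE];

/* Reject any range that does not lie entirely inside the device. */
static int demo_range_ok(demo_loff_t ofs, size_t len)
{
	return ofs >= 0 && ofs <= DEMO_DEV_SIZE &&
	       len <= DEMO_DEV_SIZE - (size_t)ofs;
}

static int demo_ram_read(void *dev, demo_loff_t ofs, size_t len, void *buf)
{
	(void)dev;
	if (!demo_range_ok(ofs, len))
		return -EINVAL;
	memcpy(buf, demo_ram + ofs, len);
	return 0;
}

static int demo_ram_write(void *dev, demo_loff_t ofs, size_t len,
			  const void *buf)
{
	(void)dev;
	if (!demo_range_ok(ofs, len))
		return -EINVAL;
	memcpy(demo_ram + ofs, buf, len);
	return 0;
}

static const struct demo_device_ops demo_ram_ops = {
	.read = demo_ram_read,
	.write = demo_ram_write,
};
```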

Jörn

-- 
A defeated army first battles and then seeks victory.
-- Sun Tzu


Re: [Patch 04/18] include/linux/logfs.h

2007-06-04 Thread Jörn Engel
On Sun, 3 June 2007 23:42:25 +0200, Arnd Bergmann wrote:
 On Sunday 03 June 2007, Jörn Engel wrote:
  +struct logfs_je_spillout {
  +   __be64  so_segment[0];
  +}__packed;
 
 All the on-disk data structures you define in this file have naturally
 aligned members, so the __packed attribute is not needed.

Amen.  It is purely paranoia and I don't even know who is out to get me.

 However, I think it causes gcc to generate larger and slower code
 on some architectures, because now it has to assume that the data
 structure itself has no more than byte alignment.
 
 I'd simply remove all instances of __packed therefore. In order
 to verify that you got it right in all cases, build with
 '-Wpadded -Wpacked'.

Fine with me.  Will do.

Jörn

-- 
Ninety percent of everything is crap.
-- Sturgeon's Law


Re: [Patch 05/18] fs/logfs/logfs.h

2007-06-04 Thread Jan Engelhardt

On Sunday 03 June 2007, Jörn Engel wrote:
 +/**
 + * struct logfs_device_ops - device access operations
 + *
 + * @read:  read from the device
 + * @write: write to the device
 + * @erase: erase part of the device
 + */
 +struct logfs_device_ops {
 +   int (*read)(struct super_block *sb, loff_t ofs, size_t len, void *buf);
 +   int (*write)(struct super_block *sb, loff_t ofs, size_t len, void *buf);
 +   int (*erase)(struct super_block *sb, loff_t ofs, size_t len);
 +};

I wonder if there is a way to document the prototypes of these function
pointers with kerneldoc, other than having a typedef for each.

What brought me to this point is that I first assumed they would return
the number of bytes transferred, like read/write file operations, where
your functions return zero on success.

read/write functions returning bytes written would return ssize_t,
just as vfs_read and vfs_write do.



Jan
-- 


Read/write counts

2007-06-04 Thread David H. Lynch Jr.

I have a file system that has really odd blocking.

All files have a variable-length header (basically a directory
entry) at their start.
Most, but not all, sectors have a small fixed-length signature as
well as some link data at their start.

The net result is that the implementation would be simpler if I could
just read/write the amount of data
that can be done with the least amount of work, even if that is less
than was requested.

If I receive a request to read 512 bytes, and I return that I have
read 486, is the OS, libc, or something else
going to treat that as an error, or will they come back for the
rest in a subsequent call?

I thought I recalled that read()/write() returning a count less than
requested is not an error.
   
   


Re: [AppArmor 38/45] AppArmor: Module and LSM hooks

2007-06-04 Thread Pavel Machek
On Wed 2007-05-23 18:16:45, Andreas Gruenbacher wrote:
 On Tuesday 15 May 2007 11:14, Pavel Machek wrote:
  Why is this configurable? 
 
 The maximum length of a pathname is an arbitrary limit: we don't want to 
 allocate arbitrary amounts of kernel memory for pathnames so we introduce 
 this limit and set it to a reasonable value. In the unlikely case that 
 someone uses insanely long pathnames, this limit can be increased.

vfs does not have configurable pathname limit, and I do not see what
is so special about AA to require this kind of ugliness.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [AppArmor 38/45] AppArmor: Module and LSM hooks

2007-06-04 Thread Andreas Gruenbacher
On Monday 04 June 2007 12:55, Pavel Machek wrote:
 On Wed 2007-05-23 18:16:45, Andreas Gruenbacher wrote:
  On Tuesday 15 May 2007 11:14, Pavel Machek wrote:
   Why is this configurable?
 
  The maximum length of a pathname is an arbitrary limit: we don't want to
  allocate arbitrary amounts of kernel memory for pathnames so we
  introduce this limit and set it to a reasonable value. In the unlikely
  case that someone uses insanely long pathnames, this limit can be
  increased.

 vfs does not have configurable pathname limit, and I do not see what
 is so special about AA to require this kind of ugliness.

You very well know that the vfs has a limit of PATH_MAX characters (4096) for 
pathnames. This means that at most that many characters can be passed at 
once. 

I've really had enough of your perpetual unfounded rants.


Re: [AppArmor 38/45] AppArmor: Module and LSM hooks

2007-06-04 Thread Pavel Machek
On Mon 2007-06-04 13:25:30, Andreas Gruenbacher wrote:
 On Monday 04 June 2007 12:55, Pavel Machek wrote:
  On Wed 2007-05-23 18:16:45, Andreas Gruenbacher wrote:
   On Tuesday 15 May 2007 11:14, Pavel Machek wrote:
Why is this configurable?
  
   The maximum length of a pathname is an arbitrary limit: we don't want to
   allocate arbitrary amounts of kernel memory for pathnames so we
   introduce this limit and set it to a reasonable value. In the unlikely
   case that someone uses insanely long pathnames, this limit can be
   increased.
 
  vfs does not have configurable pathname limit, and I do not see what
  is so special about AA to require this kind of ugliness.
 
 You very well know that the vfs has a limit of PATH_MAX characters (4096) for 
 pathnames. This means that at most that many characters can be passed at 
 once. 

Sorry then. Why not reuse the PATH_MAX when it exists already?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [AppArmor 38/45] AppArmor: Module and LSM hooks

2007-06-04 Thread Andreas Gruenbacher
On Monday 04 June 2007 13:35, Pavel Machek wrote:
 On Mon 2007-06-04 13:25:30, Andreas Gruenbacher wrote:
  On Monday 04 June 2007 12:55, Pavel Machek wrote:
   On Wed 2007-05-23 18:16:45, Andreas Gruenbacher wrote:
On Tuesday 15 May 2007 11:14, Pavel Machek wrote:
 Why is this configurable?
   
The maximum length of a pathname is an arbitrary limit: we don't want
to allocate arbitrary amounts of kernel memory for pathnames so we
introduce this limit and set it to a reasonable value. In the
unlikely case that someone uses insanely long pathnames, this limit
can be increased.
  
   vfs does not have configurable pathname limit, and I do not see what
   is so special about AA to require this kind of ugliness.
 
  You very well know that the vfs has a limit of PATH_MAX characters (4096)
  for pathnames. This means that at most that many characters can be passed
  at once.

What users can do is something like this:

  chdir("some/long/path");
  chdir("some/even/longer/path");
  ...

and the total length of the path can then exceed PATH_MAX characters. We can 
only accept pathnames up to some upper limit, and we need to somehow define 
what that limit is supposed to be. We could use PATH_MAX or some other 
arbitrary number. In most situations PATH_MAX will be fine, but that's not 
always guaranteed to be the case. So what's wrong about making this 
configurable for special situations that we might run into? Module parameters 
are *really* dead cheap.
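
To make the idea of a bounded pathname concrete, here is a small
userspace sketch of assembling a path from components under a
caller-supplied limit.  It is not AppArmor code; demo_build_path() and
its `limit` argument (standing in for the configurable maximum) are made
up for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Join path components into buf, refusing anything that would not fit
 * into `limit` bytes (including the trailing NUL).  Returns the path
 * length on success or -ENAMETOOLONG.
 */
static long demo_build_path(const char *const *comp, size_t n,
			    char *buf, size_t limit)
{
	size_t used = 0;

	if (limit == 0)
		return -ENAMETOOLONG;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(comp[i]);

		/* Need room for '/', the component and the final NUL. */
		if (used + len + 2 > limit)
			return -ENAMETOOLONG;
		buf[used++] = '/';
		memcpy(buf + used, comp[i], len);
		used += len;
	}
	buf[used] = '\0';
	return (long)used;
}
```

Whatever value is chosen for the limit, the caller gets a clean error
instead of an unbounded allocation.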

Andreas


Re: [AppArmor 38/45] AppArmor: Module and LSM hooks

2007-06-04 Thread Pavel Machek
Hi!

   You very well know that the vfs has a limit of PATH_MAX characters (4096)
   for pathnames. This means that at most that many characters can be passed
   at once.
 
 What users can do is something like this:
 
   chdir("some/long/path");
   chdir("some/even/longer/path");
   ...
 
 and the total length of the path can then exceed PATH_MAX characters. We can 
 only accept pathnames up to some upper limit, and we need to somehow define 
 what that limit is supposed to be. We could use PATH_MAX or some other 
 arbitrary number. In most situations PATH_MAX will be fine, but that's not 
 always guaranteed to be the case. So what's wrong about making this 
 configurable for special situations that we might run into? Module parameters 
 are *really* dead cheap.

Parameters are cheap, but this one is ugly.

How will the kernel work with very long paths? I'd suspect some problems
if a path is 1MB long and I attempt to print it in /proc
somewhere. Perhaps vfs should be modified not to allow such crazy
paths? But placing the limit in aa is ugly.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Patch 04/18] include/linux/logfs.h

2007-06-04 Thread David Woodhouse
On Mon, 2007-06-04 at 11:12 +0200, Jörn Engel wrote:
 On Sun, 3 June 2007 23:42:25 +0200, Arnd Bergmann wrote:
  On Sunday 03 June 2007, Jörn Engel wrote:
   +struct logfs_je_spillout {
   +   __be64  so_segment[0];
   +}__packed;
  
  All the on-disk data structures you define in this file have naturally
  aligned members, so the __packed attribute is not needed.
 
 Amen.  It is purely paranoia and I don't even know who is out to get me.

You can _never_ know who is out to get you, or what architecture we'll
be ported to next week.

The advice "don't tell the compiler what you want unless you _know_
it'll do the wrong thing otherwise" runs counter to everything we've
learned, slowly and painfully, over the last few years.

We should never rely on compiler behaviour which is undocumented and
unrequired. Even if you know that the ABI forces it to continue to do
the right thing on the platforms you _currently_ care about, it might
not do it on new platforms (or existing platforms you didn't manage to
test).

It would be better if GCC had a 'nopadding' attribute which gave us what
we need without the _extra_ implications about alignment. In the absence
of that, though, you should at _least_ have a check on the size of the
structure if you're going to drop the packed attribute.
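
A minimal sketch of such a size check, using C11 _Static_assert on a
made-up structure (struct demo_disk_rec and its fields are invented for
the example and use uint64_t/uint32_t rather than the kernel's __be64
types; in kernel code BUILD_BUG_ON() would be the usual way to express
the same check):

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up on-disk record with naturally aligned members; not a LogFS struct. */
struct demo_disk_rec {
	uint64_t ino;
	uint64_t size;
	uint32_t flags;
	uint32_t crc;
};

/* Fail the build if the compiler ever inserts padding. */
_Static_assert(sizeof(struct demo_disk_rec) == 24,
	       "demo_disk_rec must be exactly 24 bytes on disk");
_Static_assert(offsetof(struct demo_disk_rec, flags) == 16,
	       "unexpected padding before flags");
```

With checks like these in place, a platform that lays the structure out
differently breaks the build instead of silently changing the on-disk
format.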

-- 
dwmw2



Re: [Patch 06/18] fs/logfs/compr.c

2007-06-04 Thread David Woodhouse
On Mon, 2007-06-04 at 10:54 +0200, Jörn Engel wrote:
 There is no particular reason.  '3' should be a reasonable value for
 most people.  If actual users want to change this value, I can make it
 a mount option as well.  Right now I'm just lazy and doubt the merits.

I think you probably made the right choice. If you're compressing small
chunks at a time, increasing the amount of history which the compressor
will keep really doesn't buy you much.

-- 
dwmw2



Re: [NFS] [PATCH] locks: provide a file lease method enabling cluster-coherent leases

2007-06-04 Thread J. Bruce Fields
On Sat, Jun 02, 2007 at 02:21:22PM -0400, Trond Myklebust wrote:
 Currently, the lease handling is done all in the VFS, and is done prior
 to calling any filesystem operations. Bruce's break_lease() inode
 operation allows the VFS to notify the filesystem that some operation is
 going to be called that requires the lease to be broken.
 
 My point is that in doing so, you are not atomic with the operation that
 requires the lease to be broken. Some different node may re-establish a
 lease while we're calling back down into the filesystem to perform the
 operation.
 So I agree with you. The break_lease() inode operation isn't going to
 work. The filesystem is going to have to figure out for itself when it
 needs to notify other nodes that the lease needs breaking, and it needs
 to figure out its own methods for ensuring atomicity.

OK, I agree with you both, thanks for the explanations.

It looks to me like there's probably a race in the existing code that
will allow conflicting opens and leases to be granted simultaneously if
the lease request is handled just after may_open() is called.  These
checks at the beginning of __setlease() are an attempt to prevent that
race:

	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
		goto out;
	if ((arg == F_WRLCK)
	    && ((atomic_read(&dentry->d_count) > 1)
		|| (atomic_read(&inode->i_count) > 1)))
		goto out;

But, for example, in the case of a simultaneous write open and RDLCK
lease request, I believe the call to setlease could come after the
may_open() but before the call to get_write_access() that bumps
i_writecount.

--b.


Re: [Patch 04/18] include/linux/logfs.h

2007-06-04 Thread Jörn Engel
On Mon, 4 June 2007 14:38:23 +0100, David Woodhouse wrote:
 On Mon, 2007-06-04 at 11:12 +0200, Jörn Engel wrote:
  On Sun, 3 June 2007 23:42:25 +0200, Arnd Bergmann wrote:
   On Sunday 03 June 2007, Jörn Engel wrote:
+struct logfs_je_spillout {
+   __be64  so_segment[0];
+}__packed;
   
   All the on-disk data structures you define in this file have naturally
   aligned members, so the __packed attribute is not needed.
  
  Amen.  It is purely paranoia and I don't even know who is out to get me.
 
 You can _never_ know who is out to get you, or what architecture we'll
 be ported to next week.
 
 The advice "don't tell the compiler what you want unless you _know_
 it'll do the wrong thing otherwise" runs counter to everything we've
 learned, slowly and painfully, over the last few years.
 
 We should never rely on compiler behaviour which is undocumented and
 unrequired. Even if you know that the ABI forces it to continue to do
 the right thing on the platforms you _currently_ care about, it might
 not do it on new platforms (or existing platforms you didn't manage to
 test).
 
 It would be better if GCC had a 'nopadding' attribute which gave us what
 we need without the _extra_ implications about alignment. In the absence
 of that, though, you should at _least_ have a check on the size of the
 structure if you're going to drop the packed attribute.

Adding a size check is simple enough.  But given all this I'll put it to
the very end of my todo list.  There are many other optimizations
remaining.

Jörn

-- 
Joern's library part 14:
http://www.sandpile.org/


Re: [AppArmor 38/45] AppArmor: Module and LSM hooks

2007-06-04 Thread Andreas Gruenbacher
On Monday 04 June 2007 15:12, Pavel Machek wrote:
 How will the kernel work with very long paths? I'd suspect some problems
 if a path is 1MB long and I attempt to print it in /proc
 somewhere.

Pathnames are only used for informational purposes in the kernel, except in 
AppArmor of course. /proc only uses pathnames in a few places, 
but /proc/mounts will silently fail and produce garbage entries. That's not 
ideal of course; we should fix that somehow.

Note that this has nothing to do with the AppArmor discussion ...

 Perhaps vfs should be modified not to allow such crazy paths? But placing
 the limit in aa is ugly.

Dream on. Redefining fundamental vfs semantics is not an option; we should 
rather make sure that we fail gracefully. Considering the alternatives, I 
still prefer the configurable limit. That's way more useful than allowing a 
process to DOS the kernel with AppArmor.

Andreas


Re: Read/write counts

2007-06-04 Thread Andreas Dilger
On Jun 04, 2007  06:20 -0400, David H. Lynch Jr. wrote:
 The net result is that the implementation would be simpler if I could
 just read/write, the amount of data that can be done with the least
 amount of work, even if that is less than was requested.
 
 If I receive a request to read 512 bytes, and I return that I have read
 486, is either the OS, libc, or something else going to treat that as an
 error, or are they coming back for the rest in a subsequent call ?
 
 I thought I recalled that read()/write() returning a count less than
 requested is not an error.

It is not strictly an error to read/write less than the requested amount,
but you will find that a lot of applications don't handle this correctly.
They will assume that if the amount read/written is != amount requested
that this is an error.  Of course the opposite is also true - some
applications assume that the amount requested == amount read/written and
don't even check whether that is actually the case or not.
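
For reference, an application that does handle short reads typically
wraps read() in a loop along these lines.  This is a generic sketch;
the helper name read_full is made up for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read up to `count` bytes, retrying after short reads and EINTR.
 * Returns the number of bytes read (less than `count` only at end of
 * file) or -1 with errno set.
 */
static ssize_t read_full(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t done = 0;

	while (done < count) {
		ssize_t n = read(fd, p + done, count - done);

		if (n == 0)
			break;		/* end of file */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A caller using this would see a request for 512 bytes that the
filesystem answers with 486 simply followed by a second read() for the
remaining 26.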

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Read/write counts

2007-06-04 Thread Bryan Henderson
It is not strictly an error to read/write less than the requested amount,
but you will find that a lot of applications don't handle this correctly.

I'd give it  a slightly different nuance.  It's not an error, and it's a 
reasonable thing to do, but there is value in not doing it.  POSIX and its 
predecessors back to the beginning of Unix say read()/write() don't have 
to transfer the full count (they must transfer at least one byte).  The 
main reason for this choice is that it may require more resources (e.g.  a 
memory buffer) than the system can allocate to do the whole request at 
once.

Programs that assume a full transfer are fairly common, but are 
universally regarded as either broken or just lazy, and when it does cause 
a problem, it is far more common to fix the application than the kernel.

Most application programs access files via libc's fread/fwrite, which 
don't have partial transfers.  GNU libc does handle partial (kernel) reads 
and writes correctly.  I'd be surprised if someone can name a major 
application that doesn't.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems



Re: Read/write counts

2007-06-04 Thread Matthew Wilcox
On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
 Programs that assume a full transfer are fairly common, but are 
 universally regarded as either broken or just lazy, and when it does cause 
 a problem, it is far more common to fix the application than the kernel.

Linus has explicitly forbidden short reads from being returned.  The
original poster may get away with it for a specialised case, but for
example, signals may not cause a return to userspace with a short read
for exactly this reason.


Re: Read/write counts

2007-06-04 Thread Theodore Tso
On Mon, Jun 04, 2007 at 11:02:23AM -0600, Matthew Wilcox wrote:
 On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
  Programs that assume a full transfer are fairly common, but are 
  universally regarded as either broken or just lazy, and when it does cause 
  a problem, it is far more common to fix the application than the kernel.
 
 Linus has explicitly forbidden short reads from being returned.  The
 original poster may get away with it for a specialised case, but for
 example, signals may not cause a return to userspace with a short read
 for exactly this reason.

Hmm, I'm not sure I would go that far.  Per the POSIX specification,
we support the optional BSD-style restartable system calls for signals
which will avoid short reads; but this is only true if SA_RESTART is
passed to sigaction().  Without SA_RESTART, we will indeed return
short reads, as required by POSIX.
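
As a reminder of what opting in looks like from userspace, here is a
small sketch that installs a handler with or without SA_RESTART.  The
names demo_handler and demo_install are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static void demo_handler(int sig)
{
	(void)sig;	/* nothing to do; the handler only needs to exist */
}

/*
 * Install demo_handler for `sig`.  With restart != 0 the handler is
 * registered with SA_RESTART, asking the kernel to restart interrupted
 * system calls where it can instead of failing them with EINTR.
 * Returns 0 on success, -1 on error (as sigaction() does).
 */
static int demo_install(int sig, int restart)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = restart ? SA_RESTART : 0;
	return sigaction(sig, &sa, NULL);
}
```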

I don't think Linus has said that short reads are always evil; I
certainly can't remember him ever making that statement.  Do you have
a pointer to a LKML message where he's said that?

- Ted


Re: Read/write counts

2007-06-04 Thread Roman Zippel
Hi,

On Mon, 4 Jun 2007, Theodore Tso wrote:

 Hmm, I'm not sure I would go that far.  Per the POSIX specification,
 we support the optional BSD-style restartable system calls for signals
 which will avoid short reads; but this is only true if SA_RESTART is
 passed to sigaction().  Without SA_RESTART, we will indeed return
 short reads, as required by POSIX.
 
 I don't think Linus has said that short reads are always evil; I
 certainly can't remember him ever making that statement.  Do you have
 a pointer to a LKML message where he's said that?

That's the last discussion about signals and I/O I can remember:
http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

bye, Roman


Re: Read/write counts

2007-06-04 Thread Joel Becker
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
 On Mon, 4 Jun 2007, Theodore Tso wrote:
 
  Hmm, I'm not sure I would go that far.  Per the POSIX specification,
  we support the optional BSD-style restartable system calls for signals
  which will avoid short reads; but this is only true if SA_RESTART is
  passed to sigaction().  Without SA_RESTART, we will indeed return
  short reads, as required by POSIX.
  
  I don't think Linus has said that short reads are always evil; I
  certainly can't remember him ever making that statement.  Do you have
  a pointer to a LKML message where he's said that?
 
 That's the last discussion about signals and I/O I can remember:
 http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

He said 'disk read', not 'read(2)'.  I'd expect he means certain
things like stat(2) and readdir(2) when they have to go to disk.
read(2) explicitly lists EINTR as a valid result, and often folks use
signals to interrupt read(2).  The world certainly writes programs
to expect short read(2).
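For reference, the usual userspace idiom looks roughly like the sketch below. read_full is a hypothetical helper name, not an existing libc function. It retries on EINTR and keeps reading after a short count until the requested length, EOF, or a real error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly `count` bytes unless EOF or a real
 * error intervenes. Returns the number of bytes read (less than `count`
 * only at EOF), or -1 with errno set.
 */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;          /* real error */
        }
        if (n == 0)
            break;              /* EOF */
        done += (size_t)n;      /* short read: loop for the rest */
    }
    return (ssize_t)done;
}
```

A caller that wants to notice the signal rather than retry would return on EINTR instead of continuing; which behavior is right depends on the application.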

Joel

-- 

Gone to plant a weeping willow
 On the bank's green edge it will roll, roll, roll.
 Sing a lullaby beside the waters.
 Lovers come and go, the river roll, roll, rolls.

Joel Becker
Principal Software Developer
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127


Re: Read/write counts

2007-06-04 Thread Theodore Tso
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
 That's the last discussion about signals and I/O I can remember:
 http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

Well, I think Linus was saying that we have to do both (where the
signal interrupts and where it doesn't), and I agree with that:

  There are enough reasons to discourage people from using uninterruptible
  sleep (this f*cking application won't die when the network goes down)
  that I don't think this is an issue. We need to handle both cases, and
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  while we can expand on the two cases we have now, we can't remove them.
                                                    ^^^^^^^^^^^^^^^^^^^^

Fortunately, although the -ERESTARTSYS framework is a little awkward
(and people can shoot arrows at me for creating it 15 years ago :-), we
do have a way of supporting both styles without _too_ much pain.

- Ted



Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-04 Thread Andreas Gruenbacher
On Tuesday 15 May 2007 11:20, Pavel Machek wrote:
 Hi!
 
  Pathname matching, transition table loading, profile loading and
  manipulation.
 
 So we get a small interpreter of state machines, and the reason we need
 it is 'apparmor is misdesigned and works with paths when it should have
 worked with handles'.

I assume you mean labels instead of handles.

AppArmor is designed around paths, not labels, and independent of whether or 
not you like AppArmor, this design leads to a useful security model distinct 
from the SELinux security model (which is useful in its own ways). The 
differences between those models cannot be argued away, neither is a subset 
of the other, and neither is a misdesign. I would be thankful if you could 
stop spreading this lie.

 If you solve the 'new file problem', aa becomes subset of selinux.
 And I'm pretty sure patch will be nicer than this. 

You are quite mistaken. SELinux turns pathnames into labels when it initially 
labels all files (when a policy is rolled out), whereas AppArmor computes 
the label of each file when a file is opened. The two models start to 
diverge as soon as files are renamed: in SELinux, labels stick with the 
files. In AppArmor, labels stick with the names.
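To make the divergence concrete, here is a toy model. It is purely illustrative and not how either module is implemented; struct toy_file, the label names, and the prefix-based policy are all made up. The label-based model assigns a label once and keeps it with the file; the path-based model derives the label from the current name at open time.

```c
#include <stdio.h>
#include <string.h>

/* Toy "file": an inode number, its current name, and a stored label. */
struct toy_file {
    int ino;
    char path[64];
    char stored_label[32];  /* label-based model: assigned once, kept with the file */
};

/* Hypothetical policy: the label is derived from a pathname prefix. */
static const char *label_for_path(const char *path)
{
    if (strncmp(path, "/etc/", 5) == 0)
        return "config";
    if (strncmp(path, "/tmp/", 5) == 0)
        return "scratch";
    return "default";
}

/* Label-based model: apply the policy once, when it is rolled out. */
void toy_relabel(struct toy_file *f)
{
    snprintf(f->stored_label, sizeof(f->stored_label), "%s",
             label_for_path(f->path));
}

/* A rename changes the name only; the stored label is untouched. */
void toy_rename(struct toy_file *f, const char *newpath)
{
    snprintf(f->path, sizeof(f->path), "%s", newpath);
}

/* Label-based model at open time: use whatever label the file carries. */
const char *label_model_at_open(const struct toy_file *f)
{
    return f->stored_label;
}

/* Path-based model at open time: compute the label from the current name. */
const char *path_model_at_open(const struct toy_file *f)
{
    return label_for_path(f->path);
}
```

In this model a file created as /tmp/passwd.new and labeled at rollout is "scratch" in both models; after toy_rename() to /etc/passwd, the label-based model still answers "scratch" while the path-based model answers "config".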

So what you advocate for is a hybrid between the SELinux and the AppArmor 
model, not a superset.

It could be that the SELinux folks will solve the issues they are having with 
new files using something better than restorecond in the future, perhaps even 
an in-kernel mechanism (although I somewhat doubt it). But then again, their 
basic model makes sense even without any live file relabeling, and so that's 
probably not very high up on the priority list.

Andreas