Re: [RFC] Add vfsmount to vfs helper functions.

2008-02-17 Thread Tetsuo Handa
Hello.

 No printable comments, except for that:
 
 (e) why don't you guys move the Linus' Serious Mistake to _callers_ of
 vfs_mknod() and its ilk?
 
 Which obviously solves all problems with having vfsmount.

Excuse me, but I don't understand what "the Linus' Serious Mistake" is, or
what moving it to the _callers_ of vfs_mknod() would mean.
Could you give me some URLs or hints?

Regards.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-11 Thread Tetsuo Handa
Hello.



Indan Zupancic wrote:
  It seems to me that the alternatives you are proposing include
  modification of userland applications. But my assumption is
  that "Don't require modification of userland applications".
 
 If you want a secure system it isn't that unreasonable to expect
 applications to not do brain dead things, so not requiring any
 modifications or config changes seems a bit optimistic to me.

It depends.
Some users have to continue using brain dead legacy applications
without modification because ...

   the application's source code is not available.

   the distributor no longer supports the application.

   the application is too difficult/complicated to reconstruct.

For cases where you can expect that the application won't do brain dead
things and/or where we can reconstruct the application, your approach is OK.



  In other words, I want to implement without asking applications
  to use /dev/dynamic/ or something.
  This filesystem is intended to provide support for legacy applications.
  (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and
  later.)
 
 Legacy applications should cope with a static /dev/.
 What is the advantage of your filesystem compared to a static /dev/?

I assume "a static /dev/" means a /dev/ directory as in 2.4 kernels.
This filesystem's advantages:

  (1) Can guarantee filename/attribute pairs.

  A process with root privilege can do
    mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
  if /dev is on the / partition or is a devfs partition, whereas
  a process with root privilege cannot do
    mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
  if /dev is this filesystem, unless permitted by the configuration file.

  So, you can guarantee that /dev/hda1 is block-3-1 and /dev/hda2 is block-3-2 .
  (e.g. "mount /dev/hda1 /home" won't mount the block-3-2 partition on /home .)

  (2) Can keep nodes that need not be deleted/modified read-only.

  A process with root privilege can delete /dev/null on the / partition or
  on a devfs partition, whereas a process with root privilege cannot delete
  /dev/null on this filesystem unless permitted by the configuration file.

  So, you can guarantee that a node which need not be deleted/modified
  won't be deleted/modified.
  (e.g. /dev/null is always there with the char-1-3 attribute.)

  (3) Can hide unwanted device nodes.

  A process with root privilege can create new nodes on the / partition or on
  devfs, whereas a process with root privilege cannot create new nodes on this
  filesystem that are not specified by the configuration file.

  So, you can expose specific nodes selectively.
  (e.g. Allow accessing /dev/hda1 , but forbid accessing /dev/hda2 .)
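
As a rough illustration of advantage (1), the check can be thought of as a
whitelist lookup keyed on the destination name. The sketch below is plain
userspace C and is not code from the patch: struct node_rule, rules[] and
rename_allowed() are hypothetical names, and it assumes that a rename is
treated as giving the destination name the attributes of the node being moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical whitelist entry: which device a given name may refer to. */
struct node_rule {
	const char *name;	/* name relative to /dev */
	char type;		/* 'b' = block device, 'c' = character device */
	unsigned int major;
	unsigned int minor;
};

/* Illustrative table matching the hda1/hda2 and null examples. */
static const struct node_rule rules[] = {
	{ "hda1", 'b', 3, 1 },
	{ "hda2", 'b', 3, 2 },
	{ "null", 'c', 1, 3 },
};

/*
 * A rename is acceptable only if the node being moved carries exactly the
 * attributes that the whitelist assigns to the destination name.
 */
static bool rename_allowed(const char *new_name, char type,
			   unsigned int major, unsigned int minor)
{
	size_t i;

	assert(new_name != NULL);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct node_rule *r = &rules[i];

		if (strcmp(r->name, new_name) == 0 && r->type == type &&
		    r->major == major && r->minor == minor)
			return true;
	}
	return false;
}
```

Under these assumptions the first step of the swap already fails: moving the
block-3-1 node to "hda1.tmp" is refused because that name is not listed, and
moving the block-3-2 node onto "hda1" is refused because the attributes differ.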



  Use of a tiny daemon that communicates with udev is not sufficient.
  The udev is not the only application that modifies /dev files.
 
 Oh, it isn't? Which other applications do modify /dev files? I'd like to
 hear about a few, no matter how obscure or proprietary. And please
 tell how many of those will stop working with a static /dev with all
 nodes they might create already existing.

I don't know. I'm not using rare software.



  At least, the tiny daemon should communicate with the kernel
  so that all requests are checked by the tiny daemon.
 
 No, why should the kernel be involved? The tiny daemon would be
 the only one allowed to modify /dev/, so all mknod commands will
 be done by it. Of course it means that you might need to modify
 the two or three apps wanting to create device nodes, or you can
 make an LD_PRELOAD lib that intercepts mknod commands and
 sends them to the daemon.

No. The kernel must be involved.

Suppose the tiny daemon is the only one allowed to modify /dev/ .
"foo" requests mknod /dev/null from a chroot() environment.
"bar" requests mknod /dev/null from a clone(CLONE_FS) + mount() environment.

How can the daemon know where to create the node?
How can the daemon determine whether the requested pathname is
in the /dev directory or not?
The process that requests mknod and the process that performs mknod
are not always using the same / directory.
The daemon must not forbid creation of /dev/null if the realpath() is
/tmp/dev/null (i.e. mknod /dev/null after chroot /tmp),
because the daemon is not asked to manage the /tmp/dev directory.

Who can guarantee that the daemon can access all namespaces?
The process that requests mknod and the process that performs mknod
are not always using the same namespace.

If "foo" or "bar" is a statically linked or suid-root application
(where LD_PRELOAD is ignored), it would attempt to create device nodes
directly (i.e. call sys_mknod() instead of communicating with the daemon)
and abort due to the failure.
This applies not only to applications that want to create device nodes
in /dev/ , but also to all applications that want to modify entries in /dev/ .


From the beginning, the kernel is deeply involved because in-kernel MAC
is essential 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-11 Thread Tetsuo Handa
Hello.



Indan Zupancic wrote:
 That only the tiny daemon can modify /dev/ is done with MAC rules,
 the ones that should be the default for all applications except udev by
 default already. For the kernel nothing changes.

OK. You assume the use of a MAC with sufficiently fine-grained access control.



 Wrong. All nodes are created and thus there's never a need to create
 new nodes. So /dev/ can't be modified by anyone. This works because
 all nodes that anyone might want to create already exist.

"Already exist" is not enough.
These nodes have to be deletable if an appropriate process requests it.
These nodes also have to be protected by the MAC against processes that
directly call mknod()/rename()/unlink()/link()/mount() etc.



 This is true on a theoretical level. But practically I think you can either
 run multiple daemons, one for each namespace where you want to control /dev/,

What if the daemon does not exist in that namespace?

 or if you really want one daemon you can pass the
 directory fd to it where the node should be created and use mknodat().
 I believe that crosses namespaces correctly.

The fd passed to mknodat() only makes the lookup start from the
specified directory instead of the current directory.
The object obtained by resolving the rest of the pathname still depends on
the / of the calling process.

If /var/jail/dev/dyndev/link is a symlink to /dev ,
a process in chroot("/var/jail/") + chdir("/") will get /var/jail/dev/node
and a process not in chroot("/var/jail/") + chdir("/") will get /dev/node
by resolving mknodat(fd_for_"/var/jail/", "dev/dyndev/link/node") .
If the process is in the chroot() but the daemon is not in the chroot() ,
the daemon will create nodes in the wrong location.

So, you let the LD_PRELOAD library resolve all directory components
before passing the fd to the daemon over a UNIX domain socket
so that the daemon won't create nodes in the wrong location.

OK. That looks workable, although I'm not taking race conditions into account.



 But I think that the chance that any process needs to create device nodes
 in a chroot is at the level of fairy existence.

It is not only mknod(): rename()/unlink()/link()/mount(bind) etc. may also
cause filename/attribute mismatches.

How can the daemon know whether the request is trying to manipulate nodes
in the /dev directory or not?
If "mount --bind /dev/ /var/dir/" has been done, the daemon must check
the filename/attribute pair when mknod("/var/dir/null") is requested
because permitting the request will modify the state of /dev .
If "mount --bind /dev/ /var/dir/" has not been done, the daemon must not check
the filename/attribute pair when mknod("/var/dir/null") is requested
because permitting the request will not modify the state of /dev .



What does the daemon do? Does it receive requests from the LD_PRELOAD library
over a UNIX domain socket, check the filename/attribute pair, and issue
mknodat()/renameat()/unlinkat()/linkat() etc. when the combination is
appropriate?

What does the LD_PRELOAD library do? Does it intercept all pathname related
syscalls (except open()), resolve the directory components, determine whether
the request is trying to manipulate nodes in the /dev directory, and forward
the request to the daemon over a UNIX domain socket?

Making the daemon and the LD_PRELOAD library bug-and-race free and
developing the MAC policy for the daemon and the LD_PRELOAD library,
or making this filesystem bug-and-race free: which one is easier?



Regards.


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-09 Thread Tetsuo Handa
Hello.

Indan Zupancic wrote:
 Good point, but I assume they all have at least a directory granularity, and
 then /dev/ can be static and udev and others can have free rein in e.g.
 /dev/dynamic/.
 Just use subdirs for the dynamic stuff and this granularity problem is, with
 slight inconvenience, solved.

It seems to me that the alternatives you are proposing include modification of
userland applications. But my assumption is that
"Don't require modification of userland applications".
In other words, I want to implement without asking applications
to use /dev/dynamic/ or something.
This filesystem is intended to provide support for legacy applications.
(In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and
later.)



 Exploits are in code, and where that code is doesn't matter that much, either
 kernel or userspace, though if it's exploitable you'll rather not have it in
 the kernel. So I think it's more secure if the checking would be done by udev
 than in a special filesystem, even if that means that you're screwed if udev
 is exploited. Of course you fully trust your own code, naturally.

I'm keeping the mechanism as simple as possible
so that there is little room for exploitable bugs (e.g. buffer overflows).



 A tiny daemon that communicates with udev and does the checking you have
 now, and if ok it creates the node is really not much more code than your fs,
 so as hard to exploit too. Then if udev is hacked you have the same guarantee
 as you have now.

Use of a tiny daemon that communicates with udev is not sufficient.
The udev is not the only application that modifies /dev files.
At least, the tiny daemon should communicate with the kernel
so that all requests are checked by the tiny daemon.
But use of the tiny daemon (which is a process running in userland)
causes a lot of trouble.
See the block after the -- boundary -- of this posting.

My assumption is "Don't require userland process's assistance",
as written at "Why not use FUSE?".



 Protecting certain files from being modified seems to me more generic than
 enforcing filename/attributes pairs on device nodes.

OK. You are saying that from the point of view of what it can do.
I thought you were saying that enforcing filename/attributes pairs
from outside this filesystem (e.g. by MAC) is more flexible than doing it
in this filesystem.



 rm -f /dev/either-null-or-zero
 
 as said before, if this is possible then the MAC config used is wrong. Exactly
 the same as for your filesystem with
 
 mknod /dev/tmp1 c 1 X
 mount --bind /dev/tmp1 /dev/either-null-or-zero
 
 and you count on the MAC to prevent that.

An administrator asks MAC to prevent processes
(except specific processes who need to do rm -f /dev/either-null-or-zero)
from doing rm -f /dev/either-null-or-zero.

An administrator asks this filesystem to prevent processes from doing
mknod /dev/tmp1 c 1 X.

An administrator asks MAC to prevent processes from doing
mount --bind /dev/tmp1 /dev/either-null-or-zero.



 And as for that app, if you trust it to create device nodes, why don't you
 trust it to make the right nodes too?

If that app has a bug that triggers
  mknod /dev/either-null-or-zero 1$REPLY
instead of
  mknod /dev/either-null-or-zero $REPLY
under an unexpected circumstance, it will create unwanted nodes.
Thus I don't trust the app.



 If an administrator wants something else than
 3 or 5, you're breaking something.

That's the fate of white-list based access control.

Does this filesystem sound too strict to support dynamic devices?
Maybe this filesystem should be able to permit creation of device nodes
that are not listed in the policy file.



  Can SELinux guarantee the same result as my filesystem even if udev or
  administrative programs have to be able to modify /dev ?
 
 More, because your filesystem doesn't guarantee anything at all on its own.
 But assuming the MAC is decent enough to protect your fs from being bypassed,
 I'm sure it can do what's needed fine without your fs. I can't answer for
 SELinux because I don't know it well. But I trust it can protect files and/or
 directories, and that's all that's needed to achieve the same end result.

I don't know SELinux well either, but judging from an example
(found by Googling "selinux allow mknod")

  allow udev_t self:capability { chown dac_override dac_read_search fowner fsetid sys_admin sys_nice mknod net_raw net_admin sys_rawio };

I can't find a place to specify filename/attributes pairs in this syntax.
So, if a process that is permitted to create device nodes misbehaves,
it will generate unexpected filename/attribute pairs.
I think SELinux can't guarantee the same result as my filesystem.



 You seem to assume that the in-kernel implementation is suddenly
 guaranteed bugfree.

I keep the implementation as simple as possible.



From your next posting:
 But I think doing more is getting ridiculous, because if a process can
 create a device node, it can also access it and do whatever harm could
 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.


Indan Zupancic wrote:
  I want to use this filesystem in case where a process with root privilege
  was hijacked but the behavior of the hijacked process is still restricted by
  MAC.
 
 1) If the behaviour can be controlled, why can't the process be
 disallowed to change anything badly in /dev? Like disallowing anything
 from modifying existing nodes that weren't created by that process.
 That would have practically the same effect as your filesystem,
 won't it?

A MAC system can prevent hijacked processes from changing anything badly in /dev .
But a MAC system can't prevent hijacked processes from doing
mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
if permissions to rename device nodes in /dev are given to the hijacked processes.
This is because MAC implementations don't check filename/attribute pairs.

But this filesystem can prevent hijacked processes from doing
mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2
even if permissions to rename device nodes in /dev are given to the hijacked
processes.

This filesystem is not designed to
forbid modifying nodes when a process does not need to modify them.
This filesystem is designed to
forbid breaking the filename/attribute pairs of nodes
even if a process needs (or is permitted) to modify nodes.

 Or phrased differently, if the MAC system used can't protect /dev, it
 won't be able to protect other directories either, and if it can't
 protect e.g. my homedir, doesn't it make the whole MAC system
 ineffective? And if the MAC system used is ineffective, your
 filesystem is useless and you've bigger problems to fix.

You can use the nodev mount option to prevent attackers from opening device
files. You can use the MAC system to prevent attackers from mounting partitions
(other than the /dev partition) without the nodev option.


 2) The MAC system may not be able to guarantee certain combinations
 of device names and properties, but isn't that policy that shouldn't
 be in the kernel anyway? But if it is, shouldn't all device nodes be
 checked? That is, shouldn't it be a global check instead of a filesystem
 specific one?

I think the reasons why MAC systems don't handle filename/attributes pairs are:

  Pairs of a filename and its attributes are conventionally considered
  constant and reliable.

  Describing this attribute enforcement information in the MAC's policy
  would complicate the MAC's policy syntax.

Thus, this should be a global check. But usually device nodes are only in /dev .


 3) Code efficiency. Thousand lines of code just to close one very specific
 attack, which can be done in lots of different other ways that all need
 to be prevented by the MAC system. (mounting over it, intercepting open
 calls, duping the fd, etc.) Is it worth it?

This filesystem does what the MAC system is not doing.
So, please don't complain about the inability of this filesystem to close all
attacks.
You can use the MAC system to prevent attackers from mounting another
filesystem over this filesystem.

The filename/attribute pairs are something like system call entry tables.
Applications will go wrong if __NR_read is mapped to sys_write() and
__NR_write is mapped to sys_read().
Userland applications access special functionality (e.g. /dev/zero and
/dev/random) by name (the counterpart of syscall numbers). Therefore, keeping
the filename/attribute pairs tamper-proof is important.

You recognize that there is a threat that device nodes may have irregular
attributes (e.g. /dev/null existing as a regular file), don't you?
You don't object to implementing some mechanism to avoid such a threat, do you?
OK. Then the question is the comparison of code efficiency.

This patch is less than 1100 lines in total.
A large part of this patch is for parsing and managing the policy file.
If you try to extend every MAC implementation (SELinux, SMACK, AppArmor, TOMOYO)
so that they can handle filename/attributes pairs (i.e. expand the policy
file's syntax and both in-kernel and userland data structures, manage
variable-length strings containing non-printable characters etc.), I think
that modification would exceed this patch.
I think guaranteeing filename/attribute pairs in the filesystem layer can keep
the MAC system implementation simple and compact.
http://www.mail-archive.com/linux-fsdevel@vger.kernel.org/msg10653.html


Thank you.


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.

[EMAIL PROTECTED] wrote:
 Ouch.  The .c files should generally be built into their own .o files and
 then the Makefile should do something like
 
 obj-$(CONFIG_SYAORAN) += syaoran.o
 
 unless there's *really* good reasons for including .c files (such as an
 otherwise-messy variable-namespace issue or similar).

Yes. The final implementation will do so.
This is a temporary hack to keep all functions and variables static.

 Also, has this been double-checked to Do The Right Thing if you have
 *two* instances of ramfs mounted, one with Syaoran and one without?

Yes. The memory for the superblock is allocated for each instance.
Thus, mounting one as syaoran and the other as tmpfs won't cause problems.

 (incidentally, all of these should probably be abstracted into a helper
 function that's 'static inline' so we have just one #ifdef in the definition
 in a .h file, and none in open .c code).

Oh, good idea.
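
The suggested pattern can be sketched in userspace C roughly as follows.
CONFIG_DEMO_MAC, struct demo_sb, demo_with_mac() and demo_setattr() are
made-up names for illustration only and are not taken from the patch; the
point is just that the #ifdef lives once, next to the helper's definition,
and the callers stay unconditional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a superblock; not a kernel structure. */
struct demo_sb {
	bool mac_enabled;
};

#ifdef CONFIG_DEMO_MAC
/* Feature compiled in: ask the superblock. */
static inline bool demo_with_mac(const struct demo_sb *sb)
{
	assert(sb != NULL);
	return sb->mac_enabled;
}
#else
/* Feature compiled out: constant false, so the branch can be dropped. */
static inline bool demo_with_mac(const struct demo_sb *sb)
{
	(void)sb;
	return false;
}
#endif

/* Caller code needs no #ifdef of its own. */
static int demo_setattr(const struct demo_sb *sb)
{
	if (demo_with_mac(sb))
		return -1;	/* pretend the policy check refused the change */
	return 0;
}
```

With the stub variant the compiler sees a constant false, so the policy
branch in demo_setattr() costs nothing when the feature is configured out.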

 Similarly for other places you have #ifdef CONFIG_ in ramfs .c code - see if
 you can abstract it out.

This patch replaces the previous patch, and
it modifies only tmpfs (mm/shmem*) files.
I'm no longer modifying ramfs (fs/ramfs/*) files.

  +/*
  + * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
  + * Now I'm setting the field to share tmpfs/rootfs/syaoran code.
 
 Question for the audience: *should* ramfs set that field so setattr works
 on ramfs (even if it's just a stub similar to the SELinux fscontext= mount
 stuff)?
 
 Question for Tetsuo:  What happens to this code if somebody actually does the
 above change?

Please forget this question.
I'm no longer setting the ramfs_dir_inode_operations.setattr field.

  + Applications using well-known device locations under /dev
  +  get the device they want (e.g. an application that accesses
  +  /dev/null can always get a character special device
  +  with major=1 and minor=3).
 
 This should say "will always get", not "can always", as this code will
 mandate, rather than just make possible.

OK.

  + The list of possible combinations of filename and its attributes
  + that can exist on this filesystem is defined at mount time
  + using a configuration file.
 
 The format of this file needs to be documented.

Yes. It is a line-by-line processable format defined as:

  filename permission owner group flags type [ symlink_data | major minor ]

where flags is a bitwise OR of

  *  1: Allow creation of the file.
  *  2: Allow deletion of the file.
  *  4: Allow changing permissions of the file.
  *  8: Allow changing owner or group of the file.
  * 16: For internal use. Remembers whether this file is opened or not.
  * 32: Don't create this file at mount time.
and here are some example entries:

  pts     755 0   0   0   d
  shm     755 0   0   0   d
  fd      777 0   0   0   l   /proc/self/fd
  stdin   777 0   0   0   l   /proc/self/fd/0
  stdout  777 0   0   0   l   /proc/self/fd/1
  stderr  777 0   0   0   l   /proc/self/fd/2
  null    666 0   0   0   c   1   3
  zero    666 0   0   0   c   1   5
  random  644 0   0   0   c   1   8
  urandom 644 0   0   0   c   1   9
  tty     666 0   0   0   c   5   0
  tty0    600 0   0   12  c   4   0
  cdrom   777 0   0   3   l   /dev/scd0
  console 600 0   0   1   c   5   1
  hda     660 0   6   0   b   3   0
  hda1    660 0   6   0   b   3   1
  initctl 600 0   0   3   p
  log     666 0   0   15  s
  rtc     644 0   0   0   c   10  135
  ptmx    666 0   0   0   c   5   2
  ram     777 0   0   3   l   /dev/ram0
  ram0    660 0   6   0   b   1   0
  ram1    660 0   6   0   b   1   1
  sda     660 0   6   0   b   8   0
  initrd  660 0   6   1   b   1   250
Full documentation of this filesystem is at
http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html
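
To make the line format above concrete, here is a minimal userspace sketch
of a parser for one such line. It is not the parser from the patch: struct
rule, parse_rule() and the RULE_* names are hypothetical, the field sizes are
arbitrary, the permission field is assumed to be octal, and names containing
whitespace or escaped characters are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Flag bits as listed in the format description (names are illustrative). */
enum {
	RULE_MAY_CREATE   = 1,
	RULE_MAY_DELETE   = 2,
	RULE_MAY_CHMOD    = 4,
	RULE_MAY_CHOWN    = 8,
	RULE_OPENED       = 16,	/* internal use */
	RULE_NO_PRECREATE = 32,	/* don't create at mount time */
};

struct rule {
	char name[64];
	unsigned int perm;	/* parsed as octal */
	unsigned int owner;
	unsigned int group;
	unsigned int flags;
	char type;		/* d, l, c, b, p or s, as in the examples */
	char symlink[128];	/* only for type 'l' */
	unsigned int major;	/* only for types 'c' and 'b' */
	unsigned int minor;
};

/* Parse one policy line; returns true on success. */
static bool parse_rule(const char *line, struct rule *r)
{
	int used = 0;

	assert(line != NULL && r != NULL);
	memset(r, 0, sizeof(*r));
	/* Common prefix: filename permission owner group flags type */
	if (sscanf(line, "%63s %o %u %u %u %c%n", r->name, &r->perm,
		   &r->owner, &r->group, &r->flags, &r->type, &used) != 6)
		return false;
	line += used;
	switch (r->type) {
	case 'l':	/* symlink: one more field, the link target */
		return sscanf(line, "%127s", r->symlink) == 1;
	case 'c':
	case 'b':	/* device node: major and minor numbers */
		return sscanf(line, "%u %u", &r->major, &r->minor) == 2;
	case 'd':
	case 'p':
	case 's':	/* directory, FIFO, socket: no extra fields */
		return true;
	default:
		return false;
	}
}
```

For example, parsing "cdrom 777 0 0 3 l /dev/scd0" with this sketch yields
flags 3, i.e. the entry may be both created and deleted.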

 I'm not terribly thrilled by
 the idea of passing a file to be read by the kernel, but I also understand
 that if it isn't done before mount, you have a race condition between the
 mount and the load.

What race condition is possible?
Are you worried that the file gets modified while it is being read?

 Perhaps write some configfs code so that you can
 'mount /configfs; cat config.file > /configfs/syaoran; mount -t syaoran'?

If you worry that the file gets modified while it is being read in kernel
space, you will also worry that the file gets modified while doing
"cat config.file > /configfs/syaoran".

To use configfs (or whatever approach that is done before mount syscall),
some tag for 

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.

Indan Zupancic wrote:
 I think you focus too much on your way of enforcing filename/attributes
 pairs.

So?

 The same can be achieved by creating the device nodes with
 expected attributes, and preventing processes from changing those files.

The device nodes have to be deletable if some process (including udev) needs
to delete them.
Thus, you cannot unconditionally prevent processes from changing those files.

 This because expected combinations are known beforehand.

Yes.

 And once those files are present, the MAC system used doesn't have to have
 special device nodes attributes support. Protecting those files is enough to
 guarantee filename/attributes pairs.

If the MAC system need not support this filesystem's functionality,
who creates those files with a guarantee of the expected attributes? Does udev?
If udev is exploited, who can guarantee them?

 No, this is because rename permission was given for files that it shouldn't
 had.

Do you think all MAC implementations have the same granularity and
functionality?
I don't think so. Not all MAC implementations can control access with such
granularity.
This filesystem is designed to be combined with any MAC,
although the MAC used with this filesystem should be able to restrict
namespace manipulation requests so that this filesystem stays mounted on /dev
and visible to userland applications.

 Either you want a process to manage device names and attributes, and then you
 give it permission to do that, or you want to enforce certain
 filename/attribute pairs and then you just do it yourself.

If I modify udev to enforce certain filename/attribute pairs and the modified
udev is exploited, who can guarantee them?
"Don't trust userland applications" is the basis of restricting access in
kernel space.
If you can trust userland applications, you don't need in-kernel access control.


 Will your filesystem prevent the trivial case of
 
 rm /dev/hda1
 ln -s /dev/hda2 /dev/hda1

Of course. To permit the above operation, the following permissions are needed.

  hda1    660 0   6   2   b   3   1
  hda1    777 0   0   33  l   .

 Rename permission can be given for /dev in general, but prohibited for
 certain files in /dev, the ones you want to have specific attributes.
 It isn't all or nothing.

Do you think all MAC implementations can prohibit renaming for certain files
in /dev ?

 It's forbid modifying certain nodes that process needn't to modify
 versus forbid breaking filename/attribute pairs of certain nodes.
 
 Both have the same effect, except that the first one is generic and
 can be done by existing MAC systems, while the second one needs
 a special filesystem and a handful of MAC rules to make it effective.

Do you think all MAC implementations can do that?
I think the first one is implementation specific and the second one is generic.

 It doesn't matter where they are, it's that a different fs than yours could be
 mounted over it. You say a MAC can prevent that from happening, but a
 MAC can also prevent all processes except for udev from modifying /dev.

But the MAC cannot prevent udev from modifying /dev . And what if udev is
exploited?
Not all MACs can enforce access control over all processes with the granularity
you are talking about. And what if there is a process that cannot be controlled
with your boolean level granularity (e.g. an administrator running his/her
administrative applications that require modification of /dev )?

A crazy example of an administrative application:
(Please don't say "Don't use such a crazy application.")

  #! /bin/sh
  rm -f /dev/either-null-or-zero
  read
  mknod /dev/either-null-or-zero c 1 $REPLY && echo "Administrative task finished successfully." | mail root

This filesystem can guarantee that /dev/either-null-or-zero is either char-1-3
or char-1-5 by using a policy

  either-null-or-zero 666 0   0   3   c   1   3
  either-null-or-zero 666 0   0   35  c   1   5
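
The effect of listing one name with two attribute sets can be sketched as
below. This is a userspace illustration only: struct policy_entry, policy[],
FLAG_MAY_CREATE and mknod_allowed() are hypothetical names, and the lookup
assumes that creation is permitted when any entry with the "allow creation"
bit matches the name and all attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FLAG_MAY_CREATE 1u	/* bit 1 in the flags column: allow creation */

/* Hypothetical in-memory form of one policy line. */
struct policy_entry {
	const char *name;
	unsigned int flags;
	char type;
	unsigned int major;
	unsigned int minor;
};

/* The two example lines: the same name listed with two attribute sets. */
static const struct policy_entry policy[] = {
	{ "either-null-or-zero", 3,  'c', 1, 3 },
	{ "either-null-or-zero", 35, 'c', 1, 5 },
};

/* Creation passes only if some entry matches the name and all attributes. */
static bool mknod_allowed(const char *name, char type,
			  unsigned int major, unsigned int minor)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(policy) / sizeof(policy[0]); i++) {
		const struct policy_entry *e = &policy[i];

		if (!(e->flags & FLAG_MAY_CREATE))
			continue;
		if (strcmp(e->name, name) == 0 && e->type == type &&
		    e->major == major && e->minor == minor)
			return true;
	}
	return false;
}
```

With this table, the script above succeeds when the administrator types 3 or
5, while a request for any other minor number (for example 13) is refused.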

The boolean level granularity (e.g. forbid all processes except for udev ,
and modify udev to perform name/attribute pair enforcement) is not generic.
Userland applications sometimes misbehave.
I assume the kernel doesn't misbehave.
If you doubt my assumption, you have to doubt the in-kernel MAC implementation too.

 I don't. What I complain about is that it's too specific and does it one
 chosen job badly. It lacks abstraction. As far as I can see any decent MAC
 can achieve the same end result as your filesystem, without directly
 enforcing name/attr pairs.

Can SELinux guarantee the same result as my filesystem even if udev or
administrative programs have to be able to modify /dev ?

 The thing is, all special device nodes that are expected to exist by
 applications are known beforehand.

Yes.

 Thus they can be created statically and can be protected
 against any modifications with any MAC system.

But sometimes some modifications need to be permitted.
Who can guarantee that there is no application (other than udev)

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-08 Thread Tetsuo Handa
Hello.

[EMAIL PROTECTED] wrote:
 Good summary - probably should add that to the patch, drop it into
 Documentation/syaoran-config.txt or similar...

I see.

 Modification while reading *is* an issue, but can probably be worked around
 with some clever locking.  The race condition I was thinking of was if you
 had the mount and the policy load be 2 separate events, you could see:
 
 (a) issue mount request
 (b) do something malicious in /dev while..
 (c) load the policy that would have prevented (b).
 
 This is partly why SELinux has init load the policy *very* early on, before
 any other userspace have had a chance to run and do things that would have
 been prevented by policy.

So, you are suggesting loading the policy before the mount() request so that
this filesystem can prevent attackers from doing something malicious,
by minimizing the latency (i.e. implementing it as a non-blocking operation)
between the userland process's call of mount() and the nodes becoming visible
to userland processes.

I didn't take such cases into account.
My assumed usage of this filesystem is to run a script like

 #!/bin/sh
 mount -t syaoran -o accept=/etc/ccs/syaoran.conf none /dev
 exec /sbin/init "$@"

by passing init=/path/to/this/script on the kernel command line
so that /sbin/init can create /dev/initctl on this filesystem.
If you mount this filesystem after /sbin/init starts,
it will shadow the /dev/initctl opened by /sbin/init .

 Which basically ends up meaning that anybody who can trick the mount into
 happening can reset the permitted list and create (for example) a mode 666
 entry for a hard drive, and go scribbling around at will.  Note that you
 don't seem to do any sanity checking on the path (for instance, that each
 component is owned by root, and not world-writable) - so anybody who finds
 a way to get the mount to happen can supply their own list in
 /home/joeuser/blat or /tmp/surprise-mount-list or wherever.

I assume that being able to reach this location means the caller of mount() is
root.
But are the patches to allow mount() by non-root in progress?
http://lkml.org/lkml/2008/1/8/131
Maybe I should add some sanity checking on the path.

Thank you.


Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-06 Thread Tetsuo Handa
Hello.

Changes from previous posting:

 (1) I rebased this patch using tmpfs.

 I didn't know I was making this patch using ramfs...

This patch is for 2.6.24-rc6-mm1.

Regards.
--
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
applications using well-known device locations under /dev
get the device they want (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? Indeed it does, if you think that root can do
whatever he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in case where a process with root privilege was
hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE?

  Because /dev has to be available through the lifetime of the kernel.
  It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

  Because SELinux doesn't guarantee the pairing of a filename and its attributes.
  As far as I know, no MAC implementation can handle this pairing.
  I guess this is because

    Pairs of a filename and its attributes are conventionally considered
    constant and reliable.

    Describing this attribute enforcement information in the MAC's policy
    would make the policy syntax complicated.

  I want to add functionality that the MACs are missing.
  Instead of adding this functionality per MAC,
  I propose to add it as groundwork, to be combined with any MAC.

Why not drop CAP_MKNOD?

  Dropping CAP_MKNOD is not enough to emulate this filesystem because
  a process can still use rename()/unlink() to break the pairing of a filename
  and its attributes (e.g. "mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
  mv /dev/sda1.tmp /dev/sda2" or "unlink /dev/null; touch /dev/null").

This time, I'm implementing this filesystem as an extension to tmpfs
because all this filesystem does is check filenames and
their attributes in addition to what tmpfs does.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/Kconfig   |   18 +
 include/linux/shmem_fs.h |5 
 mm/shmem.c   |  124 +++
 mm/shmem_mac.h   |   57 +
 mm/shmem_mac_debug.c |  183 +
 mm/shmem_mac_init.c  |  486 +++
 mm/shmem_mac_main.c  |  205 +++
 7 files changed, 1077 insertions(+), 1 deletion(-)

--- linux-2.6-mm.orig/mm/shmem.c
+++ linux-2.6-mm/mm/shmem.c
@@ -736,11 +736,39 @@ static void shmem_truncate(struct inode 
 	shmem_truncate_range(inode, inode->i_size, (loff_t)-1);
 }
 
+#ifdef CONFIG_SYAORAN
+#include "shmem_mac.h"
+#include "shmem_mac_init.c"
+#include "shmem_mac_main.c"
+#include "shmem_mac_debug.c"
+
+static bool with_mac(struct super_block *sb)
+{
+	return sb->s_type == &syaoran_fs_type;
+}
+#else
+static inline bool with_mac(struct super_block *sb)
+{
+	return 0;
+}
+#endif
+
 static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = dentry->d_inode;
 	struct page *page = NULL;
 	int error;
+#ifdef CONFIG_SYAORAN
+	if (with_mac(inode->i_sb)) {
+		unsigned int flags = 0;
+		if (attr->ia_valid & (ATTR_UID | ATTR_GID))
+			flags |= MAY_CHOWN;
+		if (attr->ia_valid & ATTR_MODE)
+			flags |= MAY_CHMOD;
+		if (syaoran_may_modify_node(dentry, flags))
+			return -EPERM;
+	}
+#endif
 
 	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
 		if (attr->ia_size < inode->i_size) {
@@ -1515,6 +1543,10 @@ shmem_get_inode(struct super_block *sb, 
 		default:
 			inode->i_op = &shmem_special_inode_operations;
 			init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+			if (with_mac(sb))
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
 		case S_IFREG:
 			inode->i_op = &shmem_inode_operations;
@@ -1739,8 +1771,15 @@ static int shmem_statfs(struct dentry *d
 static int
 shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
-	struct inode *inode = shmem_get_inode(dir->i_sb, mode, dev);
+	struct inode *inode;
 	int error = -ENOSPC;
+#ifdef CONFIG_SYAORAN
+	if (with_mac(dir->i_sb)) {
+		if (syaoran_may_create_node(dentry, mode, dev) < 0)
+			return -EPERM;
+	}
+#endif
+	inode = shmem_get_inode(dir->i_sb, mode, dev);
 
 	if (inode) {
 		error = security_inode_init_security(inode, dir, NULL, NULL,
@@ -1792,6 +1831,13 @@ static int shmem_link(struct dentry *old
 {
 	struct inode *inode = old_dentry->d_inode;
 	int ret;
+#ifdef

[PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-05 Thread Tetsuo Handa
Hello.

Changes from previous posting:

 (1) Added kernel config so that users can choose
 whether to compile this filesystem or not.

 I didn't receive any ACK/NACK regarding whether I'm permitted to
 implement this filesystem as an extension to tmpfs or not.
 So, I continued implementing this filesystem as an extension to tmpfs.

 (2) Removed indirect grabbing of blkdev_open() and chrdev_open().

 The previous posting used an indirect approach to call
 blkdev_open() and chrdev_open() so that users could compile
 this filesystem as a module without exporting blkdev_open()
 from fs/block_dev.c and chrdev_open() from fs/char_dev.c.
 But since tmpfs cannot be compiled as a module,
 I changed it to call them directly.

 (3) Split the single file into three files.

 syaoran_init.c:  initialization part
 syaoran_main.c:  access control part
 syaoran_debug.c: taking snapshot part

This patch is for 2.6.24-rc6-mm1.

Regards.
--
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
applications using well-known device locations under /dev
get the device they want (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? It does, if you think root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privilege
was hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE?

  Because /dev has to be available through the lifetime of the kernel.
  It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

  Because SELinux doesn't guarantee the pairing of a filename and its attributes.
  As far as I know, no MAC implementation can handle this pairing.
  I guess this is because

    Pairs of a filename and its attributes are conventionally considered
    constant and reliable.

    Describing this attribute enforcement information in the MAC's policy
    would make the policy syntax complicated.

  I want to add functionality that the MACs are missing.
  Instead of adding this functionality per MAC,
  I propose to add it as groundwork, to be combined with any MAC.

Why not drop CAP_MKNOD?

  Dropping CAP_MKNOD is not enough to emulate this filesystem because
  a process can still use rename()/unlink() to break the pairing of a filename
  and its attributes (e.g. "mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
  mv /dev/sda1.tmp /dev/sda2" or "unlink /dev/null; touch /dev/null").

This time, I'm implementing this filesystem as an extension to tmpfs
because all this filesystem does is check filenames and
their attributes in addition to what tmpfs does.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/Kconfig   |   18 +
 fs/ramfs/inode.c |  177 ++
 fs/ramfs/syaoran.h   |   75 ++
 fs/ramfs/syaoran_debug.c |  183 +++
 fs/ramfs/syaoran_init.c  |  568 +++
 fs/ramfs/syaoran_main.c  |  207 +
 6 files changed, 1222 insertions(+), 6 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -36,6 +36,20 @@
 #include <asm/uaccess.h>
 #include "internal.h"
 
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+				       dev_t dev, bool tmpfs_with_mac);
+
+#define TMPFS_WITH_MAC    1
+#define TMPFS_WITHOUT_MAC 0
+#include <linux/quotaops.h>
+
+#ifdef CONFIG_SYAORAN
+#include "syaoran.h"
+#include "syaoran_init.c"
+#include "syaoran_main.c"
+#include "syaoran_debug.c"
+#endif
+
 /* some random number */
 #define RAMFS_MAGIC	0x858458f6
 
@@ -51,6 +65,12 @@ static struct backing_dev_info ramfs_bac
 
 struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
 {
+	return __ramfs_get_inode(sb, mode, dev, TMPFS_WITHOUT_MAC);
+}
+
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+				       dev_t dev, const bool tmpfs_with_mac)
+{
 	struct inode * inode = new_inode(sb);
 
 	if (inode) {
@@ -65,10 +85,18 @@ struct inode *ramfs_get_inode(struct sup
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+			if (tmpfs_with_mac)
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
 		case S_IFREG:
 			inode->i_op = &ramfs_file_inode_operations;
 			inode->i_fop = &ramfs_file_operations;
+#ifdef CONFIG_SYAORAN
+			if (tmpfs_with_mac)
+				init_syaoran_inode(inode, mode);
+#endif
 			break

Re: [PATCH][RFC] Simple tamper-proof device filesystem.

2008-01-05 Thread Tetsuo Handa
Hello.

Willy Tarreau wrote:
 Your patch is very confusing. In your description, as well as in the
 comments you talk about tmpfs, but your patch does not touch even one
 line of tmpfs and only changes ramfs. Even your variables and arguments
 refer to tmpfs. The Kconfig entry indicates that the feature depends
 on TMPFS too.
 
 Judging from the following comment :
   * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
 
 I suspect that you confuse both filesystems.
   - ramfs is in fs/ramfs and is always compiled in, you cannot disable it
   - tmpfs is in mm/shmem.c and is optional. It also supports options that
 ramfs does not (eg: size) and data may be swapped.
 
 Please understand that I'm not discussing the usefulness of your patch,
 I'm just trying to avoid a huge confusion.

Oh, I thought that the filesystem mounted by "mount -t tmpfs none /tmp" is tmpfs
and that the source code of tmpfs is located in the fs/ramfs directory.
So, I should have described this as an extension to ramfs rather than
an extension to tmpfs.
I'll fix it in the next posting.

Thank you.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-24 Thread Tetsuo Handa
Hello.

Serge E. Hallyn wrote:
 I apologize if I'm committing a faux pas by asking this, but any chance
 of renaming this to something like strictdev or sdev, or at least with
 'dev' in it somewhere?

You are not committing a faux pas. But this name is my personal preference. ;-)
You can see the origin at http://I-love.SAKURA.ne.jp/tomoyo/index-en.html .

Happy Holidays!


[PATCH][RFC] Simple tamper-proof device filesystem.

2007-12-23 Thread Tetsuo Handa
Hello.

Thank you for joining the discussion of the previous posting
(starting from http://lkml.org/lkml/2007/12/16/23 ).

The previous posting was a feasibility test to learn
whether this kind of trivial filesystem is acceptable for mainline.

Now, it seems that there is some chance of acceptance.
Therefore I rebased the patch on the -mm tree.

Regards.
--
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
applications using well-known device locations under /dev
get the device they want (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? It does, if you think root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privilege
was hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE?

  Because /dev has to be available through the lifetime of the kernel.
  It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

  Because SELinux doesn't guarantee the pairing of a filename and its attributes.
  As far as I know, no MAC implementation can handle this pairing.
  I guess this is because

    Pairs of a filename and its attributes are conventionally considered
    constant and reliable.

    Describing this attribute enforcement information in the MAC's policy
    would make the policy syntax complicated.

  I want to add functionality that the MACs are missing.
  Instead of adding this functionality per MAC,
  I propose to add it as groundwork, to be combined with any MAC.

Why not drop CAP_MKNOD?

  Dropping CAP_MKNOD is not enough to emulate this filesystem because
  a process can still use rename()/unlink() to break the pairing of a filename
  and its attributes (e.g. "mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
  mv /dev/sda1.tmp /dev/sda2" or "unlink /dev/null; touch /dev/null").

This time, I'm implementing this filesystem as an extension to tmpfs
because all this filesystem does is check filenames and
their attributes in addition to what tmpfs does.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/ramfs/inode.c   |  101 -
 fs/ramfs/syaoran.h | 1066 +
 2 files changed, 1160 insertions(+), 7 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -35,6 +35,7 @@
 #include <linux/sched.h>
 #include <asm/uaccess.h>
 #include "internal.h"
+#include "syaoran.h"
 
 /* some random number */
 #define RAMFS_MAGIC	0x858458f6
@@ -49,7 +50,8 @@ static struct backing_dev_info ramfs_bac
 			  BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
 };
 
-struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+struct inode *__ramfs_get_inode(struct super_block *sb, int mode, dev_t dev,
+				const int mac)
 {
 	struct inode * inode = new_inode(sb);
 
@@ -65,10 +67,19 @@ struct inode *ramfs_get_inode(struct sup
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
+			if (mac) {
+				if (S_ISBLK(mode))
+					inode->i_fop = &wrapped_def_blk_fops;
+				else if (S_ISCHR(mode))
+					inode->i_fop = &wrapped_def_chr_fops;
+				inode->i_op = &syaoran_file_inode_operations;
+			}
 			break;
 		case S_IFREG:
 			inode->i_op = &ramfs_file_inode_operations;
 			inode->i_fop = &ramfs_file_operations;
+			if (mac)
+				inode->i_op = &syaoran_file_inode_operations;
 			break;
 		case S_IFDIR:
 			inode->i_op = &ramfs_dir_inode_operations;
@@ -79,12 +90,19 @@ struct inode *ramfs_get_inode(struct sup
 			break;
 		case S_IFLNK:
 			inode->i_op = &page_symlink_inode_operations;
+			if (mac)
+				inode->i_op = &syaoran_symlink_inode_operations;
 			break;
 		}
 	}
 	return inode;
 }
 
+struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+{
+	return __ramfs_get_inode(sb, mode, dev, 0);
+}
+
 /*
  * File creation. Allocate an inode, and we're done..
  */
@@ -92,9 +110,17 @@ struct inode *ramfs_get_inode(struct sup
 static int
 ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
-	struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
+	struct inode *inode;
 	int

Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-19 Thread Tetsuo Handa
Hello.

Radoslaw Szkodzinski (AstralStorm) wrote:
 Actually, who needs to create device nodes? Just prohibit everyone from
 creating them, except installer and udev personality.
 This means removing CAP_MKNOD on a global scale.

What happens if root tampers with udev's configuration file?
udev will then create inappropriate device nodes (i.e. filenames with
unexpected attributes), won't it?

Also, creating device nodes is not the only threat.
Root can run
# mv /dev/sda1 /dev/tmp; mv /dev/sda2 /dev/sda1; mv /dev/tmp /dev/sda2
to rename/unlink device nodes.

After all, revoking CAP_MKNOD is not enough to guarantee
the pairing of a filename and its attributes.
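
A minimal user-space sketch of the rename sequence above, using two regular
files in a scratch directory as stand-ins for /dev/sda1 and /dev/sda2. It only
illustrates that rename() alone, with no mknod() call, is enough to exchange
what two names refer to. The function names and the /tmp location are
assumptions made for this illustration, not part of the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Exchange the objects behind two names using nothing but rename(). */
int swap_names(const char *a, const char *b, const char *tmp)
{
	if (rename(a, tmp) != 0 || rename(b, a) != 0)
		return -1;
	return rename(tmp, b);
}

/* Create an empty regular file; returns 0 on success. */
int make_file(const char *path)
{
	int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd < 0)
		return -1;
	return close(fd);
}

/* Inode number behind a name, or 0 if the name cannot be stat()ed. */
ino_t inode_of(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 ? st.st_ino : 0;
}

/*
 * Returns 1 if the two names ended up referring to each other's inode,
 * 0 if they did not, -1 if the scratch files could not be set up.
 */
int demo_rename_swap(void)
{
	char dir[] = "/tmp/swapdemoXXXXXX";	/* hypothetical scratch location */
	char a[64], b[64], tmp[64];
	ino_t ino_a, ino_b;
	int swapped;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(a, sizeof(a), "%s/sda1", dir);
	snprintf(b, sizeof(b), "%s/sda2", dir);
	snprintf(tmp, sizeof(tmp), "%s/tmp", dir);
	if (make_file(a) != 0 || make_file(b) != 0)
		return -1;
	ino_a = inode_of(a);
	ino_b = inode_of(b);
	assert(ino_a != 0 && ino_b != 0 && ino_a != ino_b);
	if (swap_names(a, b, tmp) != 0)
		return -1;
	/* No node was created, yet each name now refers to the other object. */
	swapped = inode_of(a) == ino_b && inode_of(b) == ino_a;
	unlink(a);
	unlink(b);
	rmdir(dir);
	return swapped;
}
```

Calling demo_rename_swap() from a test program is enough to try it; the same
three rename() calls applied to real device nodes need no CAP_MKNOD.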

This filesystem is designed to guarantee the pairing of a filename and its attributes,
and it also has additional access control capability.
You can forbid mknod/unlink of /dev/null if you want nobody to do so.
You can forbid chmod/chown of /dev/null if you want nobody to do so.

Well... it is not fair to mention only udev's configuration file.
If the configuration file of this filesystem is tampered with,
this filesystem will create inappropriate device nodes too.
So, some access control mechanism for protecting configuration files
is recommended for both udev and this filesystem.

Regards.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-17 Thread Tetsuo Handa
Hello.

Indan Zupancic wrote:
 If MAC can avoid all that, then why can't it also avoid tampering with /dev?

If a MAC implementation handled the pairing of a filename and its attributes,
this filesystem would not be needed.
But I don't know of any MAC implementation that handles this pairing.

SELinux's granularity is "allow foo_t to create a block device file in a dev_t directory".
TOMOYO's granularity is "allow foo to create a block device file named /dev/sda1".
Neither enforces the pairing of a filename and its attributes,
so an attacker with root privilege can create fake device files
if he/she is permitted to create device files by the MAC's policy.

It would be possible to handle this pairing within the MAC's policy
by extending the policy syntax,
but offloading this handling to the filesystem keeps the MAC's policy syntax simple,
because pairs of a filename and its attributes are conventionally constant.
You won't let foo_t create /dev/sda1 with block-8-1 attributes
and let bar_t create /dev/sda1 with block-8-2 attributes, will you?
You don't want to attach attribute information to every entry in the MAC's
policy, do you?
It is redundant to describe this attribute enforcement information in the MAC's
policy
unless you want to break the conventional pairs of a filename and its attributes.

 What security does your filesystem add at all, if it's useless without a MAC
 doing all the hard work?
It allows the / partition to be mounted read-only.
It allows filename/attribute pairs to be enforced on the /dev partition,
to avoid /dev/null spoofing (creating /dev/null as a regular file for
eavesdropping purposes).

This filesystem adds enforcement of filename/attribute pairs,
but the enforcement can be overridden if this filesystem is used without MAC.
If this filesystem is used with MAC,
the enforcement cannot be overridden.

Regards.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-17 Thread Tetsuo Handa
Hello.

Al Boldi wrote:
 I think the answer is obvious:  Tetsuo wants to add functionality that the
 MACs are missing.  So, instead of adding this functionality per MAC, he
 proposes to add it as ground work, to be combined with any MAC.
Yes, that's right.

This filesystem is designed to be used with TOMOYO Linux,
but this filesystem can be used with other MAC implementations too.

Thank you.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-17 Thread Tetsuo Handa
( This is a reply to http://lkml.org/lkml/2007/12/17/27 .)

Hello.

David Wagner wrote:
 But the point is that it's not enough just to prevent attackers
 from mounting other filesystems over this filesystem.  I can think
 of all sorts of ways that an admin-level attacker might be able to
 prevent other administrators from logging in.  If your defense strategy
 involves trying to enumerate all of those possible ways and then shut
 them down one by one, you're relying upon a defense strategy known as
 blacklisting.  Blacklisting has a terrible track record in the
 security field, because it's too easy to overlook one pathway.
Of course, I assume whitelisting.
SELinux, TOMOYO Linux and many other MAC implementations use
a whitelisting approach, and this filesystem takes a whitelisting approach too.

This filesystem handles what MAC implementations don't handle.
In other words, it closes a remaining hole.

I'm proposing:

 Don't you think it is dangerous to assume that files in the /dev directory
 have an appropriate binding between filename and attributes?
 MAC can restrict which processes can create files in the /dev directory,
 but MAC doesn't enforce the binding between filename and attributes.
 So, how about enforcing this binding in the filesystem layer?

Regards.

To David Wagner:
  Could you please Cc: me so that I can reply to your message?
  I can't reply to your message since I'm reading this ml in daily digest mode.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-17 Thread Tetsuo Handa
Hello.

Serge E. Hallyn wrote:
 CAP_MKNOD will be removed from its capability
I think that is not enough, because root can still rename/unlink device files
(mv /dev/sda1 /dev/tmp; mv /dev/sda2 /dev/sda1; mv /dev/tmp /dev/sda2).

 To use your approach, i guess we would have to use selinux (or tomoyo)
 to enforce that devices may only be created under /dev?
Everyone can use this filesystem alone.
But using it with MAC (or whatever access control mechanism prevents
attackers from unmounting/overlaying this filesystem) is recommended.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-17 Thread Tetsuo Handa
Hello.

Serge E. Hallyn wrote:
 But your requirements are to ensure that an application accessing a
 device at a well-known location gets what it expects.

Yes. That's the purpose of this filesystem.


 So then the main question is still the one I think Al had asked - what
 keeps a rogue CAP_SYS_MOUNT process from doing
 mount --bind /dev/hda1 /dev/null ?

Excuse me, but I guess you meant "mount --bind /dev/ /root/" or something,
because a mount operation requires directories.
MAC can prevent a rogue CAP_SYS_MOUNT process from doing
"mount --bind /dev/ /root/".
For example, with TOMOYO Linux, you need to give the
"allow_mount /dev/ /root/ --bind 0" permission
to permit a "mount --bind /dev/ /root/" request.

Did you mean "ln -s /dev/hda1 /dev/null" or "ln /dev/hda1 /dev/null"?
No problem. MAC can prevent such requests too.

Regards.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-17 Thread Tetsuo Handa
Hello.

Serge E. Hallyn wrote:
 Nope, try
 
   touch /root/hda1
   ls -l /root/hda1
   mount --bind /dev/hda1 /root/hda1
   ls -l /root/hda1

[EMAIL PROTECTED] ~]# touch /root/hda1
[EMAIL PROTECTED] ~]# ls -l /root/hda1
-rw-r--r-- 1 root root 0 Dec 18 12:04 /root/hda1
[EMAIL PROTECTED] ~]# mount --bind /dev/hda1 /root/hda1
[EMAIL PROTECTED] ~]# ls -l /root/hda1
brw-r----- 1 root disk 3, 1 Dec 18  2007 /root/hda1

Oh, surprising.
I didn't know that mount() accepts a non-directory as a mount point.
But I think this is not a mount operation,
because I can't see the contents of /dev/hda1 through /root/hda1 .
Can I see the contents of /dev/hda1 through /root/hda1 ?


 Then it sounds like this filesystem is something Tomoyo can use.

In 2003, I had the / partition mounted read-only so that the admin couldn't do
'mknod /root/hda1 b 3 1', and I named this approach
"Security Advancement Know-how Upon Readonly Approach for Linux", or SAKURA Linux.
This filesystem (SYAORAN) was developed to make /dev writable and tamper-proof
when the / partition is read-only or protected by MAC.
TOMOYO is a pathname-based MAC implementation, and
SAKURA and SYAORAN were merged into TOMOYO Linux. ;-)

Regards.


[patch 0/2] [RFC] Simple tamper-proof device filesystem.

2007-12-16 Thread Tetsuo Handa
Hello.

I proposed this filesystem a few years ago.
Once again, I'm proposing this filesystem for inclusion into mainline.
I'll update it for the -mm tree if this filesystem is likely to be acceptable.

Regards.

(This is a resend of [00/02], since the original seems to have been dropped.)


[patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-16 Thread Tetsuo Handa
A brief description about SYAORAN:

 SYAORAN stands for Simple Yet All-important Object Realizing Abiding
 Nexus. SYAORAN is a filesystem for /dev with Mandatory Access Control.

 /dev needs to be writable, but this means that files on /dev might be
 tampered with. SYAORAN can restrict combinations of (pathname, attribute)
 that the system can create. The attribute is one of directory, regular
 file, FIFO, UNIX domain socket, symbolic link, character or block device
 file with major/minor device numbers.

 SYAORAN can ensure /dev/null is a character device file with major=1 minor=3.

 Policy specifications for this filesystem are at
 http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html

Why not use FUSE?

 Because /dev has to be available through the lifetime of the kernel.
 It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

 Because SELinux doesn't guarantee filename and its attribute.
 The purpose of this filesystem is to ensure filename and its attribute
 (e.g. /dev/null is guaranteed to be a character device file
 with major=1 and minor=3).

Signed-off-by:  Tetsuo Handa [EMAIL PROTECTED]
---
 fs/syaoran/syaoran.c |  338 +
 fs/syaoran/syaoran.h |  964 +++
 2 files changed, 1302 insertions(+)

--- /dev/null
+++ linux-2.6.24-rc5/fs/syaoran/syaoran.c
@@ -0,0 +1,338 @@
+/*
+ * fs/syaoran/syaoran.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Portions Copyright (C) 2005-2007  NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre   2007/12/16
+ *
+ * This filesystem is developed using the ramfs implementation.
+ *
+ */
+/*
+ * Resizable simple ram filesystem for Linux.
+ *
+ * Copyright (C) 2000 Linus Torvalds.
+ *   2000 Transmeta Corp.
+ *
+ * Usage limits added by David Gibson, Linuxcare Australia.
+ * This file is released under the GPL.
+ */
+
+/*
+ * NOTE! This filesystem is probably most useful
+ * not as a real filesystem, but as an example of
+ * how virtual filesystems can be written.
+ *
+ * It doesn't get much simpler than this. Consider
+ * that this file implements the full semantics of
+ * a POSIX-compliant read-write filesystem.
+ *
+ * Note in particular how the filesystem does not
+ * need to implement any data structures of its own
+ * to keep track of the virtual data: using the VFS
+ * caches is sufficient.
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/time.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/backing-dev.h>
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+
+static struct super_operations syaoran_ops;
+static struct address_space_operations syaoran_aops;
+static struct inode_operations syaoran_file_inode_operations;
+static struct inode_operations syaoran_dir_inode_operations;
+static struct inode_operations syaoran_symlink_inode_operations;
+static struct file_operations syaoran_file_operations;
+
+static struct backing_dev_info syaoran_backing_dev_info = {
+	.ra_pages = 0,        /* No readahead */
+	.capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK |
+	BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY |
+	BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
+};
+
+#include "syaoran.h"
+
+static struct inode *syaoran_get_inode(struct super_block *sb, int mode,
+				       dev_t dev)
+{
+	struct inode *inode = new_inode(sb);
+
+	if (inode) {
+		struct timespec now = CURRENT_TIME;
+		inode->i_mode = mode;
+		inode->i_uid = current->fsuid;
+		inode->i_gid = current->fsgid;
+		inode->i_blocks = 0;
+		inode->i_mapping->a_ops = &syaoran_aops;
+		inode->i_mapping->backing_dev_info = &syaoran_backing_dev_info;
+		inode->i_atime = now;
+		inode->i_mtime = now;
+		inode->i_ctime = now;
+		switch (mode & S_IFMT) {
+		default:
+			init_special_inode(inode, mode, dev);
+			if (S_ISBLK(mode))
+				inode->i_fop = &wrapped_def_blk_fops;
+			else if (S_ISCHR(mode))
+				inode->i_fop = &wrapped_def_chr_fops;
+			inode->i_op = &syaoran_file_inode_operations;
+			break;
+		case S_IFREG:
+			inode->i_op = &syaoran_file_inode_operations;
+			inode->i_fop = &syaoran_file_operations;
+			break;
+		case S_IFDIR:
+			inode->i_op = &syaoran_dir_inode_operations;
+			inode->i_fop = &simple_dir_operations;
+			/*
+			 * directory inodes start off with i_nlink == 2
+			 *  (for "." entry

[patch 2/2] [RFC] Simple tamper-proof device filesystem.

2007-12-16 Thread Tetsuo Handa

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/Kconfig  |   21 +
 fs/Makefile |1 +
 2 files changed, 22 insertions(+)

--- linux-2.6.24-rc5.orig/fs/Kconfig
+++ linux-2.6.24-rc5/fs/Kconfig
@@ -1555,6 +1555,27 @@ config UFS_DEBUG
 	  Y here.  This will result in _many_ additional debugging messages to be
 	  written to the system log.
 
+config SYAORAN_FS
+	tristate "SYAORAN (Tamper-Proof Device Filesystem) support"
+	help
+	  Say Y or M here to support the Tamper-Proof Device Filesystem.
+
+	  SYAORAN stands for
+	  Simple Yet All-important Object Realizing Abiding Nexus.
+	  SYAORAN is a filesystem for /dev with Mandatory Access Control.
+
+	  The system can't work if /dev is read-only.
+	  Therefore you need to mount a writable filesystem (such as tmpfs)
+	  for /dev if root fs is read-only.
+
+	  But the writable /dev means that files on /dev might be tampered.
+	  For example, if /dev/null is deleted and re-created as a symbolic
+	  link to /dev/hda by an attacker, the contents of the IDE HDD
+	  will be destroyed at a blow.
+
+	  SYAORAN can ensure /dev/null is a character device file
+	  with major=1 minor=3.
+
 endmenu
 
 menuconfig NETWORK_FILESYSTEMS
--- linux-2.6.24-rc5.orig/fs/Makefile
+++ linux-2.6.24-rc5/fs/Makefile
@@ -118,3 +118,4 @@ obj-$(CONFIG_HPPFS) += hppfs/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
 obj-$(CONFIG_OCFS2_FS) += ocfs2/
 obj-$(CONFIG_GFS2_FS)   += gfs2/
+obj-$(CONFIG_SYAORAN_FS)	+= syaoran/syaoran.o


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-16 Thread Tetsuo Handa
Hello.

David Newall wrote:
 Tetsuo Handa wrote:
   /dev needs to be writable, but this means that files on /dev might be
   tampered with.
 
 I infer that you mean /dev needs to be writable by anyone, not by just 
 its owner or owner and group (conventionally root/root.)  This goes 
 against conventional wisdom, which is that /dev must be writable only by 
 the administrator.  Why do you say otherwise?
I didn't mean that /dev is writable by everybody.
I meant that /dev must be mounted read-write
(even if one wants to mount / read-only).

Regards.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-16 Thread Tetsuo Handa
Hello.

  I meant that /dev must be mounted for read-write mode
 
 Again, why?

You can mount the / partition read-only if you wish to do so.
But you cannot make the /dev directory read-only.
You won't be able to log in to the system, because /sbin/mingetty
fails to chown/chmod /dev/tty* if /dev is mounted read-only.


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-16 Thread Tetsuo Handa

 But use of this filesystem is still valid when this filesystem is used with
 policy based mandatory access control (such as SELinux, TOMOYO Linux)
 because this filesystem guarantees where policy based mandatory access control
 can't guarantee (i.e. filename and its attribute).
 
Policy based mandatory access control guarantees that
"only Bob can create a block device file named sda1 in the /dev directory".
But it can't guarantee that /dev/sda1 will have the block-8-1 attribute.
If Bob is malicious and creates /dev/sda1 with the block-8-2 attribute,
other applications that depend on the attributes of /dev/sda1 go wrong.
So, this filesystem guarantees that /dev/sda1 has the block-8-1 attribute.
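
As a rough illustration of this guarantee (a userland sketch, not the
filesystem's actual code), the check can be thought of as a table lookup.
The table contents, struct dev_rule and dev_request_allowed() are
hypothetical names invented for this example, and "block-8-1" is read as
a block device with major 8 and minor 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy entry: the attributes a given name must have. */
struct dev_rule {
    const char *name;        /* filename inside /dev, e.g. "sda1" */
    char type;               /* 'b' = block device, 'c' = character device */
    unsigned int dev_major;
    unsigned int dev_minor;
};

/* Example table; a real table would be supplied by the administrator. */
static const struct dev_rule dev_rules[] = {
    { "sda1", 'b', 8, 1 },
    { "sda2", 'b', 8, 2 },
    { "null", 'c', 1, 3 },
};

/*
 * Return 1 only if a mknod-like request for `name` asks for exactly the
 * attributes listed for that name. Unknown names are rejected.
 */
static int dev_request_allowed(const char *name, char type,
                               unsigned int dev_major, unsigned int dev_minor)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(dev_rules) / sizeof(dev_rules[0]); i++) {
        const struct dev_rule *r = &dev_rules[i];

        if (strcmp(r->name, name) != 0)
            continue;
        return r->type == type &&
               r->dev_major == dev_major &&
               r->dev_minor == dev_minor;
    }
    return 0;
}
```

With this table, a request to create sda1 as block-8-1 is accepted, while
Bob's attempt to create sda1 as block-8-2 is rejected even though MAC
allows him to create the name.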


Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-16 Thread Tetsuo Handa
Hello.

Indan Zupancic wrote:
 What prevents them from mounting tmpfs on top of /dev, bypassing your fs?
Mandatory access control (MAC) prevents them from mounting tmpfs on top of /dev.
MAC mediates namespace manipulation requests such as mount()/umount().

 Also, if they have root there are plenty of ways to prevent an administrator
 from logging in, e.g. using iptables or changing the password.
MAC mediates execution of /sbin/iptables or /usr/bin/passwd.

So, use of this filesystem alone is meaningless because
attackers with root privileges can do what you are saying.
But use of this filesystem with MAC is still valid because
MAC can prevent attackers with root privileges from doing what you are saying.

Regards.


Re: Problem with accessing namespace_sem from LSM.

2007-11-07 Thread Tetsuo Handa
Hello.

Christoph Hellwig wrote:
  Isn't security_inode_create() a part of VFS internals?
 It's not.  security_inode_create is part of the LSM infrastructure, and
 the actual methods are part of security modules and definitively not
 VFS internals.
The reason why I want to access namespace_sem inside security_inode_create()
is that it doesn't receive a struct vfsmount parameter.
If struct vfsmount *were* passed to security_inode_create(),
I would have no need to access namespace_sem.

And now, since calling down_read(&namespace_sem) causes a deadlock,
I'm looking for a solution.
What you said ("I'd start looking for design bugs in whatever code you have
using it first.") sounds like "never try to implement pathname based access
control at security_inode_create()", which makes AppArmor (for OpenSuSE
10.1/10.2) and TOMOYO unable to apply access control.

At first, I thought that this lockdep warning was a false positive,
since struct inode is allocated/freed dynamically.
But the warning still appears even after I disabled freeing memory
at destroy_inode() in fs/inode.c (so that the address of the locking object
in struct inode is never reused), so it is likely genuine.

Regards.



Re: Problem with accessing namespace_sem from LSM.

2007-11-07 Thread Tetsuo Handa
Hello.

Christoph Hellwig wrote:
 Same argument as with the AA folks: it does not have any business looking
 at the vfsmount.  If you create a file it can and in many setups will
 show up in multiple vfsmounts, so making decisions based on the particular
 one this creat happens through is wrong and actually dangerous.
Thus TOMOYO 1.x doesn't use LSM hooks, and AppArmor for OpenSuSE 10.3
added a struct vfsmount parameter to VFS helper functions and LSM hooks.

Not all systems use bind mounts.
On such systems, there is likely only one vfsmount that corresponds to a given dentry.

What does "dangerous" mean? Does it cause a crash?

Regards.


Re: Problem with accessing namespace_sem from LSM.

2007-11-06 Thread Tetsuo Handa
Hello.

Christoph Hellwig wrote:
 Any code except VFS internals has no business using it at all and doesn't
 do that in mainline either.  I'd start looking for design bugs in whatever
 code you have using it first.
Isn't security_inode_create() a part of VFS internals?
I think security_inode_create() is a part of VFS internals
because it is called from vfs_create().

Regards.



Is it illegal to refer namespace_sem while inode's mutex held?

2007-11-05 Thread Tetsuo Handa
Hello.

I'm running my LSM module on kernel 2.6.23 / Debian Sarge.
I encountered the following warning message.

It seems that calling down_read(&namespace_sem) is not permitted
while holding mutex_lock(&inode->i_mutex), but I'm not sure.
Is it illegal to take namespace_sem while an inode's mutex is held?



===
[ INFO: possible circular locking dependency detected ]
2.6.23-tomoyo2.1 #27
---
rcS/1093 is trying to acquire lock:
 (namespace_sem){}, at: [c017ca7b] m_start+0x11/0x20

but task is already holding lock:
 (inode-i_mutex){--..}, at: [c0171e79] open_namei+0xf2/0x522

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

- #1 (inode-i_mutex){--..}:
   [c017d35b] graft_tree+0x62/0xca
   [c013ab37] check_prev_add+0xc4/0x1bc
   [c017d35b] graft_tree+0x62/0xca
   [c013ac85] check_prevs_add+0x56/0xcb
   [c013af9c] validate_chain+0x2a2/0x31f
   [c01312ec] __kernel_text_address+0x18/0x23
   [c0104b1b] dump_trace+0x6f/0x87
   [c013cc54] __lock_acquire+0x6f2/0x762
   [c013af6f] validate_chain+0x275/0x31f
   [c013d25e] lock_acquire+0x79/0x93
   [c017d35b] graft_tree+0x62/0xca
   [c0331518] __mutex_lock_slowpath+0xea/0x280
   [c017d35b] graft_tree+0x62/0xca
   [c017d35b] graft_tree+0x62/0xca
   [c017d8f1] do_add_mount+0x8a/0xe7
   [c017de52] do_mount+0x1a9/0x1c0
   [c0152d76] __alloc_pages+0x64/0x2b6
   [c017dc5f] copy_mount_options+0x4d/0x97
   [c017e0b5] sys_mount+0x79/0xb5
   [c01012f4] name_to_dev_t+0x4d/0x25d
   [c0331258] schedule_timeout+0x79/0x8d
   [c019b741] create_proc_entry+0x73/0x86
   [c012a023] process_timeout+0x0/0x5
   [c04648ff] kernel_init+0x0/0xa3
   [c0464e93] prepare_namespace+0x86/0x18e
   [c0168eb4] sys_access+0x1f/0x23
   [c0464998] kernel_init+0x99/0xa3
   [c0104aa3] kernel_thread_helper+0x7/0x10
   [] 0x

- #0 (namespace_sem){}:
   [c013aa9a] check_prev_add+0x27/0x1bc
   [c013ac85] check_prevs_add+0x56/0xcb
   [c013af9c] validate_chain+0x2a2/0x31f
   [c013cc54] __lock_acquire+0x6f2/0x762
   [c0179712] __d_lookup+0xda/0xfa
   [c013d25e] lock_acquire+0x79/0x93
   [c017ca7b] m_start+0x11/0x20
   [c013590f] down_read+0x3b/0x71
   [c017ca7b] m_start+0x11/0x20
   [c017ca7b] m_start+0x11/0x20
   [c01d5abe] tmy_do_single_write_perm+0x7e/0xda
   [c0171a74] vfs_create+0x83/0x105
   [c0171d44] open_namei_create+0x47/0x8a
   [c0171ee3] open_namei+0x15c/0x522
   [c01694d3] do_filp_open+0x25/0x39
   [c0332cd2] _spin_unlock+0x14/0x1c
   [c0169691] get_unused_fd_flags+0xb0/0xba
   [c0169764] do_sys_open+0x44/0xc5
   [c01697ff] sys_open+0x1a/0x1c
   [c0103e6a] syscall_call+0x7/0xb
   [] 0x

other info that might help us debug this:

1 lock held by rcS/1093:
 #0:  (inode-i_mutex){--..}, at: [c0171e79] open_namei+0xf2/0x522

stack backtrace:
 [c013a37f] print_circular_bug_tail+0x5f/0x67
 [c013aa9a] check_prev_add+0x27/0x1bc
 [c013ac85] check_prevs_add+0x56/0xcb
 [c013af9c] validate_chain+0x2a2/0x31f
 [c013cc54] __lock_acquire+0x6f2/0x762
 [c0179712] __d_lookup+0xda/0xfa
 [c013d25e] lock_acquire+0x79/0x93
 [c017ca7b] m_start+0x11/0x20
 [c013590f] down_read+0x3b/0x71
 [c017ca7b] m_start+0x11/0x20
 [c017ca7b] m_start+0x11/0x20
 [c01d5abe] tmy_do_single_write_perm+0x7e/0xda
 [c0171a74] vfs_create+0x83/0x105
 [c0171d44] open_namei_create+0x47/0x8a
 [c0171ee3] open_namei+0x15c/0x522
 [c01694d3] do_filp_open+0x25/0x39
 [c0332cd2] _spin_unlock+0x14/0x1c
 [c0169691] get_unused_fd_flags+0xb0/0xba
 [c0169764] do_sys_open+0x44/0xc5
 [c01697ff] sys_open+0x1a/0x1c
 [c0103e6a] syscall_call+0x7/0xb
 ===



The location is tmy_do_single_write_perm()
(whose call trace is open_namei() -> open_namei_create() -> security_inode_create())
in the following file:
http://svn.sourceforge.jp/cgi-bin/viewcvs.cgi/trunk/2.1.x/tomoyo-lsm/patches/tomoyo-hooks.diff?rev=653&root=tomoyo&view=markup

Regards.



Problem with accessing namespace_sem from LSM.

2007-11-05 Thread Tetsuo Handa
Hello.

I found that accessing namespace_sem from security_inode_create()
causes a lockdep warning when compiled with CONFIG_PROVE_LOCKING=y.



===
[ INFO: possible circular locking dependency detected ]
---
klogd/1798 is trying to acquire lock:
 (namespace_sem){}, at: [e0f133c7] _aa_perm_dentry+0x80/0x184 [apparmor]

but task is already holding lock:
 (inode-i_mutex){--..}, at: [c02a883e] mutex_lock+0x12/0x15

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

- #1 (inode-i_mutex){--..}:
   [c0137c89] lock_acquire+0x4b/0x6a
   [c02a86e6] __mutex_lock_slowpath+0xb0/0x1f6
   [c02a883e] mutex_lock+0x12/0x15
   [c0180b02] graft_tree+0x5c/0xd4
   [c0180e98] do_add_mount+0x84/0x100
   [c0181b5f] do_mount+0x602/0x659
   [c0181c1a] sys_mount+0x64/0x9b
   [c0103d9d] sysenter_past_esp+0x56/0x99

- #0 (namespace_sem){}:
   [c0137c89] lock_acquire+0x4b/0x6a
   [c0134e34] down_read+0x1e/0x31
   [e0f133c7] _aa_perm_dentry+0x80/0x184 [apparmor]
   [e0f14849] aa_perm_dentry+0x62/0xa4 [apparmor]
   [e0f167c7] apparmor_inode_create+0x40/0x63 [apparmor]
   [c01749e5] vfs_create+0x84/0x13e
   [c01774ec] open_namei+0x169/0x635
   [c0166f15] do_filp_open+0x20/0x36
   [c0166f6b] do_sys_open+0x40/0xbb
   [c0167012] sys_open+0x16/0x18
   [c0103d9d] sysenter_past_esp+0x56/0x99

other info that might help us debug this:

1 lock held by klogd/1798:
 #0:  (inode-i_mutex){--..}, at: [c02a883e] mutex_lock+0x12/0x15

stack backtrace:
 [c010555d] show_trace+0xd/0x10
 [c0105a99] dump_stack+0x19/0x1b
 [c0136dc8] print_circular_bug_tail+0x59/0x64
 [c01375bd] __lock_acquire+0x7ea/0x973
 [c0137c89] lock_acquire+0x4b/0x6a
 [c0134e34] down_read+0x1e/0x31
 [e0f133c7] _aa_perm_dentry+0x80/0x184 [apparmor]
 [e0f14849] aa_perm_dentry+0x62/0xa4 [apparmor]
 [e0f167c7] apparmor_inode_create+0x40/0x63 [apparmor]
 [c01749e5] vfs_create+0x84/0x13e
 [c01774ec] open_namei+0x169/0x635
 [c0166f15] do_filp_open+0x20/0x36
 [c0166f6b] do_sys_open+0x40/0xbb
 [c0167012] sys_open+0x16/0x18
 [c0103d9d] sysenter_past_esp+0x56/0x99



If this warning is genuine,
AppArmor shipped with OpenSuSE 10.1 and 10.2 is affected.

- Kernel 2.6.16.53-0.16 for OpenSuSE 10.1 -

do_add_mount() { /* in fs/namespace.c */
  down_write(&namespace_sem);
  graft_tree() {
    mutex_lock(&nd->dentry->d_inode->i_mutex);
    ...
    mutex_unlock(&nd->dentry->d_inode->i_mutex);
  }
  up_write(&namespace_sem);
}

open_namei() { /* in fs/namei.c */
  mutex_lock(&dir->d_inode->i_mutex);
  vfs_create() {
    security_inode_create() {
      subdomain_inode_create() { /* in security/apparmor/lsm.c */
        sd_perm_dentry() { /* in security/apparmor/main.c */
          _sd_perm_dentry() {
            sd_path_begin() { /* in security/apparmor/inline.h */
              sd_path_begin2() {
                down_read(&namespace_sem);
              }
            }
            ...
            sd_path_end() {
              up_read(&namespace_sem);
            }
          }
        }
      }
    }
  }
  mutex_unlock(&dir->d_inode->i_mutex);
}

- Kernel 2.6.18.8-0.7 for OpenSuSE 10.2 -

do_add_mount() { /* in fs/namespace.c */
  down_write(&namespace_sem);
  graft_tree() {
    mutex_lock(&nd->dentry->d_inode->i_mutex);
    ...
    mutex_unlock(&nd->dentry->d_inode->i_mutex);
  }
  up_write(&namespace_sem);
}

open_namei() { /* in fs/namei.c */
  mutex_lock(&dir->d_inode->i_mutex);
  vfs_create() {
    security_inode_create() {
      apparmor_inode_create() { /* in security/apparmor/lsm.c */
        aa_perm_dentry() { /* in security/apparmor/lsm.c */
          _aa_perm_dentry() {
            aa_path_begin() { /* in security/apparmor/inline.h */
              aa_path_begin2() {
                down_read(&namespace_sem);
              }
            }
            ...
            aa_path_end() {
              up_read(&namespace_sem);
            }
          }
        }
      }
    }
  }
  mutex_unlock(&dir->d_inode->i_mutex);
}
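
The two call chains above take the same pair of locks in opposite orders,
which is what lockdep complains about. A toy userland model of that kind
of check might look like the sketch below. This is not lockdep's code:
record_order(), reachable() and the lock IDs are made up for this
example, and it only tracks "B was taken while A was held" edges.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock IDs for the two locks discussed in this thread. */
enum { NAMESPACE_SEM, I_MUTEX };

/* dep[a][b] != 0 means "b was acquired while a was held" at least once. */
static unsigned char dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: can `to` be reached from `from` via recorded edges? */
static int reachable(int from, int to, unsigned char *seen)
{
    int next;

    if (from == to)
        return 1;
    seen[from] = 1;
    for (next = 0; next < MAX_LOCKS; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return 1;
    return 0;
}

/*
 * Record that lock `acquired` is taken while lock `held` is held.
 * Returns 1 if the opposite order was already recorded (directly or
 * through other locks), i.e. the new edge closes a cycle and two tasks
 * could deadlock. Returns 0 otherwise.
 */
static int record_order(int held, int acquired)
{
    unsigned char seen[MAX_LOCKS];
    int cycle;

    assert(held >= 0 && held < MAX_LOCKS);
    assert(acquired >= 0 && acquired < MAX_LOCKS);
    memset(seen, 0, sizeof(seen));
    cycle = reachable(acquired, held, seen);
    dep[held][acquired] = 1;
    return cycle;
}
```

In this model, the do_add_mount() path corresponds to
record_order(NAMESPACE_SEM, I_MUTEX), which reports no problem, and the
open_namei() path corresponds to record_order(I_MUTEX, NAMESPACE_SEM),
which reports a cycle.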

AppArmor shipped with OpenSuSE 10.3 and Ubuntu 7.10 will not be affected
since the kernel was modified to pass a vfsmount parameter
to VFS helper functions and LSM hooks.

TOMOYO Linux 2.x (which is implemented using LSM) is also affected
and I'm looking for a solution.
http://lkml.org/lkml/2007/11/5/55

A possible solution would be to pass a vfsmount parameter
to VFS helper functions and LSM hooks for all kernels.
I do hope that the "Pass struct vfsmount to ..." patches
are merged into the mainline kernel.

Regards.


Re: [PATCH 2.6.24-rc1]EXPORT_SYMBOL(__set_page_dirty_no_writeback);

2007-10-25 Thread Tetsuo Handa
Hello.

Arjan van de Ven wrote:
 when will you post this filesystem for inclusion into kernel.org kernel?
 (and please really consider posting the patch together with that patch)
 (also, if you can give a pointer to the source code of this filesystem
 you might even get early code review)

I have proposed this filesystem at http://lkml.org/lkml/2004/11/1/48 .

In short, the filesystem I'm developing is a trivial device filesystem
that provides a protection mechanism against tampering.

The reasons I don't use devfs/udev or fuse or LSM for /dev are:

  devfs/udev don't provide a protection mechanism against tampering.
  I don't know of an implementation that can enforce a filename and its attributes.
  Label based access control like SELinux doesn't distinguish
  /dev/sda1 and /dev/sda2, does it?
  If a process that is permitted to unlink and create /dev/sda1 and /dev/sda2 is cracked,
  who can ensure that /dev/sda1 is block-8-1 and /dev/sda2 is block-8-2?
  A situation where /dev/sda1 is block-8-2 and /dev/sda2 is block-8-1 can happen.

  /dev has to be valid throughout the lifetime of the system
  (i.e. from /sbin/init till power failure).
  Filesystems using fuse will freeze when the shutdown script runs /usr/bin/killall,
  at which point it is too early for the /dev partition to stop working.

  LSM is used by SELinux, thus there is little chance that my module will be called
  to validate a device file's filename and its attributes.

The latest snapshot (which does not follow the coding style yet) is at
http://svn.sourceforge.jp/cgi-bin/viewcvs.cgi/*checkout*/trunk/1.5.x/ccs-patch/include/linux/syaoran.h?content-type=text%2Fplain&rev=588&root=tomoyo
http://svn.sourceforge.jp/cgi-bin/viewcvs.cgi/*checkout*/trunk/1.5.x/ccs-patch/fs/syaoran_2.6.c?content-type=text%2Fplain&rev=614&root=tomoyo

If there is a chance for inclusion into the kernel.org kernel, I'm willing to fix
the coding style and submit immediately.

Thank you.



Re: Does 32.1% non-contiguous mean severely fragmented?

2007-10-23 Thread Tetsuo Handa
Hello.

 What filesystem are you using?  ext3?  ext4?  xfs?  And are you using
 any non-standard patches, such as some of the delayed allocation
 patches that have been floating around?  If you're using ext3, that
 shouldn't be happening.
I'm using ext3.
I'm running it on kernel 2.6.18-8.1.14.el5 (CentOS 5) for x86_64.
I don't know whether some of the delayed allocation patches are used
for 2.6.18-8.1.14.el5 kernel.

 Are you sure the file isn't getting written by some background tasks
 that you weren't aware of?  This seems very strange; what
 virtualization software are you using?  VMware, Xen, KVM?
I'm using VMware Workstation 6.0.0 build 45731 for x86_64.
It seems that there were some background tasks that delay writing.
I tried the following sequence, but sync made no difference.

[EMAIL PROTECTED] Ubuntu7.10]# service vmware stop
[EMAIL PROTECTED] Ubuntu7.10]# sleep 30
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 9280 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 9280 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# service vmware start
[EMAIL PROTECTED] Ubuntu7.10]# vmware
[EMAIL PROTECTED] Ubuntu7.10]# service vmware stop
[EMAIL PROTECTED] Ubuntu7.10]# sleep 30
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 9748 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 9748 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# service vmware start
[EMAIL PROTECTED] Ubuntu7.10]# vmware
[EMAIL PROTECTED] Ubuntu7.10]# service vmware stop
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 9749 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 9755 extents found, perfection would be 5 extents

Thank you.



Re: Does 32.1% non-contiguous mean severely fragmented?

2007-10-22 Thread Tetsuo Handa
Hello.

Theodore Tso wrote:
 Secondly, what results do you get when you run the command hdparm -tT
 /dev/sda (or /dev/hda if you are using an IDE disk)?

[EMAIL PROTECTED] Ubuntu7.10]# hdparm -tT /dev/hda1 

/dev/hda1:
 Timing cached reads:   10384 MB in  2.00 seconds = 5196.44 MB/sec
 Timing buffered disk reads:  116 MB in  3.02 seconds =  38.36 MB/sec

[EMAIL PROTECTED] Ubuntu7.10]# hdparm -tT /dev/hda1

/dev/hda1:
 Timing cached reads:   10572 MB in  2.00 seconds = 5291.32 MB/sec
 Timing buffered disk reads:  118 MB in  3.04 seconds =  38.83 MB/sec

BIOS setting says it uses AHCI mode.

 First of all, what does the filefrag program (shipped as part of
 e2fsprogs, not included in some distributions) say if you run it as
 root on your VM data file?

Here is the result of filefrag. The *-f???*.vmdk files are split at 2 GB each.

[EMAIL PROTECTED] Ubuntu7.10]# filefrag *
Ubuntu7.10-0: 1 extent found
Ubuntu7.10-f001.vmdk: 151 extents found, perfection would be 18 extents
Ubuntu7.10-f002.vmdk: 36 extents found, perfection would be 18 extents
Ubuntu7.10-f003.vmdk: 5 extents found, perfection would be 1 extent
Ubuntu7.10.nvram: 1 extent found
Ubuntu7.10.vmdk: 1 extent found
Ubuntu7.10.vmsd: 1 extent found
Ubuntu7.10.vmx: 1 extent found
Ubuntu7.10.vmxf: 1 extent found
Ubuntu7.10.vmx.lck: Not a regular file
Ubuntu7-f001.10-0: 167 extents found, perfection would be 18 extents
Ubuntu7-f002.10-0: 68 extents found, perfection would be 18 extents
Ubuntu7-f003.10-0: 20 extents found, perfection would be 18 extents
Ubuntu7-f004.10-0: 93 extents found, perfection would be 18 extents
Ubuntu7-f005.10-0: 316 extents found, perfection would be 18 extents
Ubuntu7-f006.10-0: 27 extents found, perfection would be 18 extents
Ubuntu7-f007.10-0: 21 extents found, perfection would be 18 extents
Ubuntu7-f008.10-0: 20 extents found, perfection would be 18 extents
Ubuntu7-f009.10-0: 78 extents found, perfection would be 18 extents
Ubuntu7-f010.10-0: 22 extents found, perfection would be 18 extents
Ubuntu7-f011.10-0: 47 extents found, perfection would be 1 extent
vmware-0.log: 4 extents found, perfection would be 1 extent
vmware-1.log: 3 extents found, perfection would be 1 extent
vmware-2.log: 15 extents found, perfection would be 1 extent
vmware.log: 3 extents found, perfection would be 1 extent

Yes, some files are discontiguous, but the ratio is not so high
considering their file size.


The 512MB suspend image has a much higher ratio of discontiguity,
as shown below.

When I just power on and suspend at grub, the number of extents is smaller than "perfection".
It would be a sparse image (memory is allocated but not all memory is accessed).
But when I do some operations after login, it becomes more discontiguous.


--- Start VM ---
--- Suspend VM ---
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 1 extent found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 14 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 14 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 17 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 17 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 17 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 17 extents found, perfection would be 5 extents
--- Resume and poweroff VM ---

--- Start VM ---
--- Suspend VM ---
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 751 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 3281 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 3281 extents found, perfection would be 5 extents
--- Resume and poweroff VM ---

What? Does sync make it more discontiguous?

--- Start VM ---
--- Suspend VM ---
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 10 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 482 extents found, perfection would be 5 extents
--- Resume and poweroff VM ---

--- Start VM ---
--- Suspend VM ---
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 8 extents found, perfection would be 5 extents
[EMAIL PROTECTED] Ubuntu7.10]# sync
[EMAIL PROTECTED] Ubuntu7.10]# filefrag Ubuntu7.10.vmem
Ubuntu7.10.vmem: 19 extents found, perfection would be 5 extents
--- Resume and poweroff VM ---

--- Start VM ---
--- Suspend VM ---
[EMAIL PROTECTED] Ubuntu7.10]# 

Re: Does 32.1% non-contiguous mean severely fragmented?

2007-10-19 Thread Tetsuo Handa
Hello.

Theodore Tso wrote:
 beginning of every single block group.  You have a small number of
 files on your system (349) occupying an average of 348 megabytes.  So
 it's not at all surprising that the contiguous percentage is 32%.
I see, thank you. Yes, there are many files split at 2GB each.

But what is surprising for me is that I have to wait for more than
five minutes to save/restore the virtual machine's 512MB-RAM image
(usually it takes less than five seconds).
hdparm reports DMA is on and e2fsck reports no errors,
so I thought it was severely fragmented.
Maybe I should back up all the virtual machines' data,
format the partition, and restore them.



Does "32.1% non-contiguous" mean severely fragmented?

2007-10-18 Thread Tetsuo Handa
Hello.

I ran e2fsck and it reported as follows.

[EMAIL PROTECTED] ~]# e2fsck -f /dev/hda1
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/data/VMware: 349/19546112 files (32.1% non-contiguous), 31019203/39072080 blocks

Does "non-contiguous" mean "fragmented"?
If so, where is ext3defrag?

Regards.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Pass struct vfsmount to the inode_create LSM hook

2007-05-26 Thread Tetsuo Handa
Hello.

Andreas Gruenbacher wrote:
  exec { "/usr/bin/gunzip" } "gzip", "-9", "some/file/to.gz";
 The above Perl code executes /usr/bin/gunzip and sets argv[0] to "gzip", so
 this confirms that the value of argv[0] is arbitrary. Well great, we already
 knew.

 AppArmor does not look at argv[0] for anything, and doing so would be insane. 
 So please don't jump to the wrong conclusions.
I agree that argv[0] checking is different from pathname-based access control
or label-based access control, but I want to say argv[0] checking is still needed.

If you don't check argv[0], an attacker can request anything, like

exec { "/bin/ls" } "/sbin/busybox", "cat", "/etc/shadow";
exec { "/bin/ls" } "/sbin/busybox", "rm", "/etc/shadow";

if /bin/ls and /bin/cat and /bin/rm are hardlinks of /sbin/busybox
(e.g. on embedded systems).

Therefore, TOMOYO Linux checks the combination of filename and argv[0]
passed to execve().

Thanks.


Re: Pass struct vfsmount to the inode_create LSM hook

2007-05-26 Thread Tetsuo Handa
Hello.

Andreas Gruenbacher wrote:
  Therefore, TOMOYO Linux checks the combination of filename and argv[0]
  passed to execve().
 So you are indeed trying to control the value of argv[0]? Well, good luck with
 that, but it's totally insane. You are guaranteed to break some applications.
TOMOYO Linux restricts argv[0] using the allow_argv0 syntax:
"allow_argv0 /bin/bash -bash" allows passing /bin/bash as filename and -bash as argv[0].
"allow_argv0 /bin/gzip gunzip" allows passing /bin/gzip as filename and gunzip as argv[0].
"allow_argv0 /sbin/busybox cat" allows passing /sbin/busybox as filename and cat as argv[0].
There is no need to use the allow_argv0 syntax if the basename of filename
and the basename of argv[0] are the same
(i.e. "allow_argv0 /bin/bash bash" is not required).
TOMOYO Linux doesn't unconditionally forbid passing different values for
filename and argv[0];
it allows passing different values only if that is allowed by the allow_argv0 syntax.
Could you please explain to me why this approach breaks applications?
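
To make the rule concrete, here is a minimal userland sketch of the check
described above. It is not TOMOYO's implementation: struct argv0_rule,
base_name() and argv0_allowed() are hypothetical names, and comparing the
basename of argv[0] against the rule is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: `argv0` may be passed when executing `filename`. */
struct argv0_rule {
    const char *filename;
    const char *argv0;
};

/* The three allow_argv0 examples from this mail. */
static const struct argv0_rule argv0_rules[] = {
    { "/bin/bash", "-bash" },
    { "/bin/gzip", "gunzip" },
    { "/sbin/busybox", "cat" },
};

/* Return the part of `path` after the last '/', or `path` itself. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}

/*
 * Return 1 if execve(filename, { argv0, ... }) is acceptable:
 * either both basenames match, or an explicit rule permits the pair.
 */
static int argv0_allowed(const char *filename, const char *argv0)
{
    size_t i;

    assert(filename != NULL && argv0 != NULL);
    if (strcmp(base_name(filename), base_name(argv0)) == 0)
        return 1;
    for (i = 0; i < sizeof(argv0_rules) / sizeof(argv0_rules[0]); i++)
        if (strcmp(argv0_rules[i].filename, filename) == 0 &&
            strcmp(argv0_rules[i].argv0, base_name(argv0)) == 0)
            return 1;
    return 0;
}
```

With these rules, /bin/bash run as "bash" or "-bash" passes, while
/bin/ls run with argv[0] set to "rm" or "/usr/sbin/httpd" is refused.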

 If /bin/cat and /bin/rm are binaries or hardlinks to the same busybox binary 
 (rather than symlinks), different profiles could be used for each of them.
That is true if all processes are kept under control (e.g. the strict policy in SELinux).
If there is a process that is not kept under control (e.g. the targeted policy in SELinux),
you can't protect the application.
For example, an administrator may wish to allow users to run /bin/ls without applying profiles
because /bin/ls won't read/write the content of files. But a malicious user may pass
/bin/ls as filename, rm as argv[0], and /etc/shadow as argv[1].
A malicious user may also pass /bin/ls as filename and /usr/sbin/httpd as argv[0],
making it behave as /usr/sbin/httpd without the profiles for /usr/sbin/httpd being applied.

Thanks.


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-25 Thread Tetsuo Handa
Hello.

Casey Schaufler wrote:
 Sorry, but I don't understand your objection. If AppArmor is configured
 to allow everyone access to /bin/gzip but only some people access to
 /bin/gunzip and (important detail) the single binary uses argv[0]
 as documented and (another important detail) there aren't other links
 named gunzip to the binary (ok, that's lots of if's) you should be fine.

argv[0] defines the default behavior of hard linked or symbolically linked programs,
but the behavior can be overridden using command line options.
If you want to allow access to /bin/gzip but deny access to /bin/gunzip,
you also need to deny "/bin/gzip -d", "/bin/gzip --decompress", and
"/bin/gzip --uncompress".
It is impossible to do so because the options that override the default behavior
depend on each program's design and you can't know
what programs and what options there are in the system.
Even if you knew all programs and all options in the system,
it would be too tough a job to find and reject options
that override the default behavior in kernel space.

  Well, my point was exactly that App Armor doesn't (as far as I know) do
  anything to enforce the argv[0] convention,
 Sounds like an opportunity for improvement then.

There are (I think) three types of program invocation.

(1) Invocation of hard linked programs.

/bin/gzip and /bin/gunzip and /bin/zcat are hard links.

There is no problem because you can know which pathname was requested
using d_namespace_path() with struct linux_binprm->file .

(2) Invocation of symbolically linked programs.

/sbin/pidof is a symbolic link to /sbin/killall .

There is a problem because you can't know which pathname was requested
using d_namespace_path() with struct linux_binprm->file
because the symbolic links were already dereferenced inside open_exec().
To know which pathname was requested, you need to do a lookup
using struct linux_binprm->filename without LOOKUP_FOLLOW
and then use d_namespace_path().
There is a race condition in that the pathname
the symbolic link struct linux_binprm->filename points to
may change, but it is inevitable because you can't get
the dentry and vfsmount both without the LOOKUP_FOLLOW flag and
with the LOOKUP_FOLLOW flag at the same time.

(3) Invocation of dynamically created programs with random names.

/usr/sbin/logrotate creates files patterned /tmp/logrotate.??
and executes these dynamically created files.

To keep execution of these dynamically created files under control,
you need to aggregate the pathnames of these files.
AppArmor can't define a profile if the pathname of a program is random, can it?

Usually argv[0] and struct linux_binprm->filename are the same,
but if you want to do something with argv[0], you will need to handle case (2)
to see whether argv[0] and struct linux_binprm->filename are the same.

Thanks.


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-24 Thread Tetsuo Handa
Hello.

I think bind mounts were discussed when shared subtrees
( http://lwn.net/Articles/159092/ ) were introduced.

On systems that allow users to mount their CD/DVDs freely,
bind mounts are used and labeling files is a convenient way
to deny access to somebody else's files.

But on systems that don't allow users to mount their CD/DVDs freely,
bind mounts needn't be used and using pathnames is a convenient way
to deny access to somebody else's files.

A pathname based access control/auditing system
works if the system doesn't use bind mounts.

However, there are distributions (e.g. Debian Etch)
that always use bind mounts. In such distributions,
a pathname based access control/auditing system doesn't work.

This is the fault of neither the distributions nor
the pathname based access control/auditing system.
It can be solved by passing vfsmount to VFS and LSM functions.

SELinux users are having a lot of trouble because pathnames in audit logs
are not always complete.
AppArmor users are having a lot of trouble because the pathnames which
a process requested are ambiguous when bind mounts are used.

Wanting to report the pathnames that a process requested is not surprising
when considering user friendliness.
I believe passing vfsmount makes both groups of users happy.

Thanks.


Re: [d_path 3/7] Add d_namespace_path() to compute namespace relative pathnames

2007-04-21 Thread Tetsuo Handa
Hello.

I've just returned from ELC2007 and
I haven't read all posts in this thread yet,
but I want to comment on this function.

 In AppArmor, we are interested in pathnames relative to the namespace root.
 This is the same as d_path() except for the root where the search ends. Add
 a function for computing the namespace-relative path.
Yes. You came to the same conclusion as TOMOYO Linux did.
http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/fs/realpath.c#L39


TOMOYO Linux uses pathnames relative to the namespace root.
You do this the way d_path() does, but some extensions are needed
if you want to use d_namespace_path() for access control/auditing purposes.

In Linux, any character other than NUL can be used in a pathname.
This means that you can't assume that whitespace characters are delimiters.
For example, when you process entries in "Access . granted/rejected\n" format
(where . is a pathname and \n is a newline, like "Access /bin/ls granted\n"),
an entry "Access /bin/ls granted\nAccess /bin/cat granted\n" can be produced
if . is "/bin/ls granted\nAccess /bin/cat".
Processing such an entry will produce a wrong result.
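
The problem can be reproduced in userland with a minimal sketch. The function names are hypothetical, and the \ooo octal escaping shown here is only one possible scheme, used as an assumption for illustration.

```python
def format_entry(pathname, verdict="granted"):
    """Naive formatter: writes the pathname verbatim."""
    return "Access %s %s\n" % (pathname, verdict)


def escape_path(pathname):
    """Escape backslash, whitespace and control characters so that the
    result contains no delimiter characters (assumed \\ooo octal scheme)."""
    out = []
    for ch in pathname:
        if ch == "\\":
            out.append("\\\\")
        elif ord(ch) <= 0x20 or ord(ch) == 0x7F:
            out.append("\\%03o" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def parse_log(text):
    """Parser that trusts newline and space as delimiters."""
    entries = []
    for line in text.split("\n"):
        if not line:
            continue
        _keyword, pathname, verdict = line.split(" ")
        entries.append((pathname, verdict))
    return entries


if __name__ == "__main__":
    evil = "/bin/ls granted\nAccess /bin/cat"
    # Verbatim pathname: one access is parsed as two entries.
    print(parse_log(format_entry(evil)))
    # Escaped pathname: exactly one entry survives.
    print(parse_log(format_entry(escape_path(evil))))
```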

Also, you want wildcards (usually *) when doing pathname comparison,
but there are files whose names contain wildcard characters
(for example, /usr/share/guile/1.6/ice-9/and-let*.scm in CentOS 4.4).
You need escaping so that you can tell whether * indicates
a literal * or a wildcard.
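
A minimal sketch of such escaping follows, assuming a hypothetical pattern syntax in which * is a wildcard within one path component and \* is a literal asterisk. This is not the syntax of any particular tool.

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern into a compiled regular expression.
    '*' matches any run of characters except '/'; a backslash makes the
    following character literal (hypothetical syntax for illustration)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped: always literal
            i += 2
        elif ch == "*":
            parts.append("[^/]*")                    # unescaped: wildcard
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def path_matches(pattern, pathname):
    return pattern_to_regex(pattern).fullmatch(pathname) is not None


if __name__ == "__main__":
    d = "/usr/share/guile/1.6/ice-9/"
    print(path_matches(d + r"and-let\*.scm", d + "and-let*.scm"))        # literal *
    print(path_matches(d + r"and-let\*.scm", d + "and-let-values.scm"))  # no wildcard
    print(path_matches(d + "and-let*.scm", d + "and-let-values.scm"))    # wildcard
```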

Also, in non-English regions, pathnames include characters that are outside
the printable ASCII range (for example, files created via Samba from a
Windows client).
Some programs can't handle characters that have the MSB set,
so you may want to represent all characters without using the MSB.
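
One way to do that is to write every byte outside the printable ASCII range as a \ooo octal sequence. The following is a sketch under that assumption, with hypothetical function names. It works on raw bytes, so no character encoding has to be known.

```python
def encode_ascii(raw):
    """Represent a raw pathname (bytes) using only printable ASCII.
    Backslash becomes '\\\\'; every byte <= 0x20 or >= 0x7F becomes \\ooo."""
    out = []
    for b in raw:
        if b == 0x5C:
            out.append("\\\\")
        elif 0x20 < b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


def decode_ascii(text):
    """Inverse of encode_ascii(); expects well-formed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text.startswith("\\\\", i):
            out.append(0x5C)
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


if __name__ == "__main__":
    # A name containing two UTF-8 encoded Japanese characters.
    raw = b"/home/samba/\xe6\x97\xa5\xe6\x9c\xac.txt"
    encoded = encode_ascii(raw)
    print(encoded)
    print(decode_ascii(encoded) == raw)
```

Because the encoding is reversible, the kernel side can still recover the exact bytes of the pathname.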

It may be OK if you use d_namespace_path() for processing a userland
configuration file, but it is not OK if you use it for processing
a kernel configuration file.
The kernel has to be able to handle any character.

So, you may want a customized version of d_namespace_path()?


Re: [RFC 0/28] Patches to pass vfsmount to LSM inode security hooks

2007-02-06 Thread Tetsuo Handa
Tony Jones wrote:
 The following are a set of patches the goal of which is to pass vfsmounts
 through select portions of the VFS layer sufficient to be visible to the LSM
 inode operation hooks.
I have been looking forward to these patches for a long time.

Chris Wright wrote:
 This kind of change (or perhaps
 straight to struct path) is definitely
 needed from AA.
Not only AppArmor, but also TOMOYO Linux needs these patches.

TOMOYO Linux is a pathname-based access control patch like AppArmor.
http://lwn.net/Articles/165132/
I have been asked "Why not use LSM?" and the answer is always
"I can't, for VFS helper functions and LSM functions don't receive vfsmount."
So I am manually patching the locations that call VFS helper functions.

But if Tony's patches are accepted upstream,
TOMOYO Linux would be able to use LSM.
I think these patches are also useful for auditing functions, for
audit logs will be able to include absolute pathnames
instead of partial pathnames.
I think most people want access logs in the form of pathnames
rather than security labels.