Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Alan Cox
No specific spec, just general quality of implementation. I completely agree. If one thread writes A and another writes B then the kernel should record either A or B, not ((A 0x) | (B 0x)) Agree entirely: the spec doesn't allow for random scribbling in the wrong
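
The "either A or B" point above can be illustrated with a small userspace sketch. It assumes a 64-bit value that is stored as two separate 32-bit halves with no lock; the names `split_pos` and `torn_interleaving` are hypothetical, and the mask expression in the preview is truncated and is not reconstructed here.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit offset kept as two 32-bit halves, as on a
 * 32-bit machine with no lock and no atomic 64-bit store. */
struct split_pos {
    uint32_t hi;
    uint32_t lo;
};

static void store_hi(struct split_pos *p, uint64_t v) { p->hi = (uint32_t)(v >> 32); }
static void store_lo(struct split_pos *p, uint64_t v) { p->lo = (uint32_t)v; }

static uint64_t load(const struct split_pos *p)
{
    return ((uint64_t)p->hi << 32) | p->lo;
}

/* One possible interleaving of two unlocked writers: A stores its high
 * half, B stores both halves, then A stores its low half. */
static uint64_t torn_interleaving(uint64_t a, uint64_t b)
{
    struct split_pos pos = { 0, 0 };

    store_hi(&pos, a);
    store_hi(&pos, b);
    store_lo(&pos, b);
    store_lo(&pos, a);
    return load(&pos);   /* high half from b, low half from a */
}
```

The result is a mixture of both writers' values, which is the outcome the message says the kernel should never record.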

Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Alan Cox
On Mon, 28 Jan 2008 15:10:34 +0100 Andi Kleen [EMAIL PROTECTED] wrote: On Monday 28 January 2008 14:38:57 Alan Cox wrote: Also worse really fixing it would be a major change to the VFS because of the way ->read/write are defined :/ I don't see a problem there. ->read and ->write update

Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Alan Cox
Also worse really fixing it would be a major change to the VFS because of the way ->read/write are defined :/ I don't see a problem there. ->read and ->write update the passed pointer which is not the real f_pos anyway. Just the copies need fixing. Alan - To unsubscribe from this list: send the

Re: [RFC] Parallelize IO for e2fsck

2008-01-22 Thread Alan Cox
I'd tried to advocate SIGDANGER some years ago as well, but none of the kernel maintainers were interested. It definitely makes sense to have some sort of mechanism like this. At the time I first brought it up it was in conjunction with Netscape using too much cache on some system, but it

Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-15 Thread Alan Cox
Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk motor as a generator or alternatively a small battery. It would be awfully nice to know which brands fail here, if any, because writeback

Re: [RFD] Incremental fsck

2008-01-13 Thread Alan Cox
What are ext3 expectations of disk (is there doc somewhere)? For example... if disk does not lie, but powerfail during write damages the sector -- is ext3 still going to work properly? Nope. However the few disks that did this rapidly got firmware updates because there are other OS's that

Re: [PATCH, RESEND] locks: fix possible infinite loop in posix deadlock detection

2007-10-30 Thread Alan Cox
the problem. Cc: George G. Davis [EMAIL PROTECTED] Signed-off-by: J. Bruce Fields [EMAIL PROTECTED] Acked-by: Alan Cox [EMAIL PROTECTED] It's a good fix for now and I doubt any real world user has that complex a locking pattern to break.

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-29 Thread Alan Cox
On Sun, 28 Oct 2007 13:43:21 -0400 J. Bruce Fields [EMAIL PROTECTED] wrote: From: J. Bruce Fields [EMAIL PROTECTED] We currently attempt to return -EDEADLK to blocking fcntl() file locking requests that would create a cycle in the graph of tasks waiting on locks. This is inefficient: in

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-29 Thread Alan Cox
And if posix file locks are to be useful to threaded applications, then we have to preserve the same no-false-positives requirement for them as well. It isn't useful to threaded applications. The specification requires this. Which is another reason for having an additional Linux (for now) flag

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Alan Cox
On Sun, 28 Oct 2007 12:27:32 -0600 Matthew Wilcox [EMAIL PROTECTED] wrote: On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote: We currently attempt to return -EDEADLK to blocking fcntl() file locking requests that would create a cycle in the graph of tasks waiting on locks.

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Alan Cox
- EDEADLK behaviour is ABI Not in any meaningful way. I've seen SYS5 software that relies on it so we should be careful. Again see the 2004 discussion where the conclusion was that EDEADLK should stay - EDEADLK behaviour is required by SuSv3 What SuSv3 actually says is: If

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Alan Cox
Bzzt. You get a false deadlock with multiple threads like so:
Thread A of task B takes lock 1
Thread C of task D takes lock 2
Thread C of task D blocks on lock 1
Thread E of task B blocks on lock 2
The spec and SYSV certainly ignore threading in this situation and you know that perfectly
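
The four-step scenario above can be modelled with a tiny wait-for graph. This is a simplified sketch, not the kernel's detection code: it assumes each ID has at most one outgoing wait edge, and every name in it (`wait_edge`, `would_deadlock`, the ID constants) is hypothetical. It only shows why tracking lock ownership per task reports a cycle that does not exist per thread.

```c
#include <stdbool.h>

/* One "waiter is blocked on a lock held by holder" edge. */
struct wait_edge {
    int waiter;
    int holder;
};

/* Would blocking `waiter` on a lock held by `holder` close a cycle?
 * Simplification: follows at most one outgoing edge per ID. */
static bool would_deadlock(const struct wait_edge *edges, int n,
                           int waiter, int holder)
{
    int current = holder;

    for (int step = 0; step <= n; step++) {
        if (current == waiter)
            return true;

        int next = -1;
        for (int i = 0; i < n; i++) {
            if (edges[i].waiter == current) {
                next = edges[i].holder;
                break;
            }
        }
        if (next < 0)
            return false;   /* current is not blocked, so the chain ends */
        current = next;
    }
    return false;
}

enum { TASK_B = 1, TASK_D = 2 };
enum { THREAD_A = 10, THREAD_C = 11, THREAD_E = 12 };

/* Owners identified by task: D already waits on B, now B asks for D's lock. */
static bool example_by_task(void)
{
    struct wait_edge edges[] = { { TASK_D, TASK_B } };
    return would_deadlock(edges, 1, TASK_B, TASK_D);
}

/* Owners identified by thread: C waits on A, now E asks for C's lock. */
static bool example_by_thread(void)
{
    struct wait_edge edges[] = { { THREAD_C, THREAD_A } };
    return would_deadlock(edges, 1, THREAD_E, THREAD_C);
}
```

With task IDs the walk returns to task B and reports a deadlock. With thread IDs it stops at thread A, which is not blocked and can still release lock 1.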

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Alan Cox
The spec and SYSV certainly ignore threading in this situation and you know that perfectly well (or did in 2004) The discussion petered out (or that mailing list archive lost articles from the thread) without any kind of resolution, or indeed interest. I think the resolution was that the

Re: [RFC, PATCH] locks: remove posix deadlock detection

2007-10-28 Thread Alan Cox
On Sun, 28 Oct 2007 17:38:14 -0600 Matthew Wilcox [EMAIL PROTECTED] wrote: On Sun, Oct 28, 2007 at 09:38:55PM +, Alan Cox wrote: It doesn't require the system to detect it, only mandate what to return if it does detect it. We should be detecting at least the obvious case. What

Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

2007-10-25 Thread Alan Cox
On Thu, 25 Oct 2007 16:06:40 -0700 Mike Waychison [EMAIL PROTECTED] wrote: Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open file descriptor. It would be nice to allow users to have permission to see where their data is landing on disk, and there really isn't a

Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

2007-10-25 Thread Alan Cox
I found Chris's comment about negative block numbers, I'll send a patch out for that. You mentioned back in 99 about racing with ftruncate. Is it sufficient to mutex_lock(i_mutex) and down_read(i_alloc_sem)? One for the fs guys. That code has changed far beyond anything I understand any

Re: [patch 1/2] getattr - fill the size of pipes

2007-10-04 Thread Alan Cox
Cute feature, but it is (I assume) a Linux-specific extension and is something which we'll need to maintain for ever and it invites Actually it used to work on the old old Linux pipe code. unportability to older Linuxes and other OSes and it introduces some risk of breakage of existing

[PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Alan Cox
therefore transition to the proper error return code Signed-off-by: Alan Cox [EMAIL PROTECTED] diff -u --new-file --exclude-from /usr/src/exclude --recursive linux.vanilla-2.6.23rc8-mm1/fs/gfs2/ops_file.c linux-2.6.23rc8-mm1/fs/gfs2/ops_file.c --- linux.vanilla-2.6.23rc8-mm1/fs/gfs2/ops_file.c

Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Alan Cox
On Thu, 27 Sep 2007 07:01:18 -0700 Arjan van de Ven [EMAIL PROTECTED] wrote: On Thu, 27 Sep 2007 14:29:19 +0100 Alan Cox [EMAIL PROTECTED] wrote: The early LFS work that Linux uses favours EFBIG in various places. SuSv3 specifically uses EOVERFLOW for this as noted by Michael (Bug 7253

Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Alan Cox
It's a change of a specific error return from the wrong error to the right one, nothing more. Fixing the returned error gives us correct behaviour according to the standards and other systems. It may still break applications. Waving some standard at them if they complain is unlikely to

Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Alan Cox
Well it's not my call, just seems like a really bad idea to change the error value. You can't claim full coverage for such testing anyway, it's one of those things that people will complain about two releases later saying it broke app foo. Strange since we've spent years changing error values

Re: [patch 1/2] VFS: new fgetattr() file operation

2007-09-24 Thread Alan Cox
But it has various drawbacks, like rmdir doesn't work if there are open files within an otherwise empty directory. I'd happily accept suggestions on how to deal with this differently. NFS has that problem because it really has to sillyrename into the same directory. I don't see that ssh/sftp

Re: [AppArmor 00/44] AppArmor security module overview

2007-06-28 Thread Alan Cox
Anyone can apply the AppArmor patch to their tree, they get the choice that way. Nobody is currently prevented from using AppArmor if they want to, any such suggestion is pure rubbish. The exact same argument was made prior to SELinux going upstream. It's made for everything before it

Re: Versioning file system

2007-06-16 Thread Alan Cox
http://www.wipo.int/pctdb/en/fetch.jsp?LANG=ENGDBSELECT=PCTSERVER_TYPE=19SORT=1211506-KEYTYPE_FIELD=256IDB=0IDOC=1205953C=10ELEMENT_SET=IA,WO,TTL-ENRESULT=1TOTAL=3START=1DISP=25FORM=SEP-0/HITNUM,B-ENG,DP,MC,PA,ABSUM-ENGSEARCH_IA=US2005045566QUERY=%28IN%2fmerkey%29+ The last one was filed with

Re: Versioning file system

2007-06-16 Thread Alan Cox
(Vax/VMS System Software Handbook) (TOPS-20 User's Manual) Also Files/11 Basic versioning goes back to at least ITS Not sure how old doing file versioning and hiding it away with a tool to go rescue the stuff you blew away by mistake is, but Novell Netware 3 certainly did a good job on

Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-26 Thread Alan Cox
As such, AA can detect whether you did exec(gzip) or exec(gunzip) and apply the policy relevant to the program. It could apply different That's not actually useful for programs which link the same binary to multiple names because if you don't consider argv[0] as well I can run /usr/bin/gzip

Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-26 Thread Alan Cox
Preventive measures are taken to limit only one continuation inode per file per chunk. This can be done easily in the chunk allocation algorithm for disk space. Although I'm not quite sure what you mean by How are you handling the allocation in this situation, are you assuming that a chunk

Re: [d_path 1/7] Fix __d_path() for lazy unmounts and make it unambiguous

2007-04-20 Thread Alan Cox
On Fri, 20 Apr 2007 01:23:04 +0200 Andreas Gruenbacher [EMAIL PROTECTED] wrote: First, when __d_path() hits a lazily unmounted mount point, it tries to prepend the name of the lazily unmounted dentry to the path name. It gets this wrong, and also overwrites the slash that separates the name

Re: [d_path 6/7] Filter out disconnected paths from /proc/mounts

2007-04-20 Thread Alan Cox
Gruenbacher [EMAIL PROTECTED] This change in behaviour appears to be fine for glibc (except when trying to find the name of a file from a namespace we are not in, which wouldn't have come out right before either) Acked-by: Alan Cox [EMAIL PROTECTED] (but still NAK on the getcwd change)

Re: [AppArmor 31/41] Fix __d_path() for lazy unmounts and make it unambiguous; exclude unreachable mount points from /proc/mounts

2007-04-16 Thread Alan Cox
That is a fairly significant and sudden change to the existing kernel/user interface. Well, this is not meant for 2.6.21. I hope it is possible to change it in early 2.6.22; otherwise if we can't fix mistakes from the past we are pretty doomed. I don't believe the existing behaviour

Re: [AppArmor 39/41] AppArmor: Profile loading and manipulation, pathname matching

2007-04-16 Thread Alan Cox
don't actually have to care --- if loading an invalid profile can bring down the system, then that's no worse than an arbitrary module that crashes the machine. Not sure if there will ever be user loadable profiles; at least at that point we had to care. CAP_SYS_RAWIO is needed to do

Re: [AppArmor 31/41] Fix __d_path() for lazy unmounts and make it unambiguous; exclude unreachable mount points from /proc/mounts

2007-04-12 Thread Alan Cox
Third, sys_getcwd() shouldn't return disconnected paths. The patch checks for that, and makes it fail with -ENOENT in that case That is a fairly significant and sudden change to the existing kernel/user interface. Fourth, this now allows us to tell unreachable mount points from reachable

Re: [AppArmor 03/41] Remove redundant check from proc_sys_setattr()

2007-04-12 Thread Alan Cox
On Thu, 12 Apr 2007 02:08:12 -0700 [EMAIL PROTECTED] wrote: notify_change() already calls security_inode_setattr() before calling iop->setattr. This is a behaviour change on all of these and limits some behaviour of existing established security modules When inode_change_ok is called it has

Re: [AppArmor 38/41] AppArmor: Module and LSM hooks

2007-04-12 Thread Alan Cox
+
+ /**
+  * parent can ptrace child when
+  * - parent is unconfined
+  * - parent is in complain mode
+  * - parent and child are confined by the same profile
+  */
Your profiles are name based. That means the same profile in a different namespace does different

Re: [AppArmor 39/41] AppArmor: Profile loading and manipulation, pathname matching

2007-04-12 Thread Alan Cox
+ th.td_id = ntohs(*(u16 *) (blob));
+ th.td_flags = ntohs(*(u16 *) (blob + 2));
+ th.td_lolen = ntohl(*(u32 *) (blob + 8));
Use the cpu_to and _to_cpu functions here so the intended direction and endianness are clear.
+
+static inline int aa_inbounds(struct aa_ext *e, size_t
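
The review comment above asks for conversions that name the byte order and direction. The kernel provides helpers such as be16_to_cpu() and be32_to_cpu() for this; the following is only a userspace analogue under stated assumptions. `read_be16`, `read_be32`, `header_sketch` and `parse_header` are hypothetical names, and the field offsets 0, 2 and 8 are taken from the quoted lines.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value byte by byte; no alignment or host
 * byte-order assumptions. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Read a big-endian 32-bit value byte by byte. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical stand-in for the header fields in the quoted patch. */
struct header_sketch {
    uint16_t id;
    uint16_t flags;
    uint32_t lolen;
};

/* Returns 0 on success, -1 if the blob is too short for the fields read. */
static int parse_header(const unsigned char *blob, size_t len,
                        struct header_sketch *out)
{
    if (len < 12)
        return -1;
    out->id    = read_be16(blob);
    out->flags = read_be16(blob + 2);
    out->lolen = read_be32(blob + 8);
    return 0;
}
```

Reading bytes individually also avoids the unaligned pointer casts in the quoted lines, and the length check keeps the reads inside the buffer.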

Re: [AppArmor 37/41] AppArmor: Main Part

2007-04-12 Thread Alan Cox
+ * aa_taskattr_access
+ * @name: name of the file to check
+ *
+ * Check if name matches /proc/self/attr/current, with self resolved
+ * to the current pid. This file is the usermode interface for
+ * changing one's hat.
+ */
+static inline int aa_taskattr_access(const char *name)
+{
+

Re: impact of 4k sector size on the IO FS stack

2007-03-12 Thread Alan Cox
Now, if this disk was copied byte per byte (/bin/dd) to a 4096-based disk, and Linux would start using a sector size of 4096, then I would suddenly have The ATA drives I'm aware of report 512 byte sector size, do 512 byte I/O's but use 4K physical sectors and to get sane performance except the

Re: impact of 4k sector size on the IO FS stack

2007-03-12 Thread Alan Cox
First generation of 1K sector drives will continue to use the same 512-byte ATA sector size you are familiar with. A single 512-byte write will cause the drive to perform a read-modify-write cycle. This configuration is physical 1K sector, logical 512b sector. The problem case is
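
The read-modify-write case described above comes down to an alignment check. A minimal sketch, assuming byte offsets and a known physical sector size; `write_needs_rmw` is a hypothetical helper, not a kernel or drive interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a write of `length` bytes at byte `offset` covers only part of
 * some physical sector, so the drive must read the sector, merge the new
 * data and write it back. */
static bool write_needs_rmw(uint64_t offset, uint64_t length,
                            uint32_t phys_sector)
{
    if (length == 0 || phys_sector == 0)
        return false;
    return (offset % phys_sector) != 0 || (length % phys_sector) != 0;
}
```

For a 1K physical sector, a single 512-byte write at offset 0 needs the cycle, while a 1024-byte write at offset 1024 does not.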

Re: impact of 4k sector size on the IO FS stack

2007-03-12 Thread Alan Cox
For 1K/4K logical sector sizes, who knows. EFI? grins and runs Certainly seems incompatible with the current popular DOS partition format. It's a bit messier than that. There are two interpretations of DOS partition formats found on 2K sector size magneto opticals. One is that everything is

Re: impact of 4k sector size on the IO FS stack

2007-03-11 Thread Alan Cox
Are there other concerns in the IO or FS stack that we should bring up with vendors? I have been asked to summarize the impact of 4k sectors on linux for a disk vendor gathering and want to make sure that I put all of our linux specific items into that summary... We need to make sure the

Re: GFS, what's remaining

2005-09-06 Thread Alan Cox
On Tue, 2005-09-06 at 02:48 -0400, Daniel Phillips wrote: On Tuesday 06 September 2005 01:05, Dmitry Torokhov wrote: do you think it is a bit premature to dismiss something even without ever seeing the code? You told me you are using a dlm for a single-node application, is there anything

Re: [Linux-cluster] Re: GFS, what's remaining

2005-09-05 Thread Alan Cox
On Sat, 2005-09-03 at 21:46 -0700, Andrew Morton wrote: Actually I think it's rather sick. Taking O_NONBLOCK and making it a lock-manager trylock because they're kinda-sorta-similar-sounding? Spare me. O_NONBLOCK means open this file in nonblocking mode, not attempt to acquire a clustered

Re: GFS, what's remaining

2005-09-01 Thread Alan Cox
On Thu, 2005-09-01 at 03:59 -0700, Andrew Morton wrote: - Why the kernel needs two clustered filesystems So delete reiserfs4, FAT, VFAT, ext2, and all the other junk. - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot possibly gain (or vice versa) - Relative merits

Re: GFS, what's remaining

2005-09-01 Thread Alan Cox
That's GFS. The submission is about a GFS2 that's on-disk incompatible to GFS. Just like say reiserfs3 and reiserfs4 or ext and ext2 or ext2 and ext3 then. I think the main point still stands - we have always taken multiple file systems on board and we have benefitted enormously from having

Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread Alan Cox
On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote: [PATCH 1/2] unshare system call: System Call handler function sys_unshare Given the complexity of the kernel code involved and the obscurity of the functionality why not just do another clone() in userspace to unshare the things you want

Re: quota deadlock in 2.4.5-pre4

2001-05-23 Thread Alan Cox
I think it's a misfit between Linus' kernel and the quota tools from http://sourceforge.net/projects/linuxquota/ Linus quota code is way out of date and only handles 16bit uid Linus' tree and Alan's are showing a 2000 line diff in dquot.c alone. `quotaon' seems to be passing arguments into

Re: ECN is on!

2001-05-22 Thread Alan Cox
Matti Aarnio writes: I am contemplating to periodically turn off the ECN bit to let email out, but DaveM has veto there. I veto, the whole point of moving to ECN was to make a statement and get people to fix their kit. We will remove these people, that's all. Since HTML email also

Re: Why side-effects on open(2) are evil. (was Re: [RFD

2001-05-20 Thread Alan Cox
Why are LVM and EVMS(competing LVM project) needed at all? I prefer to think of it the other way around Surely the same can be accomplished with * md * snapshot blkdev (attached in previous e-mail) * giving partitions and blkdevs the ability to grow and shrink * giving filesystems the

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-20 Thread Alan Cox
How about sprintf(s + strlen(s), foo)? Solar Designer said two years ago we should be using snprintf in the kernel. He was most decidedly right 8)
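
The difference between the sprintf append idiom and a bounded append with snprintf can be sketched as follows. This is an illustrative userspace helper with a hypothetical name (`append_bounded`), not code from the thread; it assumes the caller knows the total size of the destination buffer.

```c
#include <stdio.h>
#include <string.h>

/* Append `text` to the NUL-terminated string in buf, which has `size`
 * bytes in total. Returns 0 on success, -1 if buf is not terminated
 * within `size` bytes or if the result had to be truncated. */
static int append_bounded(char *buf, size_t size, const char *text)
{
    const char *end = memchr(buf, '\0', size);
    if (end == NULL)
        return -1;

    size_t used = (size_t)(end - buf);
    /* snprintf never writes more than size - used bytes, NUL included. */
    int n = snprintf(buf + used, size - used, "%s", text);
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return 0;
}
```

Unlike sprintf(s + strlen(s), ...), the write cannot run past the end of the buffer, and passing the text through "%s" keeps it from being interpreted as a format string.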

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-20 Thread Alan Cox
Linus, as much as I'd like to agree with you, you are hopeless optimist. 90% of drivers contain code written by stupid gits. I think that's a very arrogant and very mistaken view of the problem. 90% of the drivers are written by people who are - Copying from other drivers

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-19 Thread Alan Cox
On Sun, 20 May 2001, Ingo Oeser wrote: PS: English is neither mine, nor Linus native language. Why do the English natives complain instead of us? ;-) Because we had some experience with, erm, localized systems and for Alan it's most likely pure theory? ;-) I think its important its

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-19 Thread Alan Cox
ioctls are evil, period. At least with these names you can use normal scripting and don't need any special tools. Every ioctl means a binary that has no business to exist. That is not IMHO a rational argument. It isn't my fault that your shell does not support ioctls usefully. If you used

Re: cramfs b0rken on HIGHMEM machines

2001-03-22 Thread Alan Cox
just look at fs/cramfs/inode.c:cramfs_read_page() It uses page_address instead of kmap(). I would have fixed it myself, but I don't know, how I should kunmap() it, once we have memory pressure. Take a look at ramfs. kmap isn't really a 'pressure' thing. You want to kunmap the page as soon