2012/2/23 Jeff Layton <[email protected]>:
> On Thu, 23 Feb 2012 10:56:40 +0400
> Pavel Shilovsky <[email protected]> wrote:
>
>> 2012/2/22 Jeff Layton <[email protected]>:
>> >
>> > Hmm...looking at my machine:
>> >
>> > (gdb) p sizeof(struct cifsLockInfo)
>> > $1 = 144
>> >
>> > So that's 144 bytes per struct...what's the upper limit on "count"?
>> >
>> > It seems like that could potentially be a large allocation, and we have
>> > to do this in the oplock break codepath.
>> >
>> > Instead of doing this, would it be better to allocate these when
>> > the VFS first calls ->lock, and attach them to a new member of
>> > file_lock.fl_u?
>>
>> Every time we can have a different number of locks - so, can't
>> pre-allocate it.
>
> You could preallocate these:
>
> 1/ Do the allocation in cifs_setlk
> 2/ Attach it to a new fl_u member in the file_lock struct
I am sure I got the idea. Do you mean to allocate the memory in inode->i_flock?
> 3/ Add a fl_release_private op to free that object
>
> Of course, you will need to consider how to deal with lock splitting
> and merging if you do that...
Yes, how can we deal with this sort of things? We can calculate the
lock number after every cifs_setlk but it's rather time consumption.
If we use this struct:
struct lock_to_push {
__u64 offset;
__u64 length;
__u32 pid;
__u16 netfid;
__u8 type;
};
it's sizeof is 24 bytes. If we have e.g 10^5 locks we end up with
2400000 bytes = 2.4 megabyte - don't think allocating this plays any
big role in whole time we need to push 10^5 locks to the server.
What's the problem with allocating it here?
>
>> I can create new struct for this only - without extra
>> llist, blist, block_q fields, that can save enough memory.
>>
>
> That sounds preferable. We really don't want to allocate more memory
> than is needed. Plus, anytime you duplicate information at least one
> copy is going to be wrong at some point in time.
>
>> >
>> >> lock_flocks();
>> >> cifs_for_each_lock(cfile->dentry->d_inode, before) {
>> >> @@ -950,35 +962,38 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
>> >> type = CIFS_RDLCK;
>> >> else
>> >> type = CIFS_WRLCK;
>> >> -
>> >> - lck = cifs_lock_init(flock->fl_start, length, type,
>> >> - cfile->netfid);
>> >> - if (!lck) {
>> >> - rc = -ENOMEM;
>> >> - goto send_locks;
>> >> - }
>> >> + lck = &lcks[i];
>> >> lck->pid = flock->fl_pid;
>> >> -
>> >> - list_add_tail(&lck->llist, &locks_to_send);
>> >> + lck->netfid = cfile->netfid;
>> >> + lck->length = length;
>> >> + lck->type = type;
>> >> + lck->offset = flock->fl_start;
>> >> + i++;
>> >> + if (i == count)
>> >> + break;
>> >> }
>> >> -
>> >> -send_locks:
>> >> unlock_flocks();
>> >>
>> >> - list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
>> >> + if (i == count)
>> >> + cERROR(1, "Can't push all brlocks!");
>> >> +
>> >
>> > This sort of thing always makes me a little worried. Sure, you're
>> > allocating a bunch of extra locks, but what's the potential effect
>> > of not being able to push all the locks here? The user is going to
>> > see this message and have no clue of what it means.
>> >
>> > Even I'm not sure what this means in practice. The VFS will have a
>> > record of this lock, but the server won't...
>> >
>>
>> I really think that there will be no such situations, because we hold
>> inode->lock_mutex - the locking state should leave the same.
>>
>
> I don't share that optimism.
>
> The lock_mutex protects the cinode->llist, but here you're walking the
> inode->i_flock list. It's certainly possible that it could change in
> between unlock_flocks and lock_flocks.
>
The lock_mutex now protects not only cinode->llist, but locks from the
inode structure reached from cifs code as well. The only thing I
missed is posix_lock_file_wait in the end of the cifs_setlk function -
it should be surrounded with lock_mutex as well. According to this,
there is no other way to increase POSIX lock number from the inode
except locking/unlocking lock_mutex.
--
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html