On Tuesday, 14 December, 2010, Li Zefan wrote:
> Goffredo Baroncelli wrote:
> > Hi Li,
> > 
> > On Monday, 13 December, 2010, Li Zefan wrote:
> >> The keys returned by tree search ioctl should be restricted to:
> >>
> >>    key.objectid = [min_objectid, max_objectid] &&
> >>    key.offset   = [min_offset, max_offset] &&
> >>    key.type     = [min_type, max_type]
> >>
> >> But actually it returns those keys:
> >>
> >>    [(min_objectid, min_type, min_offset),
> >>            (max_objectid, max_type, max_offset)].
> >>
> > 
> > I have to admit that I had need several minutes to understand what you 
wrote 
> > :). Then I came to conclusion that the tree search ioctl is basically 
wrong.
> > 
> > IMHO, the error in this API is to use the lower bound of the acceptance 
> > criteria (the min_objectid, min_offset, min_type fields) also as starting 
> > point for the search.
> > 
> > Let me explain with an example.
> > 
> > Suppose to want to search all the keys in the range
> > 
> >     key.objectid = 10..20
> >     key.offset   = 100..200
> >     key.type     = 2..5
> > 
> > 
> > Suppose to set sk->nr_items to 1 for simplicity, and the keys available 
which 
> > fit in the range are
> > 
> > 1)  [15,150,3]
> > 2)  [16,160,4]
> > 3)  [17,180,3]
> > 
> > All these key satisfy the "acceptance criteria", but because we have to 
> > restart the search from the last key found, the code should resemble
> > 
> >     sk = &args.key
> > 
> >     sk->min_objectid=10; sk->max_objectid=20
> >     sk->min_offset=100; sk->max_offset=200
> >     sk->min_type=2; sk->max_type=5
> >     sk->nr_items = 1;
> > 
> >     while(1){
> >             ioctl(fd,  BTRFS_IOC_TREE_SEARCH, &args);
> >             if( !sk->nr_items )
> >                     break
> > 
> >             for(off = 0, i=0 ; i < sk->nr_items ; i   ){
> >                     sh = (struct btrfs_ioctl_search_header *)(args.buf  
> >                                                                  off);
> > 
> >                     [...]
> >                     sk->min_objectid = sh->objectid;
> >                     sk->min_offset = sh->offset;
> >                     sk->min_type = sh->type;
> >             }
> > 
> >             <increase the sk->min_* key of 1>
> >                             
> >     }
> > 
> > But in this case, the code after found the key #2, sets the minimum 
acceptance 
> > criteria to [16,160,4], which exclude the key #3 because min_type is too 
high.
> > 
> > Ideally, we should add three new field to the search key structure:
> > 
> >     sk->start_objectid
> >     sk->start_offset
> >     sk->start_type
> > 
> > And after every iteration the code (even the kernel code) should set these 
> > fields as "last key found  1", leaving the min_* fields as they are.
> > 
> > My analysis is correct or I miss  something ?
> > 
> 
> After looking more deeply, I found the ioctl was changed in this way
> on purpose, to support "btrfs subvolume find-new" specifically.
> 
> See this commit:
> 
> commit abc6e1341bda974e2d0eddb75f57a20ac18e9b33
> Author: Chris Mason <chris.ma...@oracle.com>
> Date:   Thu Mar 18 12:10:08 2010 -0400
> 
>     Btrfs: fix key checks and advance in the search ioctl
> 
>     The search ioctl was working well for finding tree roots, but using it 
for
>     generic searches requires a few changes to how the keys are advanced.
>     This treats the search control min fields for objectid, type and offset
>     more like a key, where we drop the offset to zero once we bump the type,
>     etc.
> 
>     The downside of this is that we are changing the min_type and min_offset
>     fields during the search, and so the ioctl caller needs extra checks to 
make
>     the keys in the result are the ones it wanted.
> 
>     This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
>     things more readable.
> 
> So I think we can just fix the btrfs tool. Though adding sk->start_xxx 
should
> also be able to meet the needs for "btrfs subvolume find-new".

Sorry, but I have to disagree. This API seems to me simply bugged. The example 
above (which is quite generic) highlights this fact. But I can provide a more 
real case: suppose to use the BTRFS_IOC_TREE_SEARCH ioctl to find the new 
files. We are interested to the following items:

- BTRFS_EXTENT_DATA_KEY                 (type = 1)
- BTRFS_INODE_ITEM_KEY                  (type = 24)
- BTRFS_XATTR_ITEM_KEY                  (type = 108)

Acceptance criteria:

        min_type = 1
        max_type = 108
        min_offset = 0
        max_offset = ~0
        min_objectid = 0
        max_objectid = ~0
        min_transid = <the base generation number>

Pay attention that we aren't interested in the offset.

Suppose to have the following sequence keys  [objectid, type, offset]:

[...]
1)      [300, BTRFS_EXTENT_DATA_KEY, xx]
2)      [300, BTRFS_INODE_ITEM_KEY, xx]
3)      [300, BTRFS_XATTR_ITEM_KEY, xx]
4)      [301, BTRFS_EXTENT_DATA_KEY, xx]
5)      [301, BTRFS_INODE_ITEM_KEY, xx]
7)      [30200, BTRFS_EXTENT_DATA_KEY, xx]
8)      [30200, BTRFS_INODE_ITEM_KEY, xx]
9)      [30200, BTRFS_XATTR_ITEM_KEY, xx]
[...]


Suppose that the buffer is filled between the item 2 and 3. We should restart 
the search, but how set the min_* key ? Try the following hypothesis

h1) objectid++, type = 0 -> In the next search the key 3 would be skipped
h2) objectid asis, type ++, -> in the next search the key 4 would be skipped
h3) objectid asis, type = 0 -> in the next search the key 1,2,3 would be 
returned a second time...

Pay attention that every inode may have more key type BTRFS_XATTR_ITEM_KEY or 
type BTRFS_EXTENT_DATA_KEY, so it is not possible to know in advance when the 
buffer is filled.

Only as  theoretical exercise, we can improve the search logic in userspace so 
when an item is returned, in the next search we set the minimum type as 
previous type+1, and the *maximum* objectid as the latest ofound bject id. 
When we are sure that there are not more key with this objectid  we can reuse 
the old max_objectid and min_type... But to me it seems very fragile.

Chris what do you think ? Otherwise I missed something this seems a severe bug 
in the api ?

In another email I will propose a patch which may address this problem.

Regards
G.Baroncelli

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreij...@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to