On Mon, Apr 08, 2013 at 09:48:16PM -0400, Josef Bacik wrote: > On Mon, Apr 8, 2013 at 9:34 PM, Liu Bo <bo.li....@oracle.com> wrote: > > On Mon, Apr 08, 2013 at 04:37:20PM -0400, Josef Bacik wrote: > >> On Mon, Apr 08, 2013 at 08:16:26AM -0600, Liu Bo wrote: > >> > On Mon, Apr 08, 2013 at 08:54:50AM -0400, Josef Bacik wrote: > >> > > On Sun, Apr 07, 2013 at 07:12:48AM -0600, Liu Bo wrote: > > [...] > >> > > + __le64 dedup_hash; > >> > > > + > >> > > > } __attribute__ ((__packed__)); > >> > > > > >> > > > >> > > Please don't do this, do like what we do with the crc tree, just > >> > > lookup based on > >> > > the disk bytenr when we free the extent and drop refs that way. Don't > >> > > further > >> > > bloat the file extent item, we want it smaller not larger. Thanks, > >> > > > >> > > Josef > >> > > >> > So the real trouble is that I take this hash value as the first field of > >> > btrfs_key, and binary searching without the precise first field is not > >> > easy. > >> > > >> > Otherwise we may have to add another key type which replaces hash value > >> > with > >> > disk bytenr, ie. (disk bytenr, ANOTHER_KEY_TYPE, hash value), then we'll > >> > need to > >> > search the tree twice to free this one or drop refs. > >> > > >> > Either case is tradeoff, but as this is an initial version, we can try > >> > all of > >> > these knobs and choose the better one :) > >> > > >> > >> Why would you need to search twice? You do something like > >> > >> key.objectid = bytenr; > >> key.type = whatever your type is > >> key.offset = first 64bits of your hash > >> > >> and then you search based on bytenr and you get your hash. All extents > >> that > >> share hashes will have the same bytenr so you can just search with > >> > >> key.objectid = bytenr > >> key.type = whatever > >> key.offset = (u64)-1 > >> > >> and then walk backwards, or do 0 and walk forwards, whatever your > >> preference is. > >> Also make it so the item stores the entire sha. With Sha256 you are > >> wasting > >> most of the hash result by just storing the first 64bits. So just use the > >> first > >> 64bits as the index and then store the full hash in the item so you can do > >> a > >> proper compare, and then you can have different hash functions later with > >> different length hash values. Thanks, > > > > This is on freeing extent, what about finding dedup when we don't yet have a > > disk bytenr but only a hash value from sha256 on file data? > > > > That's why (hash, type, disk bytenr) is necessary. > > > > Searching twice means that if we do what you've suggested, we'd not only > > update > > or free (disk bytenr, type, hash), but also (hash, type, disk bytenr). > > > > Or am I missing something? > > > > No sorry I was putting my son to sleep thinking about this and I > realized the problem you were having. So I say do like we do with > DIR_ITEMs, you have one type indexed by bytenr and one indexed by the > stop 64bits of the hash value. You just have an empty item for the > index by bytenr and for the index by the hash value you have the > offset of the key be the bytenr and then the item stores the length of > the full hash value and the hash value. You don't need ref counting > since that's taken care of by the extent tree, you just free the > corresponding dedup items when you free the extent. It would be nice > to do everything with just one key but I don't think you are going to > be able to do it, unless you can figure out some way to stash the hash > in the extent ref or something. But growing the file extent item or > the extent ref isn't an option since it affects anybody who doesn't > use dedup, so I think the 2 key option is probably the best bet. > Thanks, > > Josef
Yeah, this makes sense. All right, I've finished shipping it to two key style(from Josef) and getting rid of the extra ref(from Josef and Miao). But I still need to build a framework to enlarge the dedup blocksize rather than a page size(from Dave). thanks, liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html