----- Forwarded message from Jim Klimov <[email protected]> -----

From: Jim Klimov <[email protected]>
Date: Thu, 04 Oct 2012 13:44:21 +0400
To: [email protected]
CC: Eugen Leitl <[email protected]>
Subject: Re: ZFS dedup? hashes (Re: [cryptography] [zfs] SHA-3 winner announced)
Reply-To: [email protected]
Organization: JSC COS/HT
User-Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1
2012-10-03 18:52, Eugen Leitl wrote:
> I infer from your comments that you are focusing on the ZFS use of a hash
> for dedup? (The forward did not include the full context). A forged
> collision for dedup can translate into a DoS (deletion) so 2nd pre-image
> collision resistance would still be important.

This subject was discussed a few months ago on zfs-discuss; I believe
the thread history is here:
http://mail.opensolaris.org/pipermail/zfs-discuss/2012-July/051865.html

Regarding dedup-collision attacks, the situation is this: ZFS dedup
uses the checksum of a low-level block of ZFS data (after compression,
and after encryption in the case of Solaris 11). The final on-disk
blocks, whatever their contents, are checksummed as part of ZFS
integrity verification (detection of bitrot), and the stronger of
these checksums can double as keys into the deduplication table (DDT)
if dedup is enabled for the dataset. Upon write, ZFS prepares the
final block contents, computes the checksum, looks it up in the DDT,
and either increments the matching DDT entry's counter or creates a
new DDT entry with counter=1.

The DDT is shared by many datasets on the pool, and the accounting of
used/free space becomes "interesting"; but users have little if any
way to know whether their data was deduped (one might infer it from
changes in used/free space, but can never be sure that one's OWN
recently written file was involved). A block is several sectors in
size, currently ranging from 512 bytes to 128 KB.

In order to craft an attack on dedup you would have to:

1) Know exactly what data the victim will write - including the raw
   data, compression algorithm, encryption, etc.;

2) Create a block of forged data that has the same checksum (as used
   in this block's on-disk metadata - currently sha256, with perhaps
   more options as a result of Saso's work);

3) Be the very first writer on this pool to create a block with this
   hash and enter it into the DDT.

In principle, any co-user of space on the deduped pool might do this.
In practice, however, you need such intimate access to the victim's
source data and system setup details that you might just as well be
the storage admin, who can simply corrupt and overwrite the victim's
userdata block with whatever trash he likes.

Also, as far as dedup goes, simply setting verify=on forces a
comparison of the on-disk block with the one ZFS is about to save
(given that they have the same checksum, and perhaps size, and one is
already in the DDT); if the two don't match, ZFS just writes the new
block non-deduped. The attack would at most waste space on the
storage: if the victim's data is indeed dedupable, many identical
copies ultimately get saved, while the forged block only sits there
occupying its DDT entry.

> Incidentally a somewhat related problem with dedup (probably more in cloud
> storage than local dedup of storage) is that the dedup function itself can
> lead to the "confirmation" or even "decryption" of documents with
> sufficiently low entropy as the attacker can induce you to "store" or
> directly query the dedup service looking for all possible documents. eg say
> a form letter where the only blanks to fill in are the name (known
> suspected) and a figure (<1,000,000 possible values).

What sort of attack do you suggest? That a storage user (the attacker)
pre-creates a million files of this form with filled-in data? Having
no access to ZFS low-level internals and metadata, the end-user has no
reliable way of knowing that a particular file got deduped. (And to be
exact, it is not files that are deduped, but their component blocks.)
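To make the write path and the verify=on behavior described above
concrete, here is a minimal sketch in Python. It is a model of the
mechanism, not ZFS code: the DDT is a plain dict, "storage" stands in
for the vdevs, and names like write_block() are illustrative.

import hashlib

class DDTEntry:
    def __init__(self, disk_addr):
        self.disk_addr = disk_addr   # where the canonical copy lives
        self.refcount = 1

ddt = {}        # checksum -> DDTEntry; the real DDT is an on-disk structure
storage = {}    # disk_addr -> block bytes; stands in for the vdevs
next_addr = 0

def checksum(block):
    # sha256, the dedup-capable checksum discussed above
    return hashlib.sha256(block).digest()

def write_block(block, verify=True):
    """Simulate a dedup-aware block write; returns the block's address."""
    global next_addr
    key = checksum(block)
    entry = ddt.get(key)
    if entry is not None:
        if not verify or storage[entry.disk_addr] == block:
            # Checksums match (and, with verify=on, so do the bytes):
            # bump the reference counter and write nothing.
            entry.refcount += 1
            return entry.disk_addr
        # verify=on caught a checksum collision (e.g. a forged block):
        # fall through and store this block non-deduped.
    addr = next_addr
    next_addr += 1
    storage[addr] = block
    if entry is None:
        ddt[key] = DDTEntry(addr)   # new DDT entry with counter=1
    return addr

With verify=on, a forged block that merely matches the victim's
checksum gains the attacker nothing: the byte comparison fails, the
victim's data is written out intact and non-deduped, and the forgery
only wastes the space of its own copy plus a DDT entry, exactly as
described above.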
And if an admin does that, he might just as well read the victim's
file directly (on a non-encrypted pool). Or did I misunderstand your
point?

> Also if there is encryption there are privacy and security leaks arising
> from doing dedup based on plaintext.
>
> And if you are doing dedup on ciphertext (or the data is not encrypted), you
> could follow David's suggestion of HMAC-SHA1 or the various AES-MACs. In
> fact I would suggest for encrypted data, you really NEED to base dedup on
> MACs and NOT hashes or you leak and risk bruteforce "decryption" of
> plaintext by hash brute-forcing the non-encrypted dedup tokens.

I am not enough of a cypher expert to even properly decipher this
part ;)

HTH,
//Jim Klimov

----- End forwarded message -----
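As a rough illustration of the MAC-keyed dedup tokens suggested in the
quoted text, here is a minimal sketch in the same spirit as the
write-path sketch above. The per-pool secret dedup_key and the
dedup_token() helper are hypothetical, not anything ZFS implements;
HMAC-SHA256 stands in for the HMAC-SHA1/AES-MAC options named in the
quote, simply to stay consistent with the sha256 checksum discussed
earlier.

import hashlib
import hmac
import os

# Hypothetical per-pool secret; a real design would keep it with the
# pool's other keys rather than generate it ad hoc like this.
dedup_key = os.urandom(32)

def plain_token(block):
    # Unkeyed hash, as with plain sha256 dedup: anyone who can guess
    # the plaintext can compute the same token and probe for it.
    return hashlib.sha256(block).digest()

def dedup_token(block):
    # Keyed token: without dedup_key an attacker cannot precompute
    # tokens for guessed plaintexts, so the low-entropy "confirmation"
    # attack against the dedup table cannot be run offline.
    return hmac.new(dedup_key, block, hashlib.sha256).digest()

Identical blocks still map to identical tokens, so dedup keeps working
within the pool; what changes is that an outsider who merely guesses a
candidate plaintext can no longer compute its dedup token and probe
for its presence.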
--
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography