Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
- Forwarded message from Jim Klimov jimkli...@cos.ru -

From: Jim Klimov jimkli...@cos.ru
Date: Thu, 04 Oct 2012 13:44:21 +0400
To: z...@lists.illumos.org
CC: Eugen Leitl eu...@leitl.org
Subject: Re: ZFS dedup? hashes (Re: [cryptography] [zfs] SHA-3 winner announced)
Reply-To: jimkli...@cos.ru
Organization: JSC COS/HT
User-Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1

2012-10-03 18:52, Eugen Leitl wrote:
> I infer from your comments that you are focusing on the ZFS use of a hash for dedup? (The forward did not include the full context.) A forged collision for dedup can translate into a DoS (deletion), so 2nd pre-image resistance would still be important.

This subject was discussed a few months ago on zfs-discuss; I believe the thread history is here:
http://mail.opensolaris.org/pipermail/zfs-discuss/2012-July/051865.html

Regarding dedup-collision attacks, the problem is this: ZFS dedup uses a checksum of a low-level block of ZFS data (one which has already passed compression, and encryption in the case of Solaris 11). The final on-disk blocks, whatever their contents, are checksummed as part of ZFS integrity verification (to detect bitrot), and the stronger of these checksums can be used as keys into the deduplication table (DDT) if dedup is enabled for the datasets involved. On write, ZFS prepares the final block contents, computes the checksum, looks it up in the DDT, and either increments the counter of an existing DDT entry or creates a new entry with counter=1.

The DDT is shared by many datasets on the pool, so accounting of used/free space becomes interesting, but users have little if any way of knowing whether their data was deduped (they might infer it from changes in used/free space, but can never be sure that their own recently written file was involved). A block is several sectors in size, currently ranging from 512 bytes to 128 KB.
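The write path described above (prepare final block, checksum, look up in DDT, bump counter or allocate) can be sketched in a few lines. This is a toy in-memory model under my own names (`DedupTable`, `write_block`), not the actual ZFS implementation:

```python
import hashlib

# Minimal sketch of the dedup write path: checksum the final block,
# look it up in the DDT, increment the refcount on a hit, or allocate
# a new on-disk block on a miss. Illustrative only - not real ZFS code.

class DedupTable:
    """Maps block checksum -> [disk_address, reference_count]."""

    def __init__(self):
        self.entries = {}
        self.disk = []                      # stand-in for pool storage

    def write_block(self, data: bytes) -> int:
        # Checksum the *final* block contents (post-compression and,
        # on Solaris 11, post-encryption) - currently SHA-256 in ZFS.
        key = hashlib.sha256(data).digest()
        entry = self.entries.get(key)
        if entry is not None:
            entry[1] += 1                   # hit: increment DDT counter
            return entry[0]
        self.disk.append(data)              # miss: allocate a new block
        addr = len(self.disk) - 1
        self.entries[key] = [addr, 1]       # new DDT entry, counter=1
        return addr

ddt = DedupTable()
a = ddt.write_block(b"x" * 512)
b = ddt.write_block(b"x" * 512)             # identical block: deduped
assert a == b and len(ddt.disk) == 1
```

The point the text makes about opacity follows directly: the caller gets back a block address either way and cannot tell from the API whether the write was deduped.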
In order to craft an attack on dedup you would need to:

1) Know exactly what data will be written by the victim - including the raw data, compression algorithm, encryption, etc.;
2) Create a block with forged data which has the same checksum (as used by this block's metadata on disk - currently SHA-256, possibly more as a result of Saso's work);
3) Be the very first writer into this pool that creates a block with this hash and enters it into the DDT.

In reality, any co-user of space on the deduped pool might do this. The impracticality is that you need such intimate access to the victim's source data and system setup details that you might just as well be the storage admin, who can simply corrupt and overwrite the victim's userdata block with whatever trash he likes.

Also, as far as dedup goes, simply setting verify=on requires a comparison of the on-disk block with the one ZFS is about to save (given that they have the same checksum, and perhaps size, and one is already in the DDT); if the two don't match, ZFS just writes the new block non-deduped. The attack would at most waste space on the storage: if the victim's data is indeed dedupable, many identical copies ultimately get saved, while the forged block merely sits there occupying the DDT entry.

> Incidentally a somewhat related problem with dedup (probably more in cloud storage than local dedup of storage) is that the dedup function itself can lead to the confirmation or even decryption of documents with sufficiently low entropy as the attacker can induce you to store or directly query the dedup service looking for all possible documents. eg say a form letter where the only blanks to fill in are the name (known suspected) and a figure (1,000,000 possible values).

What sort of attack do you suggest? That a storage user (attacker) pre-creates a million files of this form with filled-in data? Having no access to ZFS low-level internals and metadata, the end-user has no reliable way of knowing that a particular file got deduped.
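The verify=on safeguard described above can be sketched as follows. The dict-based DDT and list-based "disk", and the function name, are illustrative stand-ins, not real ZFS structures:

```python
import hashlib

# Sketch of verify=on: even when the checksum matches a DDT entry, the
# stored block is compared byte for byte with the incoming one. On a
# mismatch (i.e. a forged collision), the new block is written
# non-deduped, so the victim's data cannot be replaced.

def write_block_verified(data: bytes, ddt: dict, disk: list) -> int:
    key = hashlib.sha256(data).digest()
    entry = ddt.get(key)
    if entry is not None:
        addr, refcount = entry
        if disk[addr] == data:              # verify: contents truly match
            ddt[key] = (addr, refcount + 1)
            return addr
        # Same checksum, different contents: a collision. Fall through
        # and store this block non-deduped, leaving the DDT entry alone.
    disk.append(data)
    addr = len(disk) - 1
    if entry is None:
        ddt[key] = (addr, 1)
    return addr
```

As the text says, the worst a forged DDT entry achieves under verify=on is wasted space: every genuine write that collides with it falls through to a fresh, non-deduped allocation.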
(And it's not files, but their component blocks, to be exact.) And if an admin does that, he might just as well read the victim's file directly (on a non-encrypted pool). Or did I misunderstand your point?

> Also if there is encryption there are privacy and security leaks arising from doing dedup based on plaintext. And if you are doing dedup on ciphertext (or the data is not encrypted), you could follow David's suggestion of HMAC-SHA1 or the various AES-MACs. In fact I would suggest for encrypted data, you really NEED to base dedup on MACs and NOT hashes or you leak and risk bruteforce decryption of plaintext by hash brute-forcing the non-encrypted dedup tokens.

I am not enough of a cipher expert to even properly decipher this part ;)

HTH,
//Jim Klimov

- End forwarded message -

--
Eugen* Leitl leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
___
cryptography mailing list
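The MAC-vs-hash distinction in the quoted text can be sketched concretely. The key and function names below are hypothetical placeholders, not any real ZFS interface; the point is only that a keyed token denies offline guessing while preserving dedup:

```python
import hashlib
import hmac

# Sketch of MAC-based dedup tokens: deriving the dedup key via HMAC
# under a secret key means an attacker who can read the on-disk tokens
# cannot brute-force low-entropy plaintexts offline. DEDUP_KEY is a
# hypothetical placeholder.

DEDUP_KEY = b"secret shared only by datasets allowed to dedup together"

def token_hash(block: bytes) -> bytes:
    # Plain hash: anyone can compute tokens for guessed plaintexts
    # and compare them against tokens observed on disk.
    return hashlib.sha256(block).digest()

def token_mac(block: bytes) -> bytes:
    # Keyed MAC: computing a candidate token requires the secret key,
    # so equal blocks still dedup, but offline guessing fails.
    return hmac.new(DEDUP_KEY, block, hashlib.sha256).digest()
```

Both functions are deterministic, so identical blocks still produce identical tokens and dedup works unchanged; only a key-less attacker's ability to test guesses against the tokens changes.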
Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
- Forwarded message from Sašo Kiselkov skiselkov...@gmail.com -

From: Sašo Kiselkov skiselkov...@gmail.com
Date: Thu, 04 Oct 2012 15:19:59 +0200
To: z...@lists.illumos.org
CC: Eugen Leitl eu...@leitl.org
Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
Reply-To: z...@lists.illumos.org

On 10/04/2012 02:41 PM, Eugen Leitl wrote:
> - Forwarded message from David McGrew (mcgrew) mcg...@cisco.com -
> From: David McGrew (mcgrew) mcg...@cisco.com
> Date: Thu, 4 Oct 2012 12:19:55 +
> To: Eugen Leitl eu...@leitl.org, cryptography@randombit.net
> Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
> User-Agent: Microsoft-MacOutlook/14.2.1.120420
>
> It would be redundant to use HMAC-SHA256 in conjunction with authenticated encryption modes like those mentioned on the Oracle webpage that I mentioned (AES-GCM and AES-CCM). Perhaps what you meant to say is that when those modes are used, SHA256 is used as the ZFS data-integrity checksum? Or is it the case that the data-integrity checksum can use a keyed message authentication code?

If we get around to implementing encryption in Illumos, we would most likely go the same route. Thanks for your insights, though; they are certainly valuable.

> Is there any public specification for how cryptography is used in either the Sun/Oracle version or the Illumos version of ZFS?

I'm not really sure how Oracle implemented their stuff in detail. I know that they use the block-level checksum to also authenticate the data, but then they also say that you can perform block validation even if you don't have the encryption key. Best to talk to Oracle about the details on that. Illumos' ZFS doesn't have encryption, so block authentication isn't important for us.
Cheers,
-- Saso

- End forwarded message -
Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
- Forwarded message from Sašo Kiselkov skiselkov...@gmail.com -

From: Sašo Kiselkov skiselkov...@gmail.com
Date: Thu, 04 Oct 2012 15:39:18 +0200
To: z...@lists.illumos.org
CC: Eugen Leitl eu...@leitl.org
Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1

On 10/04/2012 02:41 PM, Eugen Leitl wrote:
> - Forwarded message from Adam Back a...@cypherspace.org -
> From: Adam Back a...@cypherspace.org
> Date: Thu, 4 Oct 2012 13:39:35 +0100
> To: Eugen Leitl eu...@leitl.org
> Cc: cryptography@randombit.net, Jim Klimov jimkli...@cos.ru, Adam Back a...@cypherspace.org
> Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
> User-Agent: Mutt/1.5.21 (2010-09-15)
>
> On Thu, Oct 04, 2012 at 11:47:08AM +0200, Jim Klimov wrote:
>> [decrypting or confirming encrypted or ACLed documents via dedup]
>> eg say a form letter where the only blanks to fill in are the name (known suspected) and a figure (1,000,000 possible values).
>> What sort of attack do you suggest? That a storage user (attacker) pre-creates a million files of this form with filled-in data?
>
> The other way around - let the victim store their confidential but low-entropy file. Then the attacker writes all the permutations and uses timing, disk-free stats, or some other side channel to tell which was the correct guess.

Since block dedup happens at transaction group (txg) commit intervals (i.e. blocks aren't deduped in memory, only at txg commit to stable storage), getting reliable results from observing storage behavior would require probing an entirely unloaded system extremely slowly (a few blocks per txg interval at best). Needless to say, even that is extremely optimistic and still highly impractical. Any other chatter on the system (other processes doing something) will crush any hope of this kind of attack yielding useful data.

Moreover, dedup is typically used in large storage systems (NAS/SAN) where one rarely gets local access: most users reach the system via some file-level sharing protocol (e.g. NFS) or block-level protocol (e.g. iSCSI or FC), which cover the inner workings of the storage system with a thick and heavy protocol blanket.

> Given that one can extract a private key from an RSA-key-holding server by being another unprivileged process, based on cache lines, timing, etc., it seems to me likely you would be able to detect dedup. And maybe you can dedup lots of times, e.g. create, delete, wait for space reclaim, write again (to get better-accuracy stats from having lots of timing samples).

As mentioned above, you'll probably be limited by txg commit intervals, making this attack highly impractical.

Cheers,
-- Saso

- End forwarded message -
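The free-space side channel Adam describes (and Saso critiques) can be sketched with a toy in-memory model. The `DedupStorage` class and its API are illustrative stand-ins, not a real filesystem interface:

```python
import hashlib

# Toy model of a deduplicating pool: the attacker writes each guessed
# variant of the victim's low-entropy file and checks whether free space
# shrank (new block) or stayed put (deduped - the victim already stored
# exactly that content). A real probe would also have to sync and wait
# out the txg commit interval between writes, as Saso notes above.

class DedupStorage:
    def __init__(self, capacity_blocks: int = 1000):
        self.capacity = capacity_blocks
        self.blocks = {}                    # checksum -> refcount

    def free_blocks(self) -> int:
        return self.capacity - len(self.blocks)

    def write(self, data: bytes) -> None:
        key = hashlib.sha256(data).digest()
        self.blocks[key] = self.blocks.get(key, 0) + 1

def probe_guesses(storage, guesses):
    """Return guesses whose writes consumed no new space (dedup hits)."""
    hits = []
    for guess in guesses:
        before = storage.free_blocks()
        storage.write(guess)                # real attack: sync + wait here
        if storage.free_blocks() == before:
            hits.append(guess)
    return hits

pool = DedupStorage()
pool.write(b"salary: 4200")                 # the victim's secret block
guesses = [b"salary: %d" % n for n in range(4000, 4400)]
assert probe_guesses(pool, guesses) == [b"salary: 4200"]
```

In this noiseless model the probe works perfectly; Saso's objection is precisely that on a real pool, txg batching and concurrent writers destroy the clean before/after measurement each iteration depends on.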
Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
- Forwarded message from Jim Klimov jimkli...@cos.ru -

From: Jim Klimov jimkli...@cos.ru
Date: Thu, 04 Oct 2012 19:12:16 +0400
To: z...@lists.illumos.org
CC: Pawel Jakub Dawidek p...@freebsd.org, Eugen Leitl eu...@leitl.org
Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
Reply-To: jimkli...@cos.ru
Organization: JSC COS/HT
User-Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1

2012-10-04 18:00, Pawel Jakub Dawidek wrote:
> Invalidating one side channel doesn't mean there aren't more. It is safer to assume there are.

True. One security project I was affiliated with began with an axiom: a networked system is considered already broken into. The project was about providing safe communications - necessarily without traditional networking gear/interfaces - between the internal data-processing subnets and those required to face the evil internet, and thus presumed tainted and corrupted ;)

> Another side channel that comes to mind is to wait until the load is small and observe with df(1) whether the used space grows when we write, and by how much. You can even do a binary search by writing many possible blocks and observing whether the space grew as much as it should. If not, maybe we have a hit, and we can split our blocks in half and retry, etc. This would work over NFS just fine.

IMHO, your dataset's (NFS share's) used space should grow regardless of dedup being in action. However, if your viewpoint includes the parent pool, something might indeed be inferred. But as Saso said, you would need a quiet pool doing nothing but your cracking task for the duration of the TXG interval, which is unlikely already - more so on shared cloud storage.
//Jim

- End forwarded message -
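Pawel's df(1) binary-search idea from the message above can be sketched against the same kind of toy in-memory dedup model. Names and the refcounting API are illustrative, and a real probe would need delete-and-reclaim waits between batches, as discussed:

```python
import hashlib

# Toy refcounting dedup pool for illustrating the used-space probe.
class DedupStorage:
    def __init__(self):
        self.blocks = {}                    # checksum -> refcount

    def used_blocks(self) -> int:
        return len(self.blocks)

    def write(self, data: bytes) -> None:
        key = hashlib.sha256(data).digest()
        self.blocks[key] = self.blocks.get(key, 0) + 1

    def delete(self, data: bytes) -> None:
        key = hashlib.sha256(data).digest()
        self.blocks[key] -= 1
        if self.blocks[key] == 0:
            del self.blocks[key]

def find_hits(storage, candidates):
    """Binary search for candidate blocks the pool already stores:
    write a batch, check whether used space grew by the full batch
    size, and bisect batches that grew less (a dedup hit hides inside)."""
    if not candidates:
        return []
    before = storage.used_blocks()
    for block in candidates:
        storage.write(block)
    grew = storage.used_blocks() - before
    for block in candidates:                # clean up before recursing
        storage.delete(block)               # (real life: wait for reclaim)
    if grew == len(candidates):
        return []                           # every block was new: no hits
    if len(candidates) == 1:
        return candidates                   # this single block was deduped
    mid = len(candidates) // 2
    return (find_hits(storage, candidates[:mid]) +
            find_hits(storage, candidates[mid:]))

pool = DedupStorage()
pool.write(b"figure: 640000")               # victim's secret block
cands = [b"figure: %d" % n for n in range(639990, 640010)]
assert find_hits(pool, cands) == [b"figure: 640000"]
```

Note the cleanup step: without deleting each probe batch (and, on a real pool, waiting for space reclaim), later batches would dedup against the attacker's own earlier writes and the measurements would be meaningless.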
[cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
I infer from your comments that you are focusing on the ZFS use of a hash for dedup? (The forward did not include the full context.) A forged collision for dedup can translate into a DoS (deletion), so 2nd pre-image resistance would still be important. However, 2nd pre-image resistance is typically offered at higher assurance than resistance to chosen pairs of collisions (because the birthday effect lets you roughly square-root the search space for pairs). So to that extent I agree that your security reliance on hash properties is weaker than for integrity protection. And SHA-1 is still secure against 2nd pre-images, whereas its collision resistance has been demonstrated to be below design strength.

Incidentally, a somewhat related problem with dedup (probably more in cloud storage than local dedup of storage) is that the dedup function itself can lead to the confirmation or even decryption of documents with sufficiently low entropy, as the attacker can induce you to store, or directly query the dedup service for, all possible documents - e.g. a form letter where the only blanks to fill in are the name (known or suspected) and a figure (1,000,000 possible values).

Also, if there is encryption, there are privacy and security leaks arising from doing dedup based on plaintext. And if you are doing dedup on ciphertext (or the data is not encrypted), you could follow David's suggestion of HMAC-SHA1 or the various AES-MACs. In fact, I would suggest that for encrypted data you really NEED to base dedup on MACs and NOT hashes, or you leak, and risk brute-force decryption of plaintext via hash brute-forcing of the non-encrypted dedup tokens.
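The "square-root the search space" remark can be made concrete. For an ideal n-bit hash, a generic birthday collision search costs about 2^(n/2) evaluations, while a 2nd pre-image search costs about 2^n; the arithmetic below, using SHA-256's n = 256 as the example, is just a restatement of that standard bound:

```python
# Generic attack costs for an ideal n-bit hash:
#   collision (chosen pairs, birthday effect): ~2^(n/2) evaluations
#   2nd pre-image (fixed target):              ~2^n     evaluations

n = 256                                     # e.g. SHA-256 digest size
collision_work = 2 ** (n // 2)              # ~2^128
second_preimage_work = 2 ** n               # ~2^256

# The birthday effect "square-roots" the search space:
assert collision_work ** 2 == second_preimage_work
print(f"collision ~2^{n // 2}, 2nd pre-image ~2^{n}")
```

This is why a hash can lose its collision resistance (as SHA-1 did, relative to design strength) while its 2nd pre-image resistance, the property the dedup-DoS scenario actually needs, remains intact.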
Adam

On Wed, Oct 03, 2012 at 03:41:27PM +0200, Eugen Leitl wrote:
> - Forwarded message from Sašo Kiselkov skiselkov...@gmail.com -
> From: Sašo Kiselkov skiselkov...@gmail.com
> Date: Wed, 03 Oct 2012 15:39:39 +0200
> To: z...@lists.illumos.org
> CC: Eugen Leitl eu...@leitl.org
> Subject: Re: [cryptography] [zfs] SHA-3 winner announced
> User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
>
> Well, it's somewhat difficult to respond to cross-posted e-mails, but here goes:
>
> On 10/03/2012 03:15 PM, Eugen Leitl wrote:
>> - Forwarded message from Adam Back a...@cypherspace.org -
>> From: Adam Back a...@cypherspace.org
>> Date: Wed, 3 Oct 2012 13:25:06 +0100
>> To: Eugen Leitl eu...@leitl.org
>> Cc: cryptography@randombit.net, Adam Back a...@cypherspace.org
>> Subject: Re: [cryptography] [zfs] SHA-3 winner announced
>> User-Agent: Mutt/1.5.21 (2010-09-15)
>>
>> (comment to Saso's email forwarded by Eugen):
>> Well, I think it would be fairer to say that SHA-3 was initiated more in the direction of improving on the state of the art in the security of hash algorithms [snip] In that you see the selection of Keccak, focusing more on its high security margin, and new defenses against existing known types of attacks.
>
> At no point did I claim that the NIST people chose badly. I always said that NIST's requirements need not align perfectly with ZFS' requirements.
>
>> If the price of that is being slower, so be it - while fast primitives are very useful, having things like the full break of MD5 and the significant weakening of SHA-1 take the security-protocols industry by surprise is also highly undesirable and expensive to fix. To some extent, for the short/mid term, it is almost unfixable given the realities of software and firmware updates.
>
> Except in ZFS, where it's a simple zfs set command. Remember, Illumos' ZFS doesn't use the hash as a security feature at all - that property is not the prime focus.
>
>> So while I am someone who pays attention to protocol, algorithm and implementation efficiency, I am happy with Keccak.
>
> ZFS is not a security protocol, therefore the security margin of the hash is next to irrelevant. That is not to say it's entirely pointless - it's good to have some security there, just for the added peace of mind, but it's crazy to make it the primary concern.
>
>> And CPUs are getting faster all the time; the Q3 2013 Ivy Bridge (22nm) Intel i7 next year is going to be available as a 12-core (24 hyperthreads) with 30 MB cache. Just chuck another core at it if you have problems. ARMs are also coming out with more cores.
>
> Aaah, the good old "but CPUs are getting faster every day!" argument. So should people hold off for a few years before purchasing new equipment for problems they have now? And if these new super-duper CPUs are so much higher-performing, why not use a more efficient algo and push even higher numbers with them? If I could halve my costs simply by switching to a faster algorithm, I'd do it in a heartbeat!
>
>> And the AMD 7970 GPU has 2048 cores.
>
> Are you suggesting we run ZFS kernel code on GPUs? How about driver issues? Or simultaneous use by graphical apps/games? Who's going to implement and maintain this? It's easy to propose theoretical models, but unless you plan to invest the energy in this, it'll most likely remain purely theoretical.
>
>> For embedded and portable use,
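Saso's cost argument is directly measurable: hash throughput differs between algorithms on the same CPU. A quick sketch using Python's hashlib (SHA3-256, i.e. standardized Keccak, is in the stdlib from Python 3.6; absolute numbers vary by machine and build, so this only illustrates how one would compare):

```python
import hashlib
import time

def throughput_mb_s(name: str, data: bytes, rounds: int = 20) -> float:
    """Hash `data` `rounds` times and return MB/s for the named digest."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return len(data) * rounds / elapsed / 1e6

buf = b"\x00" * (1 << 20)                   # one 1 MiB buffer
for algo in ("sha256", "sha512", "sha3_256"):
    print(f"{algo}: {throughput_mb_s(algo, buf):8.1f} MB/s")
```

Whatever the numbers on a given box, they show why "switch to a faster algorithm" translates into real cost savings for a checksum computed on every block write.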
Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
On Wed, Oct 3, 2012 at 9:19 AM, Dr Adam Back a...@cypherspace.org wrote:
> Incidentally a somewhat related problem with dedup (probably more in cloud storage than local dedup of storage) is that the dedup function itself can lead to the confirmation or even decryption of documents with sufficiently low entropy as the attacker can induce you to store or directly query the dedup service looking for all possible documents. eg say a form letter where the only blanks to fill in are the name (known suspected) and a figure (1,000,000 possible values). Also if there is encryption there are privacy and security leaks arising from doing dedup based on plaintext.

Compression at lower layers tends to leak. We've seen this in VoIP, and now in CRIME. Dedup is a compression function running at a lower layer (i.e., lower than the application writing the file contents). Of course, dedup is not a compression function that is easily applied at the application layer, so if you really need dedup, then you need it at lower layers. The question is: do you need dedup and confidentiality protection for the same data? I think most would answer no.

> And if you are doing dedup on ciphertext (or the data is not encrypted), you could follow David's suggestion of HMAC-SHA1 or the various AES-MACs. In fact I would suggest for encrypted data, you really NEED to base dedup on MACs and NOT hashes or you leak and risk bruteforce decryption of plaintext by hash brute-forcing the non-encrypted dedup tokens.

Encrypted ZFS hashes and authenticates ciphertext. The attacker is presumed to observe all on-disk data, including ciphertext and block pointers (which contain authentication tags and hashes). The attacker can observe dups as well as ZFS can, and can attempt both passive and active attacks. Dedup certainly adds to the attacker's traffic-analysis capabilities, but also to the attacker's active-attack capabilities (e.g., if the attacker can mount a chosen-plaintext attack).
Note that encrypted ZFS can only dedup within sets of datasets that share the same keys. So what difference does it make whether dedup uses an authentication tag or a hash of the ciphertext? Assume no collisions anyway - and if dups are verified, then collisions make little difference as far as dedup is concerned.

I think the harm is done first by compressing and encrypting at a layer lower than the application; encryption can be done at lower layers, but compression is best left to the application layer.

Nico
--