Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

2012-10-04 Thread Eugen Leitl
- Forwarded message from Jim Klimov jimkli...@cos.ru -

From: Jim Klimov jimkli...@cos.ru
Date: Thu, 04 Oct 2012 13:44:21 +0400
To: z...@lists.illumos.org
CC: Eugen Leitl eu...@leitl.org
Subject: Re: ZFS dedup? hashes (Re: [cryptography] [zfs] SHA-3 winner announced)
Reply-To: jimkli...@cos.ru
Organization: JSC COS/HT
User-Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:15.0) Gecko/20120907 
Thunderbird/15.0.1

2012-10-03 18:52, Eugen Leitl wrote:
 I infer from your comments that you are focusing on the ZFS use of a hash
 for dedup?  (The forward did not include the full context).  A forged
 collision for dedup can translate into a DoS (deletion) so 2nd pre-image
 collision resistance would still be important.

This subject was discussed a few months ago on zfs-discuss,
I believe the thread history may be here:

http://mail.opensolaris.org/pipermail/zfs-discuss/2012-July/051865.html

Regarding dedup-collision attacks, the problem is this: ZFS dedup
uses a checksum of a low-level block of ZFS data (the block has
already passed compression, and encryption in the case of Solaris 11).
The final on-disk blocks, whatever their contents, are checksummed
as part of ZFS integrity verification (protection against bitrot),
and the stronger of these checksums can be used as keys into the
deduplication table (DDT) if dedup is enabled for the datasets in
question. On write, ZFS prepares the final block contents, computes
the checksum, looks it up in the DDT, and either increments the
existing DDT entry's counter or creates a new entry with counter=1.
The DDT is shared by many datasets on the pool, and accounting of
used/free space becomes interesting, but users have little if any
way to know whether their data was deduped (they might infer it
from changes in used/free space, but can never be sure that their
own recently written file was involved).
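For readers who did not follow the zfs-discuss thread, here is a minimal
conceptual sketch of that write path in Python. It is not the illumos
code; dedup_write, allocate_block and the dict-based DDT are invented
purely for illustration, and sha256 stands in for whatever dedup-capable
checksum the dataset uses:

import hashlib

ddt = {}   # checksum -> {"refcount": int, "block_ptr": object}

def dedup_write(final_block, allocate_block):
    """Store final_block, deduplicating against the DDT by checksum."""
    key = hashlib.sha256(final_block).digest()
    entry = ddt.get(key)
    if entry is not None:
        # Checksum already known: just bump the reference counter.
        entry["refcount"] += 1
        return entry["block_ptr"]
    # First writer of this checksum: allocate on disk, create a DDT entry.
    ptr = allocate_block(final_block)
    ddt[key] = {"refcount": 1, "block_ptr": ptr}
    return ptr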

A block is several sectors in size, currently ranging from 512 bytes
to 128 KB. In order to craft an attack on dedup you would have to:
1) Know exactly what data will be written by the victim -
   including the raw data, compression algorithm, encryption, etc.;
2) Create a block of forged data which has the same checksum
   (as used by this block's metadata on disk - currently SHA-256,
   perhaps more options as a result of Saso's work);
3) Be the very first writer into this pool that creates a block
   with this hash and enters it into the DDT.

In reality, any co-user of space on the deduped pool could do this.
In practice, however, you need such intimate access to the victim's
source data and system setup details that you might just as well be
the storage admin, who could simply corrupt or overwrite the victim's
userdata block with arbitrary trash. Also, as far as dedup goes,
simply setting verify=on forces a comparison of the on-disk block
with the one ZFS is about to save (given that they have the same
checksum, perhaps the same size, and one is already in the DDT);
if the two don't match, ZFS just writes the new block non-deduped.
The attack would at most waste space on the storage: if the victim's
data is indeed dedupable, many identical copies end up being stored,
while the forged block merely sits there occupying the DDT entry.
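Extending the sketch above, a hedged illustration of what verify=on
changes (again invented code, not the real implementation; read_block
is a stand-in for fetching the already-stored block):

def dedup_write_verified(final_block, allocate_block, read_block):
    """Like dedup_write(), but with verify=on semantics."""
    key = hashlib.sha256(final_block).digest()
    entry = ddt.get(key)
    if entry is not None:
        if read_block(entry["block_ptr"]) == final_block:
            entry["refcount"] += 1        # genuine duplicate
            return entry["block_ptr"]
        # Same checksum but different contents (a real or forged
        # collision): fall through and write the block non-deduped,
        # leaving the existing DDT entry untouched.
    ptr = allocate_block(final_block)
    if entry is None:
        ddt[key] = {"refcount": 1, "block_ptr": ptr}
    return ptr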

 Incidentally a somewhat related problem with dedup (probably more in cloud
 storage than local dedup of storage) is that the dedup function itself can
 lead to the confirmation or even decryption of documents with
 sufficiently low entropy as the attacker can induce you to store or
 directly query the dedup service looking for all possible documents.  eg say
 a form letter where the only blanks to fill in are the name (known
 suspected) and a figure (1,000,000 possible values).

What sort of attack do you suggest? That a storage user (attacker)
pre-creates a million files of this form with filled-in data?

Having no access to ZFS low-level internals and metadata, the
end-user has no reliable way of knowing that a particular file
got deduped. (And strictly speaking it is not whole files but
their component blocks that get deduped.) And if an admin does
that, he might just as well read the victim's file directly
(on a non-encrypted pool).

Or did I misunderstand your point?

 Also if there is encryption there are privacy and security leaks arising
 from doing dedup based on plaintext.

 And if you are doing dedup on ciphertext (or the data is not encrypted), you
 could follow David's suggestion of HMAC-SHA1 or the various AES-MACs.  In
 fact I would suggest for encrypted data, you really NEED to base dedup on
 MACs and NOT hashes or you leak and risk bruteforce decryption of
 plaintext by hash brute-forcing the non-encrypted dedup tokens.

I am not enough of a cipher expert to properly decipher this part ;)

HTH,
//Jim Klimov

- End forwarded message -
-- 
Eugen* Leitl <leitl> http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
cryptography mailing list

Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

2012-10-04 Thread Eugen Leitl
- Forwarded message from Sašo Kiselkov skiselkov...@gmail.com -

From: Sašo Kiselkov skiselkov...@gmail.com
Date: Thu, 04 Oct 2012 15:19:59 +0200
To: z...@lists.illumos.org
CC: Eugen Leitl eu...@leitl.org
Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner
announced)
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929
Thunderbird/7.0.1
Reply-To: z...@lists.illumos.org

On 10/04/2012 02:41 PM, Eugen Leitl wrote:
 - Forwarded message from David McGrew (mcgrew) mcg...@cisco.com -
 
 From: David McGrew (mcgrew) mcg...@cisco.com
 Date: Thu, 4 Oct 2012 12:19:55 +
 To: Eugen Leitl eu...@leitl.org,
   cryptography@randombit.net cryptography@randombit.net
 Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner
   announced)
 user-agent: Microsoft-MacOutlook/14.2.1.120420
 
 It would be redundant to use HMAC-SHA256 in conjunction with authenticated
 encryption modes like those mentioned on the Oracle webpage that I
 mentioned (AES-GCM and AES-CCM).Perhaps what you meant to say is that
 when those modes are used, that SHA256 is used as the ZFS data-integrity
 checksum?   Or is it the case that the data-integrity checksum can use a
 keyed message authentication code?
 
 If we get around to implementing
 encryption in Illumos, we would most likely go the same route. Thanks
 for your insights, though, they are certainly valuable.
 
 Is there any public specification for how cryptography is used in either
 the Sun/Oracle version or the Illumos version of ZFS?

I'm not really sure how Oracle implemented their stuff in detail. I know
that they use the block-level checksum to also authenticate the data,
but then they also say that you can perform a block validation even if
you don't have the encryption key. Best talk to Oracle about the details
on that.

Illumos' ZFS doesn't have encryption, so block authentication isn't
important for us.

Cheers,
--
Saso



- End forwarded message -


Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

2012-10-04 Thread Eugen Leitl
- Forwarded message from Sašo Kiselkov skiselkov...@gmail.com -

From: Sašo Kiselkov skiselkov...@gmail.com
Date: Thu, 04 Oct 2012 15:39:18 +0200
To: z...@lists.illumos.org
CC: Eugen Leitl eu...@leitl.org
Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 
Thunderbird/7.0.1

On 10/04/2012 02:41 PM, Eugen Leitl wrote:
 - Forwarded message from Adam Back a...@cypherspace.org -
 
 From: Adam Back a...@cypherspace.org
 Date: Thu, 4 Oct 2012 13:39:35 +0100
 To: Eugen Leitl eu...@leitl.org
 Cc: cryptography@randombit.net, Jim Klimov jimkli...@cos.ru,
   Adam Back a...@cypherspace.org
 Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner
   announced)
 User-Agent: Mutt/1.5.21 (2010-09-15)
 
 On Thu, Oct 04, 2012 at 11:47:08AM +0200, Jim Klimov wrote:
 [decrypting or confirming encrypted or ACLed documents via dedup]
 eg say a form letter where the only blanks to fill in are the name (known
 suspected) and a figure (1,000,000 possible values).

 What sort of attack do you suggest? That a storage user (attacker)
 pre-creates a million files of this form with filled-in data?
 
 The other way around - let the victim store their confidential but low
 entropy file.  Then the attacker writes all permutations, and does timing or
 disk free stats or other side channel to tell which was the correct guess.

Since block dedup happens at transaction group (txg) commit intervals
(i.e. blocks aren't dedup'ed in memory, only at txg commit to stable
storage), getting reliable results from observing storage behavior
would require probing an entirely unloaded system extremely slowly
(a few blocks per txg interval at best). Needless to say, even that
is optimistic, and still highly impractical. Any other chatter on the
system (other processes doing something) will crush any hope of this
kind of attack yielding useful data. Moreover, dedup is typically
used in large storage systems (NAS/SAN) where one rarely gets local
access (most users access the system via some file-level sharing
protocol such as NFS, or block-level, such as iSCSI or FC), which
covers the inner workings of the storage system with a thick and
heavy protocol blanket.
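To put a rough number on that, a back-of-the-envelope calculation
(assuming, hypothetically, that one candidate block can be tested per
txg commit, and a txg sync interval of about 5 seconds, which is the
ballpark default on illumos):

candidates   = 1_000_000   # e.g. the form-letter figure field
txg_interval = 5           # seconds per probe, assumed txg sync interval

seconds = candidates * txg_interval
print(seconds / 86400)     # ~57.9 days of probing a completely idle pool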

 Given that one can get a private key out of an RSA private key holding
 server by being another unprivileged process, based on cache lines, timing
 etc it seems to me likely you would be able to tell dedup.  And maybe you
 can dedup lots of times, eg create, delete, wait for space reclaim, write
 again (to get better accuracy stats from having lots of timing samples.)

As mentioned above, you'll be probably limited by txg commit intervals,
making this attack highly impractical.

Cheers,
--
Saso

- End forwarded message -


Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

2012-10-04 Thread Eugen Leitl
- Forwarded message from Jim Klimov jimkli...@cos.ru -

From: Jim Klimov jimkli...@cos.ru
Date: Thu, 04 Oct 2012 19:12:16 +0400
To: z...@lists.illumos.org
CC: Pawel Jakub Dawidek p...@freebsd.org, Eugen Leitl eu...@leitl.org
Subject: Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)
Reply-To: jimkli...@cos.ru
Organization: JSC COS/HT
User-Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:15.0) Gecko/20120907 
Thunderbird/15.0.1

2012-10-04 18:00, Pawel Jakub Dawidek wrote:
 Invalidating one side channel doesn't mean there aren't more. It is
 safer to assume there are more.

True. One security project I was affiliated with started from the
axiom that a networked system must be considered already broken into,
and the project was about providing safe communications -
necessarily without traditional networking gear/interfaces -
between internal data-processing subnets and those required
to face the evil internet and thus assumed tainted and corrupted ;)

 Another one that comes to my mind is to
 wait until the load is small and observe with df(1) if the used space
 grows when we write and by how much. You can even do binary search by
 writing many possible blocks and observing if the space grew as much as
 it should. If not, maybe we have a hit and we can split our blocks in
 half and retry, etc. This would work over NFS just fine.

IMHO your dataset's (NFS share's) used space should grow
regardless of whether dedup kicks in. However, if your viewpoint
includes the parent pool, something might indeed be inferred.
But as Saso said, you need a quiet pool doing nothing else
but your cracking task for the duration of the TXG interval,
which is unlikely already - even more so on shared cloud storage.
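For what it's worth, the probing strategy Pawel describes would look
roughly like the sketch below. Everything here is hypothetical -
free_space() and write_blocks() stand in for df(1) observations and
synced writes, and the sketch ignores metadata overhead, compression
and any concurrent pool activity, which in practice swamp the signal:

def find_deduped(candidates, write_blocks, free_space, block_size):
    """Binary-search for the one candidate block already on the pool."""
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        before = free_space()
        write_blocks(half)                 # write probe blocks and sync
        consumed = before - free_space()   # how much space was used up
        expected = len(half) * block_size
        # If the pool grew by less than expected, the duplicate is in
        # this half; otherwise it must be in the other half.
        candidates = half if consumed < expected else candidates[len(half):]
        # (deleting the probe blocks and waiting for space reclaim is
        # omitted here)
    return candidates[0]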

//Jim

- End forwarded message -


[cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

2012-10-03 Thread Dr Adam Back

I infer from your comments that you are focusing on the ZFS use of a hash
for dedup?  (The forward did not include the full context).  A forged
collision for dedup can translate into a DoS (deletion) so 2nd pre-image
collision resistance would still be important.

However, 2nd pre-image resistance is typically offered at higher
assurance than resistance to chosen pairs of collisions (because the
birthday effect roughly square-roots the search space when you are
free to pick both members of the pair). So to that extent I agree
that your security reliance on hash properties is weaker than for
integrity protection. And SHA-1 is still secure against 2nd pre-images,
whereas its collision resistance has been demonstrated to be below
design strength.
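As a rough worked example of that gap (assuming an ideal 256-bit hash
such as the SHA-256 used for ZFS dedup):

n = 256                            # output size in bits
collision_work  = 2.0 ** (n / 2)   # find *some* colliding pair: ~3.4e38
second_preimage = 2.0 ** n         # forge a block matching a *given*
                                   # checksum: ~1.2e77 evaluations
print(collision_work, second_preimage)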

Incidentally, a somewhat related problem with dedup (probably more in
cloud storage than local dedup of storage) is that the dedup function
itself can lead to the confirmation or even decryption of documents
with sufficiently low entropy, as the attacker can induce you to store,
or directly query the dedup service for, all possible documents. E.g.
a form letter where the only blanks to fill in are the name (known or
suspected) and a figure (1,000,000 possible values).

Also if there is encryption there are privacy and security leaks arising
from doing dedup based on plaintext.

And if you are doing dedup on ciphertext (or the data is not encrypted),
you could follow David's suggestion of HMAC-SHA1 or the various AES-MACs.
In fact I would suggest that for encrypted data you really NEED to base
dedup on MACs and NOT hashes, or you leak, and risk brute-force decryption
of, plaintext via hash brute-forcing of the non-encrypted dedup tokens.
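The distinction in code terms (a minimal sketch; HMAC-SHA-256 is used
only because it is in the Python standard library, and the key handling
is invented - the point is the same for HMAC-SHA1 or an AES-MAC):

import hashlib, hmac

def hash_token(block):
    # Anyone who sees this token can test guesses of low-entropy
    # plaintext offline.
    return hashlib.sha256(block).digest()

def mac_token(dedup_key, block):
    # Only holders of dedup_key can compute or verify this token.
    return hmac.new(dedup_key, block, hashlib.sha256).digest()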

Adam

On Wed, Oct 03, 2012 at 03:41:27PM +0200, Eugen Leitl wrote:

- Forwarded message from Sašo Kiselkov skiselkov...@gmail.com -

From: Sašo Kiselkov skiselkov...@gmail.com
Date: Wed, 03 Oct 2012 15:39:39 +0200
To: z...@lists.illumos.org
CC: Eugen Leitl eu...@leitl.org
Subject: Re: [cryptography] [zfs] SHA-3 winner announced
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 
Thunderbird/7.0.1

Well, it's somewhat difficult to respond to cross-posted e-mails, but
here goes:

On 10/03/2012 03:15 PM, Eugen Leitl wrote:

- Forwarded message from Adam Back a...@cypherspace.org -

From: Adam Back a...@cypherspace.org
Date: Wed, 3 Oct 2012 13:25:06 +0100
To: Eugen Leitl eu...@leitl.org
Cc: cryptography@randombit.net, Adam Back a...@cypherspace.org
Subject: Re: [cryptography] [zfs] SHA-3 winner announced
User-Agent: Mutt/1.5.21 (2010-09-15)

(comment to Saso's email forwarded by Eugen):

Well, I think it would be fairer to say SHA-3 was initiated more in the
direction of improving on the state of the art in security of hash algorithms
[snip]
In that you see the selection of Keccak, focusing more on its high security
margin, and new defenses against existing known types of attacks.


At no point did I claim that the NIST people chose badly. I always said
that NIST's requirements need not align perfectly with ZFS' requirements.


If the price of that is slower, so be it - while fast primitives are very
useful, having things like the full break of MD5 and the significant
weakening of SHA-1 take the security protocols industry by surprise is also
highly undesirable and expensive to fix. To some extent it is almost
unfixable in the short/mid term, given the realities of software and
firmware updates.


Except in ZFS, where it's a simple zfs set command. Remember, Illumos'
ZFS doesn't use the hash as a security feature at all - that property is
not the prime focus.


So while I am someone who pays attention to protocol, algorithm and
implementation efficiency, I am happy with Keccak.


ZFS is not a security protocol, therefore the security margin of the
hash is next to irrelevant. Now that is not to say that it's entirely
pointless - it's good to have some security there, just for the added
peace of mind, but it's crazy to focus on it as primary concern.


And CPUs are getting faster all the time; the Q3 2013 Ivy Bridge (22nm)
Intel i7 next year is going to be available in 12-core (24 hyperthreads)
with 30MB cache. Just chuck another core at it if you have problems. ARM
chips are also coming out with more cores.


Aaah, the good old "but CPUs are getting faster every day!" argument. So
should people hold off for a few years before purchasing new equipment
for problems they have now? And if these new super-duper CPUs are so
much higher performing, why not use a more efficient algo and push even
higher numbers with them? If I could halve my costs by simply switching
to a faster algorithm, I'd do it in a heartbeat!


And AMD 7970 GPU has 2048 cores.


Are you suggesting we run ZFS kernel code on GPUs? How about driver
issues? Or simultaneous use by graphical apps/games? Who's going to
implement and maintain this? It's easy to propose theoretical models,
but unless you plan to invest the energy in this, it'll most likely
remain purely theoretical.


For embedded and portable
use, 

Re: [cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

2012-10-03 Thread Nico Williams
On Wed, Oct 3, 2012 at 9:19 AM, Dr Adam Back a...@cypherspace.org wrote:
 Incidentally a somewhat related problem with dedup (probably more in cloud
 storage than local dedup of storage) is that the dedup function itself can
 lead to the confirmation or even decryption of documents with
 sufficiently low entropy as the attacker can induce you to store or
 directly query the dedup service looking for all possible documents.  eg say
 a form letter where the only blanks to fill in are the name (known
 suspected) and a figure (1,000,000 possible values).

 Also if there is encryption there are privacy and security leaks arising
 from doing dedup based on plaintext.

Compression at lower layers tends to leak.  We've seen this in VOIP,
and now CRIME.  Dedup is a compression function running at a lower
layer (i.e., lower than the application writing the file contents).
Of course, dedup is not a compression function that is easily applied
at the application layer, so if you really need dedup, then you need
it at lower layers.  The question is: do you need dedup and
confidentiality protection for the same data?  I think most would
answer no.

 And if you are doing dedup on ciphertext (or the data is not encrypted), you
 could follow David's suggestion of HMAC-SHA1 or the various AES-MACs.  In
 fact I would suggest for encrypted data, you really NEED to base dedup on
 MACs and NOT hashes or you leak and risk bruteforce decryption of
 plaintext by hash brute-forcing the non-encrypted dedup tokens.

Encrypted ZFS hashes and authenticates ciphertext.  The attacker is
presumed to observe all on-disk data, including ciphertext, block
pointers (which contain authentication tags and hashes), ...  The
attacker can observe dups as well as ZFS, and can attempt passive and
active attacks.  Dedup certainly adds to the attacker's traffic
analysis capabilities, but also to the attacker's active attack
capabilities (e.g., if the attacker can mount a chosen plaintext
attack).  Note that encrypted ZFS can only dedup within sets of
datasets that share the same keys.

What difference does it make whether dedup uses an authentication tag or
a hash of the ciphertext? There should be no collisions anyway, and if
dups are verified then collisions make little difference as far as dedup
is concerned. I think the harm is done first by compressing and
encrypting at a layer lower than the application; encryption can be
done at lower layers, but compression is best left to the application
layer.

Nico
--
___
cryptography mailing list
cryptography@randombit.net
http://lists.randombit.net/mailman/listinfo/cryptography