from:"David Howells"

Re: [RFC PATCH] X.509: Don't check the signature on apparently self-signed keys [ver #2]

2016-01-06 Thread David Howells

Mimi Zohar  wrote:

> The x509_validate_trust() was originally added for IMA to ensure, on a
> secure boot system, a certificate chain of trust rooted in hardware.
> The IMA MOK keyring extends this certificate chain of trust to the
> running system.

The problem is that because 'trusted' is a boolean, a key in the IMA MOK
keyring will permit addition to the system keyring.

David
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] X.509: Partially revert patch to add validation against IMA MOK keyring

2016-01-06 Thread David Howells

Partially revert commit 41c89b64d7184a780f12f2cccdabe65cb2408893:

Author: Petko Manolov <pet...@mip-labs.com>
Date:   Wed Dec 2 17:47:55 2015 +0200
IMA: create machine owner and blacklist keyrings

The problem is that prep->trusted is a simple boolean and the additional
x509_validate_trust() call doesn't therefore distinguish levels of
trustedness, but is just OR'd with the result of validation against the
system trusted keyring.

However, setting the trusted flag means that this key may be added to *any*
trusted-only keyring - including the system trusted keyring.

Whilst I appreciate what the patch is trying to do, I don't think this is
quite the right solution.
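
To illustrate the concern, here is a minimal standalone sketch (not the kernel code). The toy_ names and the two-field certificate model are hypothetical, invented only to show how a single boolean loses the information about which keyring vouched for a key:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two keyrings discussed above. */
enum toy_keyring { TOY_SYSTEM_KEYRING, TOY_IMA_MOK_KEYRING };

/* Hypothetical certificate model: records which keyring can vouch for it. */
struct toy_cert {
	bool signed_by_system;
	bool signed_by_ima_mok;
};

/* Returns 0 on success, mirroring how x509_validate_trust() is used. */
static int toy_validate_trust(const struct toy_cert *cert,
			      enum toy_keyring ring)
{
	if (ring == TOY_SYSTEM_KEYRING)
		return cert->signed_by_system ? 0 : 1;
	return cert->signed_by_ima_mok ? 0 : 1;
}

/* Before the partial revert: either keyring sets the one trusted flag. */
static bool toy_trusted_before_revert(const struct toy_cert *cert)
{
	int ret = toy_validate_trust(cert, TOY_SYSTEM_KEYRING);

	if (ret)
		ret = toy_validate_trust(cert, TOY_IMA_MOK_KEYRING);
	return ret == 0;
}

/* After the partial revert: only the system trusted keyring counts. */
static bool toy_trusted_after_revert(const struct toy_cert *cert)
{
	return toy_validate_trust(cert, TOY_SYSTEM_KEYRING) == 0;
}
```

In this model a cert vouched for only by the IMA MOK keyring comes out "trusted" before the revert, with nothing in the flag to stop it being added to the system trusted keyring.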

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: Petko Manolov <pet...@mip-labs.com>
cc: Mimi Zohar <zo...@linux.vnet.ibm.com>
cc: keyri...@vger.kernel.org
---

 crypto/asymmetric_keys/x509_public_key.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_public_key.c b/crypto/asymmetric_keys/x509_public_key.c
index 9e9e5a6a9ed6..2a44b3752471 100644
--- a/crypto/asymmetric_keys/x509_public_key.c
+++ b/crypto/asymmetric_keys/x509_public_key.c
@@ -321,8 +321,6 @@ static int x509_key_preparse(struct key_preparsed_payload *prep)
goto error_free_cert;
} else if (!prep->trusted) {
ret = x509_validate_trust(cert, get_system_trusted_keyring());
-   if (ret)
-   ret = x509_validate_trust(cert, get_ima_mok_keyring());
if (!ret)
prep->trusted = 1;
}


Re: [RFC PATCH] X.509: Don't check the signature on apparently self-signed keys [ver #2]

2016-01-06 Thread David Howells

Mimi Zohar  wrote:

> Once the builtin keys are loaded onto the system keyring, isn't the
> system keyring locked?

No.

David

Re: [PATCH] X.509: Partially revert patch to add validation against IMA MOK keyring

2016-01-06 Thread David Howells

David Howells <dhowe...@redhat.com> wrote:

> Partially revert commit 41c89b64d7184a780f12f2cccdabe65cb2408893:
> 
>   Author: Petko Manolov <pet...@mip-labs.com>
>   Date:   Wed Dec 2 17:47:55 2015 +0200
>   IMA: create machine owner and blacklist keyrings
> 
> The problem is that prep->trusted is a simple boolean and the additional
> x509_validate_trust() call doesn't therefore distinguish levels of
> trustedness, but is just OR'd with the result of validation against the
> system trusted keyring.
> 
> However, setting the trusted flag means that this key may be added to *any*
> trusted-only keyring - including the system trusted keyring.
> 
> Whilst I appreciate what the patch is trying to do, I don't think this is
> quite the right solution.

Please apply this to security/next.

Thanks,
David

Re: [RFC PATCH] X.509: Don't check the signature on apparently self-signed keys [ver #2]

2016-01-05 Thread David Howells

Mimi Zohar  wrote:

> You're missing Petko's patch:
> 41c89b6 IMA: create machine owner and blacklist keyrings

It should also be cc'd to the keyrings mailing list.

David

[RFC PATCH] X.509: Don't treat self-signed keys specially

2016-01-05 Thread David Howells

Trust for a self-signed certificate can normally only be determined by
whether we obtained it from a trusted location (ie. it was built into the
kernel at compile time), so there's not really any point in checking it -
we could verify that the signature is valid, but it doesn't really tell us
anything if the signature checks out.

However, there's a bug in the code determining whether a certificate is
self-signed or not - if it has neither AKID nor SKID then we just assume
that the cert is self-signed, which may not be true.

Given this, remove the code that treats self-signed certs specially when it
comes to evaluating trustability and attempt to evaluate them as ordinary
signed certificates.  We then expect self-signed certificates to fail the
trustability check and be marked as untrustworthy in x509_key_preparse().

Note that there is the possibility of the trustability check on a
self-signed cert then succeeding.  This is most likely to happen when a
duplicate of the certificate is already on the trust keyring - in which
case it shouldn't be a problem.

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: David Woodhouse <david.woodho...@intel.com>
cc: Mimi Zohar <zo...@linux.vnet.ibm.com>
---

 crypto/asymmetric_keys/x509_public_key.c |   25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_public_key.c b/crypto/asymmetric_keys/x509_public_key.c
index 2a44b3752471..26e1937af7f4 100644
--- a/crypto/asymmetric_keys/x509_public_key.c
+++ b/crypto/asymmetric_keys/x509_public_key.c
@@ -255,6 +255,9 @@ static int x509_validate_trust(struct x509_certificate *cert,
struct key *key;
int ret = 1;
 
+   if (!cert->akid_id || !cert->akid_skid)
+   return 1;
+
if (!trust_keyring)
return -EOPNOTSUPP;
 
@@ -312,17 +315,21 @@ static int x509_key_preparse(struct key_preparsed_payload *prep)
cert->pub->algo = pkey_algo[cert->pub->pkey_algo];
cert->pub->id_type = PKEY_ID_X509;
 
-   /* Check the signature on the key if it appears to be self-signed */
-   if ((!cert->akid_skid && !cert->akid_id) ||
-   asymmetric_key_id_same(cert->skid, cert->akid_skid) ||
-   asymmetric_key_id_same(cert->id, cert->akid_id)) {
-   ret = x509_check_signature(cert->pub, cert); /* self-signed */
-   if (ret < 0)
-   goto error_free_cert;
-   } else if (!prep->trusted) {
+   /* See if we can derive the trustability of this certificate.
+*
+* When it comes to self-signed certificates, we cannot evaluate
+* trustedness except by the fact that we obtained it from a trusted
+* location.  So we just rely on x509_validate_trust() failing in this
+* case.
+*
+* Note that there's a possibility of a self-signed cert matching a
+* cert that we have (most likely a duplicate that we already trust) -
+* in which case it will be marked trusted.
+*/
+   if (!prep->trusted) {
ret = x509_validate_trust(cert, get_system_trusted_keyring());
if (!ret)
-   prep->trusted = 1;
+   prep->trusted = true;
}
 
/* Propose a description */


Re: [RFC PATCH] X.509: Don't check the signature on apparently self-signed keys [ver #2]

2016-01-05 Thread David Howells

Mimi Zohar  wrote:

> You're missing Petko's patch:
> 41c89b6 IMA: create machine owner and blacklist keyrings

Hmmm...  This is wrong.  x509_key_preparse() shouldn't be polling the IMA MOK
keyring under all circumstances.

David

Re: [RFC PATCH] X.509: Don't check the signature on apparently self-signed keys [ver #2]

2016-01-05 Thread David Howells

David Howells <dhowe...@redhat.com> wrote:

> If a certificate is self-signed, don't bother checking the validity of the
> signature.  The cert cannot be checked by validation against the next one
> in the chain as this is the root of the chain.  Trust for this certificate
> can only be determined by whether we obtained it from a trusted location
> (ie. it was built into the kernel at compile time).
> 
> This also fixes a bug whereby certificates were being assumed to be
> self-signed if they had neither AKID nor SKID, the symptoms of which show
> up as an attempt to load a certificate failing with -ERANGE or -EBADMSG.
> This is produced from the RSA module when the result of calculating "m =
> s^e mod n" is checked.

Oops - I forgot to change the patch description.

David

[RFC PATCH] X.509: Don't check the signature on apparently self-signed keys

2016-01-05 Thread David Howells

If a certificate is self-signed, don't bother checking the validity of the
signature.  The cert cannot be checked by validation against the next one
in the chain as this is the root of the chain.  Trust for this certificate
can only be determined by whether we obtained it from a trusted location
(ie. it was built into the kernel at compile time).

This also fixes a bug whereby certificates were being assumed to be
self-signed if they had neither AKID nor SKID, the symptoms of which show
up as an attempt to load a certificate failing with -ERANGE or -EBADMSG.
This is produced from the RSA module when the result of calculating "m =
s^e mod n" is checked.

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: David Woodhouse <david.woodho...@intel.com>
cc: Mimi Zohar <zo...@linux.vnet.ibm.com>
---

 crypto/asymmetric_keys/x509_public_key.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_public_key.c b/crypto/asymmetric_keys/x509_public_key.c
index 2a44b3752471..663624225882 100644
--- a/crypto/asymmetric_keys/x509_public_key.c
+++ b/crypto/asymmetric_keys/x509_public_key.c
@@ -255,6 +255,9 @@ static int x509_validate_trust(struct x509_certificate *cert,
struct key *key;
int ret = 1;
 
+   if (!cert->akid_id || !cert->akid_skid)
+   return 1;
+   
if (!trust_keyring)
return -EOPNOTSUPP;
 
@@ -312,13 +315,13 @@ static int x509_key_preparse(struct key_preparsed_payload *prep)
cert->pub->algo = pkey_algo[cert->pub->pkey_algo];
cert->pub->id_type = PKEY_ID_X509;
 
-   /* Check the signature on the key if it appears to be self-signed */
-   if ((!cert->akid_skid && !cert->akid_id) ||
-   asymmetric_key_id_same(cert->skid, cert->akid_skid) ||
+   /* See if we can derive the trustability of this certificate */
+   if (asymmetric_key_id_same(cert->skid, cert->akid_skid) ||
asymmetric_key_id_same(cert->id, cert->akid_id)) {
-   ret = x509_check_signature(cert->pub, cert); /* self-signed */
-   if (ret < 0)
-   goto error_free_cert;
+   /* Self-signed.  We cannot evaluate the trustedness of this
+* cert, except by the fact that we obtained it from a trusted
+* location.
+*/
} else if (!prep->trusted) {
ret = x509_validate_trust(cert, get_system_trusted_keyring());
if (!ret)


[PATCH 0/4] X.509: Fix time handling

2016-01-04 Thread David Howells


Here's a set of patches that fix X.509 time handling in three ways:

 (1) Fix leap year handling.

 (2) Add leap second handling (where you get a time of 23:59:60).

 (3) Add end-of-day midnight encoding (where you get a time of 24:00:00).

David
---
David Howells (4):
  X.509: Fix leap year handling again
  Handle ISO 8601 leap seconds and encodings of midnight in mktime64()
  X.509: Support leap seconds
  X.509: Handle midnight alternative notation in GeneralizedTime


 crypto/asymmetric_keys/x509_cert_parser.c |   12 ++++++------
 kernel/time/time.c                        |    9 ++++++++-
 2 files changed, 14 insertions(+), 7 deletions(-)


[RFC PATCH 2/4] Handle ISO 8601 leap seconds and encodings of midnight in mktime64()

2016-01-04 Thread David Howells

Handle the following ISO 8601 features in mktime64():

 (1) Leap seconds.

 Leap seconds are indicated by the seconds parameter being the value
 60.  Handle this by treating it the same as 00 of the following
 minute.

 (2) Alternate encodings of midnight.

 Two different encodings of midnight are permitted - 00:00:00 and
 24:00:00 - the first is midnight today and the second is midnight
 tomorrow and is exactly equivalent to the first with tomorrow's date.

As it happens, we don't actually need to change mktime64() to handle either
of these - just comment them as valid parameters.
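
As a sanity check of that claim, here is a standalone userspace sketch modelled on the mktime64() arithmetic shown in the diff below. The toy_mktime64 name and the local time64_t typedef are assumptions made for this example; it is not the kernel function itself:

```c
#include <stdint.h>

typedef int64_t time64_t;	/* local stand-in for the kernel type */

/* Userspace sketch of the Gauss-style conversion used by mktime64(). */
static time64_t toy_mktime64(unsigned int year0, unsigned int mon0,
			     unsigned int day, unsigned int hour,
			     unsigned int min, unsigned int sec)
{
	unsigned int mon = mon0, year = year0;

	/* 1..12 -> 11,12,1..10: puts February last since it has the leap day */
	if (0 >= (int)(mon -= 2)) {
		mon += 12;
		year -= 1;
	}

	return ((((time64_t)
		  (year/4 - year/100 + year/400 + 367*mon/12 + day) +
		  year*365 - 719499
		 )*24 + hour	/* hour == 24 simply adds one whole day */
		)*60 + min
	       )*60 + sec;	/* sec == 60 simply adds one whole minute */
}
```

Because the hour, minute and second terms are plain sums, 2015-12-31 23:59:60 and 2015-12-31 24:00:00 both come out equal to 2016-01-01 00:00:00 without any special casing.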

These facilities will be used by the X.509 parser.  Doing it in mktime64()
makes the policy common to the whole kernel and easier to find.

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: Arnd Bergmann <a...@arndb.de>
cc: John Stultz <john.stu...@linaro.org>
---

 kernel/time/time.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/time/time.c b/kernel/time/time.c
index 86751c68e08d..be115b020d27 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -322,6 +322,13 @@ EXPORT_SYMBOL(timespec_trunc);
  * -year/100+year/400 terms, and add 10.]
  *
  * This algorithm was first published by Gauss (I think).
+ *
+ * A leap second can be indicated by calling this function with sec as
+ * 60 (allowable under ISO 8601).  The leap second is treated the same
+ * as the following second since they don't exist in UNIX time.
+ *
+ * An encoding of midnight at the end of the day as 24:00:00 - ie. midnight
+ * tomorrow - (allowable under ISO 8601) is supported.
  */
 time64_t mktime64(const unsigned int year0, const unsigned int mon0,
const unsigned int day, const unsigned int hour,
@@ -338,7 +345,7 @@ time64_t mktime64(const unsigned int year0, const unsigned int mon0,
return ((((time64_t)
  (year/4 - year/100 + year/400 + 367*mon/12 + day) +
  year*365 - 719499
-   )*24 + hour /* now have hours */
+   )*24 + hour /* now have hours - midnight tomorrow handled here */
  )*60 + min /* now have minutes */
)*60 + sec; /* finally seconds */
 }


[RFC PATCH 4/4] X.509: Handle midnight alternative notation in GeneralizedTime

2016-01-04 Thread David Howells

The ASN.1 GeneralizedTime object carries an ISO 8601 format date and time.
The time is permitted to show midnight as 00:00 or 24:00 (the latter being
equivalent to 00:00 of the following day).

The permitted value is checked in x509_decode_time() but the actual
handling is left to mktime64().

Without this patch, certain X.509 certificates will be rejected and could
lead to an unbootable kernel.

Note that with this patch we also permit any 24:mm:ss time and extend this
to UTCTime, which whilst not strictly correct doesn't permit much leeway in
fiddling date strings.
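
The relaxed range check can be sketched on its own as follows. The function name and signature are hypothetical, chosen for illustration; the comparisons are the ones in the x509_decode_time() hunk below:

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the relaxed field checks: hour may be 24
 * (end-of-day midnight) and sec may be 60 (leap second).  mon_len is the
 * length in days of the month already selected by the caller.
 */
static bool toy_time_fields_valid(unsigned int day, unsigned int mon_len,
				  unsigned int hour, unsigned int min,
				  unsigned int sec)
{
	if (day < 1 || day > mon_len ||
	    hour > 24 ||
	    min > 59 ||
	    sec > 60)
		return false;
	return true;
}
```

Note that this sketch, like the patch, accepts something like 24:30:60 as well as 24:00:00; that is the "any 24:mm:ss" leeway mentioned above.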

Reported-by: Rudolf Polzer <rpol...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
cc: David Woodhouse <david.woodho...@intel.com>
cc: John Stultz <john.stu...@linaro.org>
cc: Arnd Bergmann <a...@arndb.de>
---

 crypto/asymmetric_keys/x509_cert_parser.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 3379c0ba3988..70ed0852fdb2 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -548,7 +548,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
}
 
if (day < 1 || day > mon_len ||
-   hour > 23 ||
+   hour > 24 || /* ISO 8601 permits 24:00:00 as midnight tomorrow */
min > 59 ||
sec > 60) /* ISO 8601 permits leap seconds [X.680 46.3] */
goto invalid_time;


[RFC PATCH 3/4] X.509: Support leap seconds

2016-01-04 Thread David Howells

The format of ASN.1 GeneralizedTime seems to be specified by ISO 8601
[X.680 46.3] and this apparently supports leap seconds (ie. the seconds
field is 60).  It's not entirely clear that ASN.1 expects it, but we can
relax the seconds check slightly for GeneralizedTime.

This results in us passing a time with sec as 60 to mktime64(), which
handles it as being a duplicate of the 0th second of the next minute.

We can't really do otherwise without giving the kernel much greater
knowledge of where all the leap seconds are.  Unfortunately, this would
require changing the mapping of the kernel's current-time-in-seconds.

UTCTime, however, only supports a seconds value in the range 00-59, but for
the sake of simplicity allow this with UTCTime also.

Without this patch, certain X.509 certificates will be rejected,
potentially making a kernel unbootable.

Reported-by: Rudolf Polzer <rpol...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
cc: Arnd Bergmann <a...@arndb.de>
cc: David Woodhouse <david.woodho...@intel.com>
cc: John Stultz <john.stu...@linaro.org>
---

 crypto/asymmetric_keys/x509_cert_parser.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 13c4e5a5fe8c..3379c0ba3988 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -550,7 +550,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (day < 1 || day > mon_len ||
hour > 23 ||
min > 59 ||
-   sec > 59)
+   sec > 60) /* ISO 8601 permits leap seconds [X.680 46.3] */
goto invalid_time;
 
*_t = mktime64(year, mon, day, hour, min, sec);


Re: [PATCH] X.509: Fix determination of self-signedness

2015-12-18 Thread David Howells

Josh Boyer  wrote:

> Should this also be Cc'd to stable?

Argh.  Probably.

David

Re: [GIT PULL] Keys fixes

2015-12-18 Thread David Howells

Linus Torvalds <torva...@linux-foundation.org> wrote:

> > David Howells (7):
> >   Handle leap seconds in mktime64()
> 
> This one is completely wrong.
> 
> Leap seconds are inserted *at* the minute, not at the second before the
> minute.
> 
> So this code:
> 
> +   /* Handle leap seconds */
> +   if (sec == 60)
> +   sec = 59;
> 
> is just complete crap. Making the whole commit bogus and wrong.

I did ask on ksummit-discuss beforehand.  The advice was to treat hh:mm:60 as
hh:mm:59 rather than hh:mm+1:00.  Unless we actually support leap seconds as
distinct time_t values, it has to be one or the other.

> The code did the right thing wrt leap seconds before, without having
> any magical and incorrect special case. That commit makes it instead
> have two seconds of xx:xx:59.

... as opposed to two seconds of xx:xx+1:00.  You can argue it either way -
and arguably both are equally wrong since neither maps correctly to reality.
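
The two options can be put side by side in a small standalone sketch. Both helper names are made up for illustration and neither is kernel code; the first mirrors the "if (sec == 60) sec = 59;" special case quoted above, the second just lets the arithmetic roll over:

```c
/* Seconds since the start of the day, clamping a leap second to :59. */
static unsigned int toy_day_secs_clamped(unsigned int hour, unsigned int min,
					 unsigned int sec)
{
	if (sec == 60)
		sec = 59;	/* hh:mm:60 becomes a second hh:mm:59 */
	return hour * 3600 + min * 60 + sec;
}

/* Seconds since the start of the day with no special case at all. */
static unsigned int toy_day_secs_rollover(unsigned int hour, unsigned int min,
					  unsigned int sec)
{
	return hour * 3600 + min * 60 + sec;	/* hh:mm:60 == hh:mm+1:00 */
}
```

With the clamp, 23:59:59 and 23:59:60 map to the same value; with the rollover, 23:59:60 and the following 00:00:00 do. Either way two distinct UTC seconds share one value.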

> The fact that people add extra code to make things extra wrong is
> annoying. The patch is marked as being cc'd to John Stultz, but I
> assume it was never acked, because I doubt he would ack something like
> this.
>
> To make things worse, this whole series seems to have existed for less
> than one day, and then it was sent to me as a pull request, however
> buggy and non-acked it was.

I only asked James to pass the CVE-labelled commit on to you and didn't
include it in a patch series.  The rest I posted hoping for reviews.

> To make things EVEN *more* broken, this crap was marked for stable.

It will theoretically need to end up there anyway, since it is technically
possible for the bugs to prevent a kernel from booting - just not very likely.

Re: [GIT PULL] Keys fixes

2015-12-18 Thread David Howells

Linus Torvalds  wrote:

> Side note: the key handling extra checks seem pretty pointless too.

Except that it has been argued that they have to be there or someone can use
dates that contribute to the signature to fake signed content.  Admittedly
being able to have a seconds=60 value in somewhere that should stop at 59
doesn't allow a lot of contribution...

> There's no reason to have those "some time formats allow 60 seconds,
> some don't".

Feel free to explain that to the people who drafted the ASN.1 standards.
Maybe they'll listen to you...

> And you know what? If somebody decides that they want to have a key
> that says it was done at some nonsensical time like 24:30:60, just let
> it go. Just accept it. It's not your problem.

I've been told that it's a security hole.

David

[PATCH 3/5] X.509: Support leap seconds

2015-12-17 Thread David Howells

The format of ASN.1 GeneralizedTime seems to be specified by ISO 8601
[X.680 46.3] and this apparently supports leap seconds (ie. the seconds
field is 60).  It's not entirely clear that ASN.1 expects it, but we can
relax the seconds check slightly for GeneralizedTime.

This, however, results in us passing a time with sec as 60 to mktime64()
which, unpatched, doesn't really handle such things.  What it will do is
equate the 60th second of a minute to the 0th second of the next minute.

We can't really do otherwise without giving the kernel much greater
knowledge of where all the leap seconds are.  Unfortunately, this would
require changing the mapping of the kernel's current-time-in-seconds.

UTCTime, however, only supports a seconds value in the range 00-59.

Without this patch, certain X.509 certificates will be rejected,
potentially making a kernel unbootable.

Reported-by: Rudolf Polzer <rpol...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
cc: David Woodhouse <david.woodho...@intel.com>
cc: John Stultz <john.stu...@linaro.org>
cc: Arnd Bergmann <a...@arndb.de>
cc: sta...@vger.kernel.org
---

 crypto/asymmetric_keys/x509_cert_parser.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 13c4e5a5fe8c..9be2caebc57b 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -497,7 +497,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
static const unsigned char month_lengths[] = { 31, 28, 31, 30, 31, 30,
   31, 31, 30, 31, 30, 31 };
const unsigned char *p = value;
-   unsigned year, mon, day, hour, min, sec, mon_len;
+   unsigned year, mon, day, hour, min, sec, mon_len, max_sec;
 
 #define dec2bin(X) ({ unsigned char x = (X) - '0'; if (x > 9) goto invalid_time; x; })
 #define DD2bin(P) ({ unsigned x = dec2bin(P[0]) * 10 + dec2bin(P[1]); P += 2; x; })
@@ -511,6 +511,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
year += 1900;
else
year += 2000;
+   max_sec = 59;
} else if (tag == ASN1_GENTIM) {
/* GenTime: YYYYMMDDHHMMSSZ */
if (vlen != 15)
@@ -518,6 +519,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
year = DD2bin(p) * 100 + DD2bin(p);
if (year >= 1950 && year <= 2049)
goto invalid_time;
+   max_sec = 60; /* ISO 8601 permits leap seconds [X.680 46.3] */
} else {
goto unsupported_time;
}
@@ -550,7 +552,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (day < 1 || day > mon_len ||
hour > 23 ||
min > 59 ||
-   sec > 59)
+   sec > max_sec)
goto invalid_time;
 
*_t = mktime64(year, mon, day, hour, min, sec);


[PATCH 1/5] X.509: Fix leap year handling again

2015-12-17 Thread David Howells

There are still a couple of minor issues in the X.509 leap year handling:

 (1) To avoid doing a modulus-by-400 in addition to a modulus-by-100 when
 determining whether the year is a leap year or not, I divided the year
 by 100 after doing the modulus-by-100, thereby letting the compiler do
 one instruction for both, and then did a modulus-by-4.

 Unfortunately, I then passed the now-modified year value to mktime64()
 to construct a time value.

 Since this isn't a fast path and since mktime64() does a bunch of
 divisions, just condense down to "% 400".  It's also easier to read.

 (2) The default month length for any February where the year doesn't
 divide by four exactly is obtained from the month_lengths[] array where
 the value is 29, not 28.

 This is fixed by altering the table.
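
Taken together, the two fixes amount to something like the following standalone sketch. The toy_month_length() helper is hypothetical (the real logic sits inline in x509_decode_time()); it shows the corrected table and the "% 400" test without modifying the year:

```c
/* Days per month for a non-leap year; February is 28 after the fix. */
static const unsigned char toy_month_lengths[] = { 31, 28, 31, 30, 31, 30,
						   31, 31, 30, 31, 30, 31 };

/* Hypothetical helper; mon is 1..12.  The year value is left untouched. */
static unsigned int toy_month_length(unsigned int year, unsigned int mon)
{
	unsigned int mon_len = toy_month_lengths[mon - 1];

	if (mon == 2 && year % 4 == 0) {
		mon_len = 29;
		if (year % 100 == 0) {
			mon_len = 28;		/* century years are not leap... */
			if (year % 400 == 0)
				mon_len = 29;	/* ...unless divisible by 400 */
		}
	}
	return mon_len;
}
```

So February has 29 days in 2000 and 2016 but 28 in 1900 and 2015, and the year passed on to mktime64() is the one that was parsed.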

Reported-by: Rudolf Polzer <rpol...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
Acked-By: David Woodhouse <david.woodho...@intel.com>
cc: sta...@vger.kernel.org
---

 crypto/asymmetric_keys/x509_cert_parser.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 021d39c0ba75..13c4e5a5fe8c 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -494,7 +494,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
 unsigned char tag,
 const unsigned char *value, size_t vlen)
 {
-   static const unsigned char month_lengths[] = { 31, 29, 31, 30, 31, 30,
+   static const unsigned char month_lengths[] = { 31, 28, 31, 30, 31, 30,
   31, 31, 30, 31, 30, 31 };
const unsigned char *p = value;
unsigned year, mon, day, hour, min, sec, mon_len;
@@ -540,9 +540,9 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (year % 4 == 0) {
mon_len = 29;
if (year % 100 == 0) {
-   year /= 100;
-   if (year % 4 != 0)
-   mon_len = 28;
+   mon_len = 28;
+   if (year % 400 == 0)
+   mon_len = 29;
}
}
}


[PATCH 2/5] Handle leap seconds in mktime64()

2015-12-17 Thread David Howells

Handle leap seconds in mktime64() - where the seconds parameter is the
value 60 - by treating it the same as 59.

This facility will be used by the X.509 parser.  Doing it in mktime64()
makes the policy common to the whole kernel and easier to find.

Whilst we're at it, remove the const markers from all the parameters since
they don't really achieve anything and we do need to alter the sec
parameter.

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: John Stultz <john.stu...@linaro.org>
cc: Arnd Bergmann <a...@arndb.de>
cc: sta...@vger.kernel.org
---

 include/linux/time.h |   13 ++++++-------
 kernel/time/time.c   |   14 +++++++++++---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index beebe3a02d43..35384f0c0aa2 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -39,17 +39,16 @@ static inline int timeval_compare(const struct timeval *lhs, const struct timeva
return lhs->tv_usec - rhs->tv_usec;
 }
 
-extern time64_t mktime64(const unsigned int year, const unsigned int mon,
-   const unsigned int day, const unsigned int hour,
-   const unsigned int min, const unsigned int sec);
+extern time64_t mktime64(unsigned int year, unsigned int mon,
+unsigned int day, unsigned int hour,
+unsigned int min, unsigned int sec);
 
 /**
  * Deprecated. Use mktime64().
  */
-static inline unsigned long mktime(const unsigned int year,
-   const unsigned int mon, const unsigned int day,
-   const unsigned int hour, const unsigned int min,
-   const unsigned int sec)
+static inline unsigned long mktime(unsigned int year, unsigned int mon,
+  unsigned int day, unsigned int hour,
+  unsigned int min, unsigned int sec)
 {
return mktime64(year, mon, day, hour, min, sec);
 }
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 86751c68e08d..1858b10602f5 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -322,10 +322,14 @@ EXPORT_SYMBOL(timespec_trunc);
  * -year/100+year/400 terms, and add 10.]
  *
  * This algorithm was first published by Gauss (I think).
+ *
+ * A leap second can be indicated by calling this function with sec as
+ * 60 (allowable under ISO 8601).  The leap second is treated the same
+ * as the preceding second since they don't exist in UNIX time.
  */
-time64_t mktime64(const unsigned int year0, const unsigned int mon0,
-   const unsigned int day, const unsigned int hour,
-   const unsigned int min, const unsigned int sec)
+time64_t mktime64(unsigned int year0, unsigned int mon0,
+ unsigned int day, unsigned int hour,
+ unsigned int min, unsigned int sec)
 {
unsigned int mon = mon0, year = year0;
 
@@ -335,6 +339,10 @@ time64_t mktime64(const unsigned int year0, const unsigned int mon0,
year -= 1;
}
 
+   /* Handle leap seconds */
+   if (sec == 60)
+   sec = 59;
+
return ((((time64_t)
  (year/4 - year/100 + year/400 + 367*mon/12 + day) +
  year*365 - 719499


[PATCH] X.509: Fix determination of self-signedness

2015-12-17 Thread David Howells

Fix determination of whether an X.509 certificate is self-signed or not.

It is currently assumed that a cert is self-signed if it has no
authorityKeyIdentifier or the authorityKeyIdentifier matches the
subjectKeyIdentifier.  However, it is possible to encounter a certificate
that has neither AKID nor SKID but is not self-signed.

The symptoms of this show up as an attempt to load a certificate failing
with -ERANGE or -EBADMSG, produced from the RSA module when the result of
calculating "m = s^e mod n" is checked.

To fix this, don't check to see if a certificate is self-signed if the
Issuer and Subject names differ.
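
The decision can be sketched in isolation like this. The struct and function below are hypothetical simplifications for illustration; in particular the two boolean fields stand in for the akid_skid/akid_id pointers and the asymmetric_key_id_same() comparisons used in the diff below:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of the fields the check looks at. */
struct toy_cert_ids {
	bool has_akid;		/* an authorityKeyIdentifier is present */
	bool akid_matches_own;	/* AKID matches this cert's own SKID or ID */
	const unsigned char *raw_issuer;
	size_t raw_issuer_size;
	const unsigned char *raw_subject;
	size_t raw_subject_size;
};

/* Returns true if the cert should be treated as apparently self-signed. */
static bool toy_appears_self_signed(const struct toy_cert_ids *c)
{
	if (!c->has_akid)
		/* No AKID: only self-signed if Issuer and Subject are equal. */
		return c->raw_issuer_size == c->raw_subject_size &&
		       memcmp(c->raw_issuer, c->raw_subject,
			      c->raw_subject_size) == 0;

	return c->akid_matches_own;
}
```

A cert with no AKID whose Issuer differs from its Subject is thus no longer sent down the self-signature check, which is where the -ERANGE or -EBADMSG failures came from.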

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: David Woodhouse <david.woodho...@intel.com>
---

 crypto/asymmetric_keys/x509_public_key.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_public_key.c b/crypto/asymmetric_keys/x509_public_key.c
index 2a44b3752471..6236e7996f19 100644
--- a/crypto/asymmetric_keys/x509_public_key.c
+++ b/crypto/asymmetric_keys/x509_public_key.c
@@ -313,9 +313,14 @@ static int x509_key_preparse(struct key_preparsed_payload *prep)
cert->pub->id_type = PKEY_ID_X509;
 
/* Check the signature on the key if it appears to be self-signed */
-   if ((!cert->akid_skid && !cert->akid_id) ||
-   asymmetric_key_id_same(cert->skid, cert->akid_skid) ||
-   asymmetric_key_id_same(cert->id, cert->akid_id)) {
+   if ((!cert->akid_skid && !cert->akid_id)) {
+   if (cert->raw_issuer_size == cert->raw_subject_size &&
+   memcmp(cert->raw_issuer, cert->raw_subject,
+  cert->raw_subject_size) == 0)
+   goto self_signed;
+   } else if (asymmetric_key_id_same(cert->skid, cert->akid_skid) ||
+  asymmetric_key_id_same(cert->id, cert->akid_id)) {
+self_signed:
ret = x509_check_signature(cert->pub, cert); /* self-signed */
if (ret < 0)
goto error_free_cert;


[PATCH 4/5] Handle both ISO 8601 encodings of midnight in mktime64()

2015-12-17 Thread David Howells

ISO 8601 format dates permit two different encodings of midnight: 00:00:00
and 24:00:00.  The first is midnight at the start of the day; the second is
midnight at the end of the day and is exactly equivalent to 00:00:00 with
tomorrow's date.

Note that the implementation of mktime64() doesn't actually need to be
changed to handle this - the multiplication by 3600 of the hour will take
care of it automatically.  However, we should document that this handling
is done in mktime64() and is thus in a common place in the kernel.

This handling is required for X.509 certificate parsing which can be given
ISO 8601 dates.
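
To see why no code change is needed, here is a userspace sketch of the same
date arithmetic.  The function name is hypothetical and this is only an
approximation of the kernel routine for illustration; the point is that an
hour value of 24 just contributes one extra day's worth of hours.

```c
#include <stdint.h>

/*
 * Userspace sketch of the mktime64() arithmetic: Gregorian date and time
 * to seconds since 1970-01-01 00:00:00 UTC.  Hypothetical name.
 */
static int64_t toy_mktime64(unsigned int year, unsigned int mon,
			    unsigned int day, unsigned int hour,
			    unsigned int min, unsigned int sec)
{
	/* Renumber the months so that February comes last in the year. */
	if (mon <= 2) {
		mon += 10;
		year -= 1;
	} else {
		mon -= 2;
	}

	int64_t days = (int64_t)(year / 4 - year / 100 + year / 400 +
				 367 * mon / 12 + day) +
		       (int64_t)year * 365 - 719499;

	/* hour == 24 adds exactly one more day's worth of hours here. */
	return ((days * 24 + hour) * 60 + min) * 60 + sec;
}
```

Under these assumptions, 2015-12-31 24:00:00 and 2016-01-01 00:00:00 map to
the same value.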

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: John Stultz <john.stu...@linaro.org>
cc: Arnd Bergmann <a...@arndb.de>
cc: sta...@vger.kernel.org
---

 kernel/time/time.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/time.c b/kernel/time/time.c
index 1858b10602f5..56e7ada38471 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -326,6 +326,9 @@ EXPORT_SYMBOL(timespec_trunc);
  * A leap second can be indicated by calling this function with sec as
  * 60 (allowable under ISO 8601).  The leap second is treated the same
  * as the preceding second since they don't exist in UNIX time.
+ *
+ * An encoding of midnight at the end of the day as 24:00:00 - ie. midnight
+ * tomorrow - (allowable under ISO 8601) is supported.
  */
 time64_t mktime64(unsigned int year0, unsigned int mon0,
  unsigned int day, unsigned int hour,
@@ -346,7 +349,7 @@ time64_t mktime64(unsigned int year0, unsigned int mon0,
return ((((time64_t)
  (year/4 - year/100 + year/400 + 367*mon/12 + day) +
  year*365 - 719499
-   )*24 + hour /* now have hours */
+   )*24 + hour /* now have hours - midnight tomorrow handled here */
  )*60 + min /* now have minutes */
)*60 + sec; /* finally seconds */
 }


[PATCH 5/5] X.509: Handle midnight alternative notation in GeneralizedTime

2015-12-17 Thread David Howells

The ASN.1 GeneralizedTime object carries an ISO 8601 format date and time.
The time is permitted to show midnight as 00:00 or 24:00 (the latter being
equivalent to 00:00 of the following day).

The permitted value is checked in x509_decode_time() but the actual
handling is left to mktime64().
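
The range check can be sketched on its own as below.  This is a simplified
illustration rather than the parser code: the enum and function names are
hypothetical, and only the time-of-day fields are considered.

```c
#include <stdbool.h>

/* Hypothetical tags standing in for the ASN.1 UTCTime/GeneralizedTime tags. */
enum toy_time_tag { TOY_UTCTIME, TOY_GENTIME };

/* Return true if hour:min:sec is acceptable for the given time type. */
static bool toy_time_of_day_ok(enum toy_time_tag tag, unsigned int hour,
			       unsigned int min, unsigned int sec)
{
	/* Only GeneralizedTime gets the ISO 8601 extras. */
	unsigned int max_hour = (tag == TOY_GENTIME) ? 24 : 23;
	unsigned int max_sec = (tag == TOY_GENTIME) ? 60 : 59;

	if (hour > max_hour || min > 59 || sec > max_sec)
		return false;

	/* Hour 24 is only valid as exactly 24:00:00. */
	if (hour == 24 && (min != 0 || sec != 0))
		return false;

	return true;
}
```

Once a value passes this check it can be handed on unchanged, since the
conversion to seconds treats hour 24 as the start of the next day.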

Without this patch, certain X.509 certificates will be rejected, which
could lead to an unbootable kernel.

Reported-by: Rudolf Polzer <rpol...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
cc: David Woodhouse <david.woodho...@intel.com>
cc: John Stultz <john.stu...@linaro.org>
cc: Arnd Bergmann <a...@arndb.de>
cc: sta...@vger.kernel.org
---

 crypto/asymmetric_keys/x509_cert_parser.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 9be2caebc57b..b9de251c419c 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -497,7 +497,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
static const unsigned char month_lengths[] = { 31, 28, 31, 30, 31, 30,
   31, 31, 30, 31, 30, 31 };
const unsigned char *p = value;
-   unsigned year, mon, day, hour, min, sec, mon_len, max_sec;
+   unsigned year, mon, day, hour, min, sec, mon_len, max_sec, max_hour;
 
 #define dec2bin(X) ({ unsigned char x = (X) - '0'; if (x > 9) goto invalid_time; x; })
 #define DD2bin(P) ({ unsigned x = dec2bin(P[0]) * 10 + dec2bin(P[1]); P += 2; x; })
@@ -512,6 +512,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
else
year += 2000;
max_sec = 59;
+   max_hour = 23;
} else if (tag == ASN1_GENTIM) {
/* GenTime: YYYYMMDDHHMMSSZ */
if (vlen != 15)
@@ -520,6 +521,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (year >= 1950 && year <= 2049)
goto invalid_time;
max_sec = 60; /* ISO 8601 permits leap seconds [X.680 46.3] */
+   max_hour = 24;
} else {
goto unsupported_time;
}
@@ -550,11 +552,17 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
}
 
if (day < 1 || day > mon_len ||
-   hour > 23 ||
+   hour > max_hour ||
min > 59 ||
sec > max_sec)
goto invalid_time;
 
+   /* GeneralizedTime, encoded as ISO 8601, also permits 24:00 today as an
+* alternative for 00:00 tomorrow.
+*/
+   if (hour == 24 && (min != 0 || sec != 0))
+   goto invalid_time;
+
*_t = mktime64(year, mon, day, hour, min, sec);
return 0;
 


[PATCH 0/5] X.509: Fix time handling

2015-12-17 Thread David Howells


Here's a set of patches that fix X.509 time handling in three ways:

 (1) Fix leap year handling.

 (2) Add leap second handling (where you get a time of 23:59:60).

 (3) Add end-of-day midnight encoding (where you get a time of 24:00:00).

David
---
David Howells (5):
  X.509: Fix leap year handling again
  Handle leap seconds in mktime64()
  X.509: Support leap seconds
  Handle both ISO 8601 encodings of midnight in mktime64()
  X.509: Handle midnight alternative notation in GeneralizedTime


 crypto/asymmetric_keys/x509_cert_parser.c |   24 +---
 include/linux/time.h  |   13 ++---
 kernel/time/time.c|   19 +++
 3 files changed, 38 insertions(+), 18 deletions(-)


[PATCH] KEYS: Fix race between read and revoke

2015-12-17 Thread David Howells

This fixes CVE-2015-7550.

There's a race between keyctl_read() and keyctl_revoke().  If the revoke
happens between keyctl_read() checking the validity of a key and the key's
semaphore being taken, then the key type read method will see a revoked key.

This causes a problem for the user-defined key type because it assumes in
its read method that there will always be a payload in a non-revoked key
and doesn't check for a NULL pointer.

Fix this by making keyctl_read() check the validity of a key after taking the
semaphore instead of before.
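
The ordering can be illustrated with a small userspace analogue.  This is a
sketch under stated assumptions, not the keyctl code: the struct, the
function names and the error value are hypothetical, and a pthread mutex
stands in for the key's semaphore.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_EKEYREVOKED 128	/* illustrative error value */

/* Hypothetical userspace stand-in for a key; a mutex replaces key->sem. */
struct toy_key {
	pthread_mutex_t sem;
	bool revoked;
	const char *payload;	/* NULL once the key has been revoked */
};

static void toy_revoke(struct toy_key *key)
{
	pthread_mutex_lock(&key->sem);
	key->revoked = true;
	key->payload = NULL;
	pthread_mutex_unlock(&key->sem);
}

/* Copy up to buflen bytes of payload; return its length or a negative error. */
static long toy_read(struct toy_key *key, char *buffer, size_t buflen)
{
	long ret;

	pthread_mutex_lock(&key->sem);
	/* Validate under the lock so a revoke cannot slip in after the check. */
	if (key->revoked) {
		ret = -TOY_EKEYREVOKED;
	} else {
		size_t len = strlen(key->payload);

		memcpy(buffer, key->payload, len < buflen ? len : buflen);
		ret = (long)len;
	}
	pthread_mutex_unlock(&key->sem);
	return ret;
}
```

Had the revoked test been done before taking the lock, toy_revoke() could
run in between and the read path would then dereference a NULL payload.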

I think the bug was introduced with the original keyrings code.

This was discovered by a multithreaded test program generated by syzkaller
(http://github.com/google/syzkaller).  Here's a cleaned up version:

#include <sys/types.h>
#include <keyutils.h>
#include <pthread.h>

/* Revoke the key whose serial number is passed in arg. */
void *thr0(void *arg)
{
	key_serial_t key = (unsigned long)arg;
	keyctl_revoke(key);
	return 0;
}

/* Read the key whose serial number is passed in arg. */
void *thr1(void *arg)
{
	key_serial_t key = (unsigned long)arg;
	char buffer[16];
	keyctl_read(key, buffer, 16);
	return 0;
}

int main()
{
	key_serial_t key = add_key("user", "%", "foo", 3,
				   KEY_SPEC_USER_KEYRING);
	pthread_t th[5];
	/* Race two revokers against two readers on the same key. */
	pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
	pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
	pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
	pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
	pthread_join(th[0], 0);
	pthread_join(th[1], 0);
	pthread_join(th[2], 0);
	pthread_join(th[3], 0);
	return 0;
}

Build as:

cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread

Run as:

while keyctl-race; do :; done

as it may need several iterations to crash the kernel.  The crash can be
summarised as:

BUG: unable to handle kernel NULL pointer dereference at 
0010
IP: [] user_read+0x56/0xa3
...
Call Trace:
 [] keyctl_read_key+0xb6/0xd7
 [] SyS_keyctl+0x83/0xe0
 [] entry_SYSCALL_64_fastpath+0x12/0x6f

Reported-by: Dmitry Vyukov <dvyu...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
Tested-by: Dmitry Vyukov <dvyu...@google.com>
Cc: sta...@vger.kernel.org
---

 security/keys/keyctl.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index fb111eafcb89..1c3872aeed14 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -751,16 +751,16 @@ long keyctl_read_key(key_serial_t keyid, char __user *buffer, size_t buflen)
 
/* the key is probably readable - now try to read it */
 can_read_key:
-   ret = key_validate(key);
-   if (ret == 0) {
-   ret = -EOPNOTSUPP;
-   if (key->type->read) {
-   /* read the data with the semaphore held (since we
-* might sleep) */
-   down_read(&key->sem);
+   ret = -EOPNOTSUPP;
+   if (key->type->read) {
+   /* Read the data with the semaphore held (since we might sleep)
+* to protect against the key being updated or revoked.
+*/
+   down_read(&key->sem);
+   ret = key_validate(key);
+   if (ret == 0)
ret = key->type->read(key, buffer, buflen);
-   up_read(&key->sem);
-   }
+   up_read(&key->sem);
}
 
 error2:


Re: [PATCH] X.509: Fix the time validation [ver #3]

2015-12-11 Thread David Howells

Greg Kroah-Hartman  wrote:

> David, any reason you didn't put a cc: stable in the commit for it to be
> picked up in the stable releases?

I did cc it to stable.

David

Re: [PATCH] X.509: Fix leap year handling again and support leap seconds

2015-12-10 Thread David Howells

Rudolf Polzer  wrote:

> Also, while at it - apparently hour 24 is allowed by ISO 8601 too as long as
> minutes and seconds are zero, leading to even more non-canonicality... can
> you check whether this is also valid ASN.1 then?

Sorry, I missed this bit.  The ASN.1 spec says that GeneralizedTime is ISO
8601 format.

> > It's not entirely clear that ASN.1 expects it, but we can relax the
> > seconds check slightly for GeneralizedTime.

What I'm not sure of is whether other ASN.1 implementations will expect it.

David

Re: [PATCH 0/2] security: clarify that some code is really non-modular

2015-12-10 Thread David Howells

Paul Gortmaker  wrote:

> Paul Gortmaker (2):
>   security/keys: make big_key.c explicitly non-modular
>   security/integrity: make ima/ima_mok.c explicitly non-modular

Note that I only see patch 1.  Note also that keyri...@linux-nfs.org should
now be keyri...@vger.kernel.org.

David

Re: keyring timestamps

2015-12-01 Thread David Howells

Petko Manolov  wrote:

>   0) does keyrings keep a timestamp when created or last updated?  David?

No.

> 0) is crucial.  If there is no such thing as "time of the last update" for
> keyrings i guess we'll either have to implement it or use another mechanism
> to get similar result.

You haven't said why you want it?  Update what?

David

[PATCH] X.509: Fix leap year handling again and support leap seconds

2015-12-01 Thread David Howells

There are still a couple of minor issues in the X.509 leap year handling:

 (1) To avoid doing a modulus-by-400 in addition to a modulus-by-100 when
 determining whether the year is a leap year or not, I divided the year
 by 100 after doing the modulus-by-100, thereby letting the compiler do
 one instruction for both, and then did a modulus-by-4.

 Unfortunately, I then passed the now-modified year value to mktime64()
 to construct a time value.

 Since this isn't a fast path and since mktime64() does a bunch of
 divisions, just condense down to "% 400".  It's also easier to read.

 (2) The default month length for any February where the year doesn't
 divide by four exactly is obtained from the month_lengths[] array where
 the value is 29, not 28.

 This is fixed by altering the table.

In addition:

 (3) The format of ASN.1 GeneralizedTime seems to be specified by ISO 8601
 [X.680 46.3] and this apparently supports leap seconds (ie. the
 seconds field is 60).  It's not entirely clear that ASN.1 expects it,
 but we can relax the seconds check slightly for GeneralizedTime.

 UTCTime, however, only supports a seconds value in the range 00-59.
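
For reference, the Gregorian rule behind point (1) can be written out as a
small standalone sketch.  The function names are hypothetical; the key
property is that the year value is only tested, never modified, so it can
still be passed on unchanged afterwards.

```c
#include <stdbool.h>

/* Gregorian rule: every 4th year, except centuries not divisible by 400. */
static bool toy_is_leap_year(unsigned int year)
{
	if (year % 4 != 0)
		return false;
	if (year % 100 != 0)
		return true;
	return year % 400 == 0;
}

/* February length, computed without modifying the caller's year value. */
static unsigned int toy_february_length(unsigned int year)
{
	return toy_is_leap_year(year) ? 29 : 28;
}
```

So 2000 and 2016 get a 29-day February, while 1900, 2015 and 2100 get 28.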

Reported-by: Rudolf Polzer <rpol...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---
diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 021d39c0ba75..f57c3c1b5ae7 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -494,10 +494,10 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
 unsigned char tag,
 const unsigned char *value, size_t vlen)
 {
-   static const unsigned char month_lengths[] = { 31, 29, 31, 30, 31, 30,
+   static const unsigned char month_lengths[] = { 31, 28, 31, 30, 31, 30,
   31, 31, 30, 31, 30, 31 };
const unsigned char *p = value;
-   unsigned year, mon, day, hour, min, sec, mon_len;
+   unsigned year, mon, day, hour, min, sec, mon_len, max_sec;
 
 #define dec2bin(X) ({ unsigned char x = (X) - '0'; if (x > 9) goto invalid_time; x; })
 #define DD2bin(P) ({ unsigned x = dec2bin(P[0]) * 10 + dec2bin(P[1]); P += 2; x; })
@@ -511,6 +511,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
year += 1900;
else
year += 2000;
+   max_sec = 59;
} else if (tag == ASN1_GENTIM) {
/* GenTime: YYYYMMDDHHMMSSZ */
if (vlen != 15)
@@ -518,6 +519,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
year = DD2bin(p) * 100 + DD2bin(p);
if (year >= 1950 && year <= 2049)
goto invalid_time;
+   max_sec = 60; /* ISO 8601 permits leap seconds [X.680 46.3] */
} else {
goto unsupported_time;
}
@@ -540,9 +542,9 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (year % 4 == 0) {
mon_len = 29;
if (year % 100 == 0) {
-   year /= 100;
-   if (year % 4 != 0)
-   mon_len = 28;
+   mon_len = 28;
+   if (year % 400 == 0)
+   mon_len = 29;
}
}
}
@@ -550,7 +552,7 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (day < 1 || day > mon_len ||
hour > 23 ||
min > 59 ||
-   sec > 59)
+   sec > max_sec)
goto invalid_time;
 
*_t = mktime64(year, mon, day, hour, min, sec);

Re: [PATCH] X.509: Fix leap year handling again and support leap seconds

2015-12-01 Thread David Howells

Rudolf Polzer  wrote:

> the leap second support still looks a bit suspect, as mktime64 will convert
> mm/dd/ HH/MM/60 and mm/dd/ HH/MM+1/00 to the same time64_t,
> essentially meaning that two different inputs can yield the same output,
> possibly violating ASN.1 CER and DER rules.

That's a 'bug' in mktime64() not my parsing of the ASN.1.  If it's valid ASN.1
then we should accept it.

David

Re: [PATCH] KEYS: Fix handling of stored error in a negatively instantiated user key

2015-11-25 Thread David Howells

James Morris  wrote:

> Is this triggerable by normal users?

Yes.

David

[PATCH] KEYS: Fix handling of stored error in a negatively instantiated user key

2015-11-24 Thread David Howells

If a user key gets negatively instantiated, an error code is cached in the
payload area.  A negatively instantiated key may then be positively
instantiated by updating it with valid data.  However, the ->update key
type method must be aware that the error code may be there.
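
A stripped-down model of the hazard, with the kind of guard an ->update
method needs, might look like the sketch below.  All names and the error
value are hypothetical; the union simply models one slot that holds either a
data pointer or a cached error code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_ENOKEY 126	/* illustrative value for "key not available" */
#define TOY_EINVAL 22	/* illustrative value for "invalid argument" */

/* Hypothetical key: the payload slot holds either data or a cached error. */
struct toy_neg_key {
	bool negative;		/* set while negatively instantiated */
	union {
		const char *data;	/* valid only when !negative */
		long error;		/* cached error code when negative */
	} payload;
};

/* Replace the payload, refusing to interpret a cached error as a pointer. */
static int toy_update(struct toy_neg_key *key, const char *new_data)
{
	if (key->negative)
		return -TOY_ENOKEY;
	if (!new_data)
		return -TOY_EINVAL;
	key->payload.data = new_data;
	return 0;
}
```

Without the first test, the update path would treat the stored error value
as a pointer to the old payload and try to free it.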

The following may be used to trigger the bug in the user key type:

keyctl request2 user user "" @u
keyctl add user user "a" @u

which manifests itself as:

BUG: unable to handle kernel paging request at ff8a
IP: [] __call_rcu.constprop.76+0x1f/0x280 
kernel/rcu/tree.c:3046
PGD 7cc30067 PUD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
01/01/2011
task: 88003ddea700 ti: 88003dd88000 task.ti: 88003dd88000
RIP: 0010:[]  [] 
__call_rcu.constprop.76+0x1f/0x280
 [] __call_rcu.constprop.76+0x1f/0x280 
kernel/rcu/tree.c:3046
RSP: 0018:88003dd8bdb0  EFLAGS: 00010246
RAX: ff82 RBX:  RCX: 0001
RDX: 81e3fe40 RSI:  RDI: ff82
RBP: 88003dd8bde0 R08: 88007d2d2da0 R09: 
R10:  R11: 88003e8073c0 R12: ff82
R13: 88003dd8be68 R14: 88007d027600 R15: 88003ddea700
FS:  00b92880(0063) GS:88007fd0() 
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: ff8a CR3: 7cc5f000 CR4: 06e0
Stack:
 88003dd8bdf0 81160a8a  ff82
 88003dd8be68 88007d027600 88003dd8bdf0 810a39e5
 88003dd8be20 812a31ab 88007d027600 88007d027620
Call Trace:
 [] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136
 [] user_update+0x8b/0xb0 
security/keys/user_defined.c:129
 [< inline >] __key_update security/keys/key.c:730
 [] key_create_or_update+0x291/0x440 
security/keys/key.c:908
 [< inline >] SYSC_add_key security/keys/keyctl.c:125
 [] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60
 [] entry_SYSCALL_64_fastpath+0x12/0x6a 
arch/x86/entry/entry_64.S:185

Note the error code (-ENOKEY) in RAX, RDI and R12.

A similar bug can be tripped by:

keyctl request2 trusted user "" @u
keyctl add trusted user "a" @u

This should also affect encrypted keys - but that has to be correctly
parameterised or it will fail with EINVAL before getting to the bit that
crashes.

Reported-by: Dmitry Vyukov <dvyu...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
Acked-by: Mimi Zohar <zo...@linux.vnet.ibm.com>
---

 security/keys/encrypted-keys/encrypted.c |2 ++
 security/keys/trusted.c  |5 -
 security/keys/user_defined.c |5 -
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/security/keys/encrypted-keys/encrypted.c b/security/keys/encrypted-keys/encrypted.c
index 927db9f35ad6..696ccfa08d10 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -845,6 +845,8 @@ static int encrypted_update(struct key *key, struct key_preparsed_payload *prep)
size_t datalen = prep->datalen;
int ret = 0;
 
+   if (test_bit(KEY_FLAG_NEGATIVE, &key->flags))
+   return -ENOKEY;
if (datalen <= 0 || datalen > 32767 || !prep->data)
return -EINVAL;
 
diff --git a/security/keys/trusted.c b/security/keys/trusted.c
index 903dace648a1..16dec53184b6 100644
--- a/security/keys/trusted.c
+++ b/security/keys/trusted.c
@@ -1007,13 +1007,16 @@ static void trusted_rcu_free(struct rcu_head *rcu)
  */
 static int trusted_update(struct key *key, struct key_preparsed_payload *prep)
 {
-   struct trusted_key_payload *p = key->payload.data[0];
+   struct trusted_key_payload *p;
struct trusted_key_payload *new_p;
struct trusted_key_options *new_o;
size_t datalen = prep->datalen;
char *datablob;
int ret = 0;
 
+   if (test_bit(KEY_FLAG_NEGATIVE, &key->flags))
+   return -ENOKEY;
+   p = key->payload.data[0];
if (!p->migratable)
return -EPERM;
if (datalen <= 0 || datalen > 32767 || !prep->data)
diff --git a/security/keys/user_defined.c b/security/keys/user_defined.c
index 28cb30f80256..8705d79b2c6f 100644
--- a/security/keys/user_defined.c
+++ b/security/keys/user_defined.c
@@ -120,7 +120,10 @@ int user_update(struct key *key, struct key_preparsed_payload *prep)
 
if (ret == 0) {
/* attach the new data, displacing the old */
-   zap = k

Re: [PATCH] KEYS: Fix handling of stored error in a negatively instantiated user key

2015-11-24 Thread David Howells

Hi James,

Can this be passed straight to Linus please?

Thanks,
David

Re: [RFC] readlink()-related oddities

2015-11-20 Thread David Howells

Al Viro  wrote:

> All of them?  I see two kinds there - one is magical symlink (recognized
> by contents in afs_iget()), another is this autocell thing, the latter
> having no ->readlink().  Both serve as automount points, don't they?

The "autocell" thing is where you don't have an AFS file of that name and a
lookup of that non-existent file is treated as an attempt to mount a
destination volume encoded by the filename.

David

[PATCH] X.509: Fix the time validation [ver #3]

2015-11-12 Thread David Howells

This fixes CVE-2015-5327.  It affects kernels from 4.3-rc1 onwards.

Fix the X.509 time validation to use month number-1 when looking up the
number of days in that month.  Also put the month number validation before
doing the lookup so as not to risk overrunning the array.
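
In isolation, the corrected lookup amounts to something like the following
sketch.  The names are hypothetical and leap years are deliberately left
out; it only shows the range check coming before the 1-based month is
turned into a 0-based array index.

```c
static const unsigned char toy_month_lengths[12] = {
	31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/*
 * Return the non-leap length of month mon (1..12), or 0 if mon is out of
 * range.  The range check comes first so the array is never overrun.
 */
static unsigned int toy_month_length(unsigned int mon)
{
	if (mon < 1 || mon > 12)
		return 0;
	/* Months are numbered from 1, the array is indexed from 0. */
	return toy_month_lengths[mon - 1];
}
```

Indexing with the raw month number instead would return March's length for
February and read past the end of the table for December.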

This can be tested by doing the following:

cat <
Signed-off-by: David Howells <dhowe...@redhat.com>
Tested-by: Mimi Zohar <zo...@linux.vnet.ibm.com>
Acked-by: David Woodhouse <david.woodho...@intel.com>
---

 crypto/asymmetric_keys/x509_cert_parser.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 3000ea3b6687..021d39c0ba75 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -531,7 +531,11 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (*p != 'Z')
goto unsupported_time;
 
-   mon_len = month_lengths[mon];
+   if (year < 1970 ||
+   mon < 1 || mon > 12)
+   goto invalid_time;
+
+   mon_len = month_lengths[mon - 1];
if (mon == 2) {
if (year % 4 == 0) {
mon_len = 29;
@@ -543,14 +547,12 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
}
}
 
-   if (year < 1970 ||
-   mon < 1 || mon > 12 ||
-   day < 1 || day > mon_len ||
+   if (day < 1 || day > mon_len ||
hour > 23 ||
min > 59 ||
sec > 59)
goto invalid_time;
-   
+
*_t = mktime64(year, mon, day, hour, min, sec);
return 0;
 


[PATCH] X.509: Fix the time validation

2015-11-11 Thread David Howells

This fixes CVE-2015-5327.  It affects kernels from 4.3-rc1 onwards.

Fix the X.509 time validation to use month number-1 when looking up the
number of days in that month.  Also put the month number validation before
doing the lookup so as not to risk overrunning the array.

This can be tested by doing the following:

cat <
Signed-off-by: David Howells <dhowe...@redhat.com>
Tested-by: Mimi Zohar <zo...@linux.vnet.ibm.com>
Acked-by: David Woodhouse <david.woodho...@intel.com>
---

 crypto/asymmetric_keys/x509_cert_parser.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index af71878dc15b..e8d7b0342f5f 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -531,7 +531,11 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
if (*p != 'Z')
goto unsupported_time;
 
-   mon_len = month_lengths[mon];
+   if (year < 1970 ||
+   mon < 1 || mon > 12)
+   goto invalid_time;
+
+   mon_len = month_lengths[mon - 1];
if (mon == 2) {
if (year % 4 == 0) {
mon_len = 29;
@@ -543,14 +547,12 @@ int x509_decode_time(time64_t *_t,  size_t hdrlen,
}
}
 
-   if (year < 1970 ||
-   mon < 1 || mon > 12 ||
-   day < 1 || day > mon_len ||
+   if (day < 1 || day > mon_len ||
hour < 0 || hour > 23 ||
min < 0 || min > 59 ||
sec < 0 || sec > 59)
goto invalid_time;
-   
+
*_t = mktime64(year, mon, day, hour, min, sec);
return 0;
 


[PATCH 10/10] KEYS: Move the point of trust determination to __key_link()

2015-10-21 Thread David Howells

Move the point at which a key is determined to be trustworthy to
__key_link() so that the contents of the destination keyring are used to
determine whether the key being linked in is trusted or not.

What is 'trusted' then becomes a matter of what's in the keyring.

Currently, the test is done when the key is parsed, but given that at that
point we can only sensibly refer to the contents of the system trusted
keyring, we can only use that as the basis for working out the
trustworthiness of a new key.

With this change, a trusted keyring is a set of keys that once the
trusted-only flag is set cannot be added to except by verification through
one of the contained keys.

Further, adding a key into a trusted keyring, whilst it might grant
trustworthiness in the context of that keyring, does not automatically
grant trustworthiness in the context of a second keyring to which it could
be secondarily linked.

To accomplish this, the authentication data associated with the key source
must now be retained.  For an X.509 cert, this means the contents of the
AuthorityKeyIdentifier and the signature data.
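
The idea of judging trust per destination keyring can be modelled roughly
as below.  This is a toy sketch only: every name is hypothetical, and a
plain identifier match is assumed in place of real signature verification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_KEYS 8

/* Hypothetical key: its own identifier plus the identifier of its signer. */
struct toy_key {
	const char *id;
	const char *signer_id;	/* retained authentication data (cf. AKID) */
};

struct toy_keyring {
	bool trusted_only;
	size_t nr_keys;
	const struct toy_key *keys[TOY_MAX_KEYS];
};

/* Trust is judged against the contents of the destination keyring. */
static bool toy_verify_trust(const struct toy_keyring *ring,
			     const struct toy_key *key)
{
	for (size_t i = 0; i < ring->nr_keys; i++)
		if (strcmp(ring->keys[i]->id, key->signer_id) == 0)
			return true;
	return false;
}

/* Link a key; return 0 on success or -1 if the link is refused. */
static int toy_key_link(struct toy_keyring *ring, const struct toy_key *key)
{
	if (ring->nr_keys >= TOY_MAX_KEYS)
		return -1;
	if (ring->trusted_only && !toy_verify_trust(ring, key))
		return -1;
	ring->keys[ring->nr_keys++] = key;
	return 0;
}
```

In this model a key accepted into one trusted-only keyring gains nothing
when linked into a second one; it is checked again against whatever that
second keyring contains.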

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 certs/system_keyring.c|3 +
 crypto/asymmetric_keys/Makefile   |2 -
 crypto/asymmetric_keys/asymmetric_keys.h  |2 +
 crypto/asymmetric_keys/asymmetric_type.c  |   15 +
 crypto/asymmetric_keys/pkcs7_trust.c  |   22 +++
 crypto/asymmetric_keys/public_key.c   |   19 ++
 crypto/asymmetric_keys/public_key.h   |6 ++
 crypto/asymmetric_keys/public_key_trust.c |   94 +
 crypto/asymmetric_keys/x509_parser.h  |6 --
 crypto/asymmetric_keys/x509_public_key.c  |6 --
 include/crypto/public_key.h   |8 +-
 include/keys/asymmetric-subtype.h |4 +
 security/integrity/digsig_asymmetric.c|5 +-
 13 files changed, 108 insertions(+), 84 deletions(-)

diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index e7f286413276..fbaaaea59f02 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -35,7 +35,8 @@ static __init int system_trusted_keyring_init(void)
keyring_alloc(".system_keyring",
  KUIDT_INIT(0), KGIDT_INIT(0), current_cred(),
  ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
- KEY_USR_VIEW | KEY_USR_READ | KEY_USR_SEARCH),
+  KEY_USR_VIEW | KEY_USR_READ | KEY_USR_SEARCH |
+  KEY_USR_WRITE),
  KEY_ALLOC_NOT_IN_QUOTA, NULL);
if (IS_ERR(system_trusted_keyring))
panic("Can't allocate system trusted keyring\n");
diff --git a/crypto/asymmetric_keys/Makefile b/crypto/asymmetric_keys/Makefile
index bd07987c64e7..69bcdc9a2ce6 100644
--- a/crypto/asymmetric_keys/Makefile
+++ b/crypto/asymmetric_keys/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += asymmetric_keys.o
 
 asymmetric_keys-y := asymmetric_type.o signature.o
 
-obj-$(CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE) += public_key.o
+obj-$(CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE) += public_key.o public_key_trust.o
 obj-$(CONFIG_PUBLIC_KEY_ALGO_RSA) += rsa.o
 
 #
diff --git a/crypto/asymmetric_keys/asymmetric_keys.h b/crypto/asymmetric_keys/asymmetric_keys.h
index 1d450b580245..ca8e9ac34ce6 100644
--- a/crypto/asymmetric_keys/asymmetric_keys.h
+++ b/crypto/asymmetric_keys/asymmetric_keys.h
@@ -9,6 +9,8 @@
  * 2 of the Licence, or (at your option) any later version.
  */
 
+#include 
+
 extern struct asymmetric_key_id *asymmetric_key_hex_to_key_id(const char *id);
 
 extern int __asymmetric_key_hex_to_key_id(const char *id,
diff --git a/crypto/asymmetric_keys/asymmetric_type.c b/crypto/asymmetric_keys/asymmetric_type.c
index a79d30128821..e02cbd068151 100644
--- a/crypto/asymmetric_keys/asymmetric_type.c
+++ b/crypto/asymmetric_keys/asymmetric_type.c
@@ -362,10 +362,25 @@ static void asymmetric_key_destroy(struct key *key)
asymmetric_key_free_kids(kids);
 }
 
+/*
+ * Verify the trust on an asymmetric key when added to a trusted-only keyring.
+ * The keyring provides a list of keys to check against.
+ */
+static int asymmetric_key_verify_trust(const union key_payload *payload,
+  struct key *keyring)
+{
+   struct asymmetric_key_subtype *subtype = payload->data[asym_subtype];
+
+   pr_devel("==>%s()\n", __func__);
+
+   return subtype->verify_trust(payload, keyring);
+}
+
 struct key_type key_type_asymmetric = {
.name   = "asymmetric",
.preparse   = asymmetric_key_preparse,
.free_preparse  = asymmetric_key_free_preparse,
+   .verify_trust   = asymmetric_key_verify_trust,
.instantiate= generic_key_instantiate,
.match_preparse = asymmetric_key_match_preparse,
.match_free = asymmetric_key_ma

[PATCH 6/6] KEYS: Merge the type-specific data with the payload data

2015-10-21 Thread David Howells

Merge the type-specific data with the payload data into one four-word chunk
as it seems pointless to keep them separate.

Use user_key_payload() for accessing the payloads of overloaded
user-defined keys.

Signed-off-by: David Howells <dhowe...@redhat.com>
cc: linux-c...@vger.kernel.org
cc: ecryp...@vger.kernel.org
cc: linux-e...@vger.kernel.org
cc: linux-f2fs-de...@lists.sourceforge.net
cc: linux-...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: linux-ima-de...@lists.sourceforge.net
---

 Documentation/crypto/asymmetric-keys.txt |   27 +++--
 Documentation/security/keys.txt  |   41 ---
 crypto/asymmetric_keys/asymmetric_keys.h |5 --
 crypto/asymmetric_keys/asymmetric_type.c |   44 -
 crypto/asymmetric_keys/public_key.c  |4 +-
 crypto/asymmetric_keys/signature.c   |2 -
 crypto/asymmetric_keys/x509_parser.h |1 
 crypto/asymmetric_keys/x509_public_key.c |9 ++--
 fs/cifs/cifs_spnego.c|6 +--
 fs/cifs/cifsacl.c|   25 ++--
 fs/cifs/connect.c|9 ++--
 fs/cifs/sess.c   |2 -
 fs/cifs/smb2pdu.c|2 -
 fs/ecryptfs/ecryptfs_kernel.h|5 +-
 fs/ext4/crypto_key.c |4 +-
 fs/f2fs/crypto_key.c |4 +-
 fs/fscache/object-list.c |4 +-
 fs/nfs/nfs4idmap.c   |4 +-
 include/crypto/public_key.h  |1 
 include/keys/asymmetric-subtype.h|2 -
 include/keys/asymmetric-type.h   |   15 +++
 include/keys/user-type.h |8 
 include/linux/key-type.h |3 -
 include/linux/key.h  |   33 +++
 kernel/module_signing.c  |1 
 lib/digsig.c |7 ++-
 net/ceph/ceph_common.c   |2 -
 net/ceph/crypto.c|6 +--
 net/dns_resolver/dns_key.c   |   20 +
 net/dns_resolver/dns_query.c |7 +--
 net/dns_resolver/internal.h  |8 
 net/rxrpc/af_rxrpc.c |2 -
 net/rxrpc/ar-key.c   |   32 +++
 net/rxrpc/ar-output.c|2 -
 net/rxrpc/ar-security.c  |4 +-
 net/rxrpc/rxkad.c|   16 ---
 security/integrity/evm/evm_crypto.c  |2 -
 security/keys/big_key.c  |   47 +++---
 security/keys/encrypted-keys/encrypted.c |   18 
 security/keys/encrypted-keys/encrypted.h |4 +-
 security/keys/encrypted-keys/masterkey_trusted.c |4 +-
 security/keys/key.c  |   18 
 security/keys/keyctl.c   |4 +-
 security/keys/keyring.c  |   12 +++---
 security/keys/process_keys.c |4 +-
 security/keys/request_key.c  |4 +-
 security/keys/request_key_auth.c |   12 +++---
 security/keys/trusted.c  |6 +--
 security/keys/user_defined.c |   14 +++
 49 files changed, 286 insertions(+), 230 deletions(-)

diff --git a/Documentation/crypto/asymmetric-keys.txt b/Documentation/crypto/asymmetric-keys.txt
index b7675904a747..8c07e0ea6bc0 100644
--- a/Documentation/crypto/asymmetric-keys.txt
+++ b/Documentation/crypto/asymmetric-keys.txt
@@ -186,7 +186,7 @@ and looks like the following:
const struct public_key_signature *sig);
};
 
-Asymmetric keys point to this with their type_data[0] member.
+Asymmetric keys point to this with their payload[asym_subtype] member.
 
 The owner and name fields should be set to the owning module and the name of
 the subtype.  Currently, the name is only used for print statements.
@@ -269,8 +269,7 @@ mandatory:
 
struct key_preparsed_payload {
char    *description;
-   void    *type_data[2];
-   void    *payload;
+   void    *payload[4];
const void  *data;
size_t  datalen;
size_t  quotalen;
@@ -283,16 +282,18 @@ mandatory:
  not theirs.
 
  If the parser is happy with the blob, it should propose a description for
- the key and attach it to ->description, ->type_data[0] should be set to
- point to the subtype to be used, ->payload should be set to point to the
- i

Re: [PATCH 1/6] KEYS: use kvfree() in add_key

2015-10-21 Thread David Howells

These patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-next

And tagged with:

keys-next-20151021

David

[PATCH 00/10] KEYS: Change how keys are determined to be trusted

2015-10-21 Thread David Howells


Here's a set of patches that changes how keys are determined to be trusted
- currently, that's a case of whether a key has KEY_FLAG_TRUSTED set upon
it.  A keyring can then have a flag set (KEY_FLAG_TRUSTED_ONLY) that
indicates that only keys with this flag set may be added to that keyring.

Further, any time an X.509 certificate is instantiated without this flag
set, the certificate is judged against the contents of the system trusted
keyring to determine whether KEY_FLAG_TRUSTED should be set upon it.

With these patches, KEY_FLAG_TRUSTED is removed.  The kernel may add
implicitly trusted keys to a trusted-only keyring by asserting
KEY_ALLOC_TRUSTED when the key is created, but otherwise the key will only
be allowed to be added to the keyring if it can be verified by a key
already in that keyring.  The system trusted keyring is not then special in
this sense and other trusted keyrings can be set up that are wholly
independent of it.

To make this work, we have to retain sufficient data from the X.509
certificate that we can then verify the signature at need.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-trust

and are tagged with:

keys-trust-20151021

David
---
David Howells (10):
  KEYS: Generalise system_verify_data() to provide access to internal content
  PKCS#7: Make trust determination dependent on contents of trust keyring
  KEYS: Add facility to check key trustworthiness upon link creation
  KEYS: Allow authentication data to be stored in an asymmetric key
  KEYS: Add identifier pointers to public_key_signature struct
  X.509: Retain the key verification data
  X.509: Extract signature digest and make self-signed cert checks earlier
  PKCS#7: Make the signature a pointer rather than embedding it
  X.509: Move the trust validation code out to its own file
  KEYS: Move the point of trust determination to __key_link()


 Documentation/security/keys.txt   |   17 ++
 arch/x86/kernel/kexec-bzimage64.c |   18 --
 certs/system_keyring.c|   49 +++--
 crypto/asymmetric_keys/Kconfig|1 
 crypto/asymmetric_keys/Makefile   |4 
 crypto/asymmetric_keys/asymmetric_keys.h  |2 
 crypto/asymmetric_keys/asymmetric_type.c  |   22 ++
 crypto/asymmetric_keys/mscode_parser.c|   21 +-
 crypto/asymmetric_keys/pkcs7_key_type.c   |   64 +++---
 crypto/asymmetric_keys/pkcs7_parser.c |   59 +++--
 crypto/asymmetric_keys/pkcs7_parser.h |   11 -
 crypto/asymmetric_keys/pkcs7_trust.c  |   44 ++--
 crypto/asymmetric_keys/pkcs7_verify.c |  108 --
 crypto/asymmetric_keys/public_key.c   |   43 
 crypto/asymmetric_keys/public_key.h   |6 +
 crypto/asymmetric_keys/public_key_trust.c |  180 +
 crypto/asymmetric_keys/verify_pefile.c|   40 +---
 crypto/asymmetric_keys/verify_pefile.h|5 
 crypto/asymmetric_keys/x509_cert_parser.c |   53 +++--
 crypto/asymmetric_keys/x509_parser.h  |   12 -
 crypto/asymmetric_keys/x509_public_key.c  |  312 +
 include/crypto/pkcs7.h|6 -
 include/crypto/public_key.h   |   28 +--
 include/keys/asymmetric-subtype.h |6 -
 include/keys/asymmetric-type.h|8 -
 include/keys/system_keyring.h |7 -
 include/linux/key-type.h  |   10 +
 include/linux/key.h   |   12 +
 include/linux/verification.h  |   49 +
 include/linux/verify_pefile.h |   22 --
 kernel/module_signing.c   |5 
 security/integrity/digsig_asymmetric.c|5 
 security/keys/key.c   |   44 +++-
 security/keys/keyring.c   |   18 +-
 34 files changed, 735 insertions(+), 556 deletions(-)
 create mode 100644 crypto/asymmetric_keys/public_key_trust.c
 create mode 100644 include/linux/verification.h
 delete mode 100644 include/linux/verify_pefile.h


[PATCH 03/10] KEYS: Add facility to check key trustworthiness upon link creation

2015-10-21 Thread David Howells

Add a facility whereby if KEY_FLAG_TRUSTED_ONLY is set on the destination
keyring, the creation of a link to a candidate key will cause the
trustworthiness of that key to be evaluated against the already present
contents of that keyring.  This affects operations like add_key(),
KEYCTL_LINK and KEYCTL_INSTANTIATE.

To this end:

 (1) A new key type method is provided:

int (*verify_trust)(const union key_payload *payload,
struct key *keyring);

 This is implemented by key types for which verification of one key by
 another is appropriate.  It is primarily intended for use with the
 asymmetric key type.

 When called, it is given the payload or prospective payload[*] of the
 candidate key to verify and a pointer to the destination keyring.  The
 method is expected to search the keyring for an appropriate key with
 which to verify the candidate.

 [*] If called during add_key(), preparse is called before this method,
 but a key isn't actually allocated unless the verification is
 successful.

 (2) KEY_FLAG_TRUSTED is removed.  A key is now trusted by virtue of being
 contained in the trusted-only keyring being searched.

 (3) KEY_ALLOC_TRUSTED now acts as an override.  If this is passed to
 key_create_or_update() then the ->verify_trust() method will be
 ignored and the key will be added anyway.
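
The three points above can be sketched as follows.  This is a userspace
model under stated assumptions, not the code in security/keys/: every
*_model name, link_check() and toy_verify_trust() are invented for the
illustration, and the -EPERM result for a type without the method follows
the keys.txt text added by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value, for non-Linux builds of this sketch */
#endif

struct keyring_model;

/* Model of a key type; the verify_trust method is optional. */
struct key_type_model {
	int (*verify_trust)(const void *payload,
			    const struct keyring_model *ring);
};

struct keyring_model {
	bool trusted_only;	/* stands in for KEY_FLAG_TRUSTED_ONLY */
	const int *signer_ids;	/* ids of keys already in the ring */
	size_t nr_signers;
};

/* Toy payload: records which key id signed the candidate. */
struct payload_model {
	int signed_by;
};

/* Toy method: trust the candidate only if its signer is in the ring. */
static int toy_verify_trust(const void *payload,
			    const struct keyring_model *ring)
{
	const struct payload_model *p = payload;

	for (size_t i = 0; i < ring->nr_signers; i++)
		if (ring->signer_ids[i] == p->signed_by)
			return 0;
	return -ENOKEY;
}

/* Decide whether a link may be created; alloc_trusted models the
 * KEY_ALLOC_TRUSTED override. */
static int link_check(const struct key_type_model *type, const void *payload,
		      const struct keyring_model *ring, bool alloc_trusted)
{
	if (!ring->trusted_only || alloc_trusted)
		return 0;
	if (!type->verify_trust)
		return -EPERM;
	return type->verify_trust(payload, ring);
}
```

Note that the candidate carries no trust marking of its own in this model;
the outcome depends only on the destination keyring and its contents.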

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 Documentation/security/keys.txt  |   17 
 crypto/asymmetric_keys/x509_public_key.c |6 ++--
 include/linux/key-type.h |   10 ++-
 include/linux/key.h  |   12 +---
 security/keys/key.c  |   44 --
 security/keys/keyring.c  |   18 +++-
 6 files changed, 87 insertions(+), 20 deletions(-)

diff --git a/Documentation/security/keys.txt b/Documentation/security/keys.txt
index 8c183873b2b7..e7f3447ccd1b 100644
--- a/Documentation/security/keys.txt
+++ b/Documentation/security/keys.txt
@@ -1183,6 +1183,23 @@ The structure has a number of fields, some of which are mandatory:
  successfully, even if instantiate() or update() succeed.
 
 
+ (*) int (*verify_trust)(const union key_payload *payload, struct key *keyring);
+
+ If the keyring to which a candidate key is being added/linked is marked as
+ KEY_FLAG_TRUSTED_ONLY then this function will get called in the candidate
+ key type to verify the key or proposed key based on its payload.  It is
+ expected to use the contents of the supplied destination keyring to
+ determine whether the candidate key is to be trusted and added to the
+ keyring.
+
+ The method should return 0 to allow the addition and an error otherwise,
+ typically ENOKEY if there's no key in the keyring to verify this key and
+ EKEYREJECTED if the selected key fails to verify the candidate.
+
+ This method is optional.  If it is not supplied, keys of this type cannot
+ be added to trusted-only keyrings and EPERM will be returned.
+
+
  (*) int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
 
  This method is called to attach a payload to a key during construction.
diff --git a/crypto/asymmetric_keys/x509_public_key.c b/crypto/asymmetric_keys/x509_public_key.c
index 64d42981a8d7..76c211b31da7 100644
--- a/crypto/asymmetric_keys/x509_public_key.c
+++ b/crypto/asymmetric_keys/x509_public_key.c
@@ -318,10 +318,10 @@ static int x509_key_preparse(struct key_preparsed_payload *prep)
ret = x509_check_signature(cert->pub, cert); /* self-signed */
if (ret < 0)
goto error_free_cert;
-   } else if (!prep->trusted) {
+   } else {
ret = x509_validate_trust(cert, get_system_trusted_keyring());
-   if (!ret)
-   prep->trusted = 1;
+   if (ret == -EKEYREJECTED)
+   goto error_free_cert;
}
 
/* Propose a description */
diff --git a/include/linux/key-type.h b/include/linux/key-type.h
index 7463355a198b..5d7cf5e7f8c6 100644
--- a/include/linux/key-type.h
+++ b/include/linux/key-type.h
@@ -45,7 +45,6 @@ struct key_preparsed_payload {
size_t  datalen;    /* Raw datalen */
size_t  quotalen;   /* Quota length for proposed payload */
time_t  expiry;     /* Expiry time of key */
-   bool    trusted;    /* True if key is trusted */
 };
 
 typedef int (*request_key_actor_t)(struct key_construction *key,
@@ -95,6 +94,15 @@ struct key_type {
 */
void (*free_preparse)(struct key_preparsed_payload *prep);
 
+   /* Verify the trust on a key when added to a trusted-only keyring.
+*
+* If this method isn't provided then it is assumed that the concept of
+* trust is irrelevant

[PATCH 02/10] PKCS#7: Make trust determination dependent on contents of trust keyring

2015-10-21 Thread David Howells

Make the determination of the trustworthiness of a key dependent on whether
a key that can verify it is present in the ring of trusted keys rather than
whether or not the verifying key has KEY_FLAG_TRUSTED set.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 certs/system_keyring.c  |   13 -
 crypto/asymmetric_keys/pkcs7_key_type.c |2 +-
 crypto/asymmetric_keys/pkcs7_parser.h   |1 -
 crypto/asymmetric_keys/pkcs7_trust.c|   16 +++-
 crypto/asymmetric_keys/verify_pefile.c  |2 +-
 crypto/asymmetric_keys/x509_parser.h|1 -
 include/crypto/pkcs7.h  |3 +--
 include/linux/verification.h|1 -
 kernel/module_signing.c |2 +-
 9 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index cf55bd3a072a..e7f286413276 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -121,7 +121,6 @@ late_initcall(load_system_certificate_list);
 int verify_pkcs7_signature(const void *data, size_t len,
   const void *raw_pkcs7, size_t pkcs7_len,
   struct key *trusted_keys,
-  int untrusted_error,
   enum key_being_used_for usage,
   int (*view_content)(void *ctx,
   const void *data, size_t len,
@@ -129,7 +128,6 @@ int verify_pkcs7_signature(const void *data, size_t len,
   void *ctx)
 {
struct pkcs7_message *pkcs7;
-   bool trusted;
int ret;
 
pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
@@ -149,13 +147,10 @@ int verify_pkcs7_signature(const void *data, size_t len,
 
if (!trusted_keys)
trusted_keys = system_trusted_keyring;
-   ret = pkcs7_validate_trust(pkcs7, trusted_keys, &trusted);
-   if (ret < 0)
-   goto error;
-
-   if (!trusted && untrusted_error) {
-   pr_err("PKCS#7 signature not signed with a trusted key\n");
-   ret = untrusted_error;
+   ret = pkcs7_validate_trust(pkcs7, trusted_keys);
+   if (ret < 0) {
+   if (ret == -ENOKEY)
+   pr_err("PKCS#7 signature not signed with a trusted key\n");
goto error;
}
 
diff --git a/crypto/asymmetric_keys/pkcs7_key_type.c b/crypto/asymmetric_keys/pkcs7_key_type.c
index 240a5303ebb7..89b75477868d 100644
--- a/crypto/asymmetric_keys/pkcs7_key_type.c
+++ b/crypto/asymmetric_keys/pkcs7_key_type.c
@@ -71,7 +71,7 @@ static int pkcs7_preparse(struct key_preparsed_payload *prep)
 
ret = verify_pkcs7_signature(NULL, 0,
 prep->data, prep->datalen,
-NULL, -ENOKEY, usage,
+NULL, usage,
 pkcs7_view_content, prep);
 
kleave(" = %d", ret);
diff --git a/crypto/asymmetric_keys/pkcs7_parser.h b/crypto/asymmetric_keys/pkcs7_parser.h
index a66b19ebcf47..c8159983ed8f 100644
--- a/crypto/asymmetric_keys/pkcs7_parser.h
+++ b/crypto/asymmetric_keys/pkcs7_parser.h
@@ -22,7 +22,6 @@ struct pkcs7_signed_info {
struct pkcs7_signed_info *next;
struct x509_certificate *signer; /* Signing certificate (in msg->certs) */
unsigned    index;
-   bool    trusted;
bool    unsupported_crypto; /* T if not usable due to missing crypto */
 
/* Message digest - the digest of the Content Data (or NULL) */
diff --git a/crypto/asymmetric_keys/pkcs7_trust.c b/crypto/asymmetric_keys/pkcs7_trust.c
index 90d6d47965b0..388007fed3b2 100644
--- a/crypto/asymmetric_keys/pkcs7_trust.c
+++ b/crypto/asymmetric_keys/pkcs7_trust.c
@@ -30,7 +30,6 @@ static int pkcs7_validate_trust_one(struct pkcs7_message *pkcs7,
struct public_key_signature *sig = &sinfo->sig;
struct x509_certificate *x509, *last = NULL, *p;
struct key *key;
-   bool trusted;
int ret;
 
kenter(",%u,", sinfo->index);
@@ -42,10 +41,8 @@ static int pkcs7_validate_trust_one(struct pkcs7_message *pkcs7,
 
for (x509 = sinfo->signer; x509; x509 = x509->signer) {
if (x509->seen) {
-   if (x509->verified) {
-   trusted = x509->trusted;
+   if (x509->verified)
goto verified;
-   }
kleave(" = -ENOKEY [cached]");
return -ENOKEY;
}
@@ -122,7 +119,6 @@ static int pkcs7_validate_trust_one(struct pkcs7_message *pkcs7,
 
 matched:
ret = verify_signature(key, sig);
-   trusted = test_bit(KEY_FLAG_TRUSTED, &key->flags);
key_put(key);
if

[PATCH 01/10] KEYS: Generalise system_verify_data() to provide access to internal content

2015-10-21 Thread David Howells

Generalise system_verify_data() to provide access to internal content
through a callback.  This allows all the PKCS#7 stuff to be hidden inside
this function and removed from the PE file parser and the PKCS#7 test key.

If external content is not required, NULL should be passed as data to the
function.  If the callback is not required, that can be set to NULL.

The function is now called verify_pkcs7_signature() to contrast with
verify_pefile_signature() and the definitions of both have been moved into
linux/verification.h along with the key_being_used_for enum.
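
The calling convention described above might be modelled like this.  It is a
userspace sketch under stated assumptions, not the kernel function: the
callback prototype is taken from the patch, while msg_model, verify_model()
and record_len() are hypothetical, no signature checking is done, and the
-ENODATA result for a message without internal content is a choice made for
the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Callback prototype as given in the patch. */
typedef int (*view_content_t)(void *ctx, const void *data, size_t len,
			      size_t asn1hdrlen);

/* Toy message: may or may not carry its own (internal) content. */
struct msg_model {
	const void *internal;
	size_t internal_len;
};

/* Model of the calling convention only.  data is NULL when the caller
 * expects internal content; view_content may be NULL if not wanted. */
static int verify_model(const void *data, size_t len,
			const struct msg_model *msg,
			view_content_t view_content, void *ctx)
{
	/* External data can only be supplied for a detached signature. */
	if (data && msg->internal)
		return -EBADMSG;
	(void)len;	/* a real verifier would digest the data here */

	if (!view_content)
		return 0;
	if (!msg->internal)
		return -ENODATA;
	return view_content(ctx, msg->internal, msg->internal_len, 0);
}

/* Example callback: record the content length in *ctx. */
static int record_len(void *ctx, const void *data, size_t len,
		      size_t asn1hdrlen)
{
	(void)data;
	(void)asn1hdrlen;
	*(size_t *)ctx = len;
	return 0;
}
```

A caller with a detached signature passes its data and a NULL callback; a
caller that wants the embedded content passes NULL data and a callback.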

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 arch/x86/kernel/kexec-bzimage64.c   |   18 ++---
 certs/system_keyring.c  |   45 +-
 crypto/asymmetric_keys/Kconfig  |1 
 crypto/asymmetric_keys/mscode_parser.c  |   21 +++---
 crypto/asymmetric_keys/pkcs7_key_type.c |   64 +++
 crypto/asymmetric_keys/pkcs7_parser.c   |   21 +-
 crypto/asymmetric_keys/verify_pefile.c  |   40 ---
 crypto/asymmetric_keys/verify_pefile.h  |5 +-
 include/crypto/pkcs7.h  |3 +
 include/crypto/public_key.h |   14 ---
 include/keys/asymmetric-type.h  |1 
 include/keys/system_keyring.h   |7 ---
 include/linux/verification.h|   50 
 include/linux/verify_pefile.h   |   22 ---
 kernel/module_signing.c |5 +-
 15 files changed, 156 insertions(+), 161 deletions(-)
 create mode 100644 include/linux/verification.h
 delete mode 100644 include/linux/verify_pefile.h

diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 0f8a6bbaaa44..0b5da62eb203 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -19,8 +19,7 @@
 #include 
 #include 
 #include 
-#include 
-#include 
+#include 
 
 #include 
 #include 
@@ -529,18 +528,9 @@ static int bzImage64_cleanup(void *loader_data)
 #ifdef CONFIG_KEXEC_BZIMAGE_VERIFY_SIG
 static int bzImage64_verify_sig(const char *kernel, unsigned long kernel_len)
 {
-   bool trusted;
-   int ret;
-
-   ret = verify_pefile_signature(kernel, kernel_len,
- system_trusted_keyring,
- VERIFYING_KEXEC_PE_SIGNATURE,
-   &trusted);
-   if (ret < 0)
-   return ret;
-   if (!trusted)
-   return -EKEYREJECTED;
-   return 0;
+   return verify_pefile_signature(kernel, kernel_len,
+  NULL,
+  VERIFYING_KEXEC_PE_SIGNATURE);
 }
 #endif
 
diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index 2570598b784d..cf55bd3a072a 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -108,16 +108,25 @@ late_initcall(load_system_certificate_list);
 #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
 
 /**
- * Verify a PKCS#7-based signature on system data.
- * @data: The data to be verified.
+ * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
+ * @data: The data to be verified (NULL if expecting internal data).
  * @len: Size of @data.
  * @raw_pkcs7: The PKCS#7 message that is the signature.
  * @pkcs7_len: The size of @raw_pkcs7.
+ * @trusted_keys: Trusted keys to use (NULL for system_trusted_keyring).
  * @usage: The use to which the key is being put.
+ * @view_content: Callback to gain access to content.
+ * @ctx: Context for callback.
  */
-int system_verify_data(const void *data, unsigned long len,
-  const void *raw_pkcs7, size_t pkcs7_len,
-  enum key_being_used_for usage)
+int verify_pkcs7_signature(const void *data, size_t len,
+  const void *raw_pkcs7, size_t pkcs7_len,
+  struct key *trusted_keys,
+  int untrusted_error,
+  enum key_being_used_for usage,
+  int (*view_content)(void *ctx,
+  const void *data, size_t len,
+  size_t asn1hdrlen),
+  void *ctx)
 {
struct pkcs7_message *pkcs7;
bool trusted;
@@ -128,7 +137,7 @@ int system_verify_data(const void *data, unsigned long len,
return PTR_ERR(pkcs7);
 
/* The data should be detached - so we need to supply it. */
-   if (pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
+   if (data && pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
pr_err("PKCS#7 signature with non-detached data\n");
ret = -EBADMSG;
goto error;
@@ -138,13 +147,29 @@ int system_verify_data(const void *data, unsigned long len,
if (ret < 0)
goto error;
 
-

[PATCH 3/6] keys: Be more consistent in selection of union members used

2015-10-21 Thread David Howells

From: Insu Yun <wuni...@gmail.com>

key->description and key->index_key.description are the same because they
share a union.  For readability, though, it seems better to use the same
name for both the duplication and the validation.
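
The aliasing can be demonstrated with a small userspace model.  The structs
below are simplified stand-ins written for this sketch (assumed layout, C11
anonymous members), not the definitions in include/linux/key.h.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the index key. */
struct index_key_model {
	void *type;
	char *description;
};

/* The same two words are reachable under two names via the union. */
struct key_model {
	union {
		struct index_key_model index_key;
		struct {
			void *type;
			char *description;
		};
	};
};

/* Compile-time check that both spellings name the same storage. */
static_assert(offsetof(struct key_model, description) ==
	      offsetof(struct key_model, index_key.description),
	      "description aliases index_key.description");
```

Since both names refer to one pointer, checking either after the kmemdup()
is functionally equivalent; the patch only makes the test use the name that
was assigned.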

Signed-off-by: Insu Yun <wuni...@gmail.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 security/keys/key.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/keys/key.c b/security/keys/key.c
index aee2ec5a18fc..c0478465d1ac 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -278,7 +278,7 @@ struct key *key_alloc(struct key_type *type, const char *desc,
 
key->index_key.desc_len = desclen;
key->index_key.description = kmemdup(desc, desclen + 1, GFP_KERNEL);
-   if (!key->description)
+   if (!key->index_key.description)
goto no_memory_3;
 
atomic_set(&key->usage, 1);


[PATCH 4/6] KEYS: Provide a script to extract the sys cert list from a vmlinux file

2015-10-21 Thread David Howells

The supplied script takes a vmlinux file - and if necessary a System.map
file - locates the system certificates list and extracts it to the named
file.

Call as:

./scripts/extract-sys-certs vmlinux certs

if vmlinux contains symbols and:

./scripts/extract-sys-certs -s System.map vmlinux certs

if it does not.

It prints something like the following to stdout:

Have 27 sections
No symbols in vmlinux, trying System.map
Have 80088 symbols
Have 1346 bytes of certs at VMA 0x8201c540
Certificate list in section .init.data
Certificate list at file offset 0x141c540

If vmlinux contains symbols then that is used rather than System.map - even
if one is given.
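
The address arithmetic behind the last two output lines can be sketched in C
as below.  This is an illustration only: the script itself is Perl, and
section_model and cert_list_file_offset() are names invented here.  The
idea is to find the single section containing the certificate list and
translate the start address to a file offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the section table, as parsed from "objdump -h". */
struct section_model {
	const char *name;
	uint64_t vma;	/* virtual address of the section */
	uint64_t len;	/* size of the section */
	uint64_t foff;	/* file offset of the section */
};

/* Translate [start, end) to a file offset.  Returns 0 on success, or -1
 * if the range is in no section, overflows its section, or matches more
 * than one section. */
static int cert_list_file_offset(const struct section_model *secs, size_t n,
				 uint64_t start, uint64_t end, uint64_t *foff)
{
	const struct section_model *found = NULL;

	for (size_t i = 0; i < n; i++) {
		uint64_t vend = secs[i].vma + secs[i].len;

		if (start < secs[i].vma || start >= vend)
			continue;
		if (end > vend)
			return -1;
		if (found)
			return -1;
		found = &secs[i];
	}
	if (!found)
		return -1;

	*foff = start - found->vma + found->foff;
	return 0;
}
```

For instance, with a hypothetical .init.data section at VMA 0x82000000 and
file offset 0x1400000, a list starting at VMA 0x8201c540 maps to file offset
0x141c540, matching the sample output above.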

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 scripts/extract-sys-certs.pl |  144 ++
 1 file changed, 144 insertions(+)
 create mode 100755 scripts/extract-sys-certs.pl

diff --git a/scripts/extract-sys-certs.pl b/scripts/extract-sys-certs.pl
new file mode 100755
index ..d476e7d1fd88
--- /dev/null
+++ b/scripts/extract-sys-certs.pl
@@ -0,0 +1,144 @@
+#!/usr/bin/perl -w
+#
+use strict;
+use Math::BigInt;
+use Fcntl "SEEK_SET";
+
+die "Format: $0 [-s <systemmap>] <vmlinux> <keyring>\n"
+if ($#ARGV != 1 && $#ARGV != 3 ||
+   $#ARGV == 3 && $ARGV[0] ne "-s");
+
+my $sysmap = "";
+if ($#ARGV == 3) {
+shift;
+$sysmap = $ARGV[0];
+shift;
+}
+
+my $vmlinux = $ARGV[0];
+my $keyring = $ARGV[1];
+
+#
+# Parse the vmlinux section table
+#
+open FD, "objdump -h $vmlinux |" || die $vmlinux;
+my @lines = <FD>;
+close(FD) || die $vmlinux;
+
+my @sections = ();
+
+foreach my $line (@lines) {
+chomp($line);
+if ($line =~ /\s*([0-9]+)\s+(\S+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+2[*][*]([0-9]+)/
+   ) {
+   my $seg  = $1;
+   my $name = $2;
+   my $len  = Math::BigInt->new("0x" . $3);
+   my $vma  = Math::BigInt->new("0x" . $4);
+   my $lma  = Math::BigInt->new("0x" . $5);
+   my $foff = Math::BigInt->new("0x" . $6);
+   my $align = 2 ** $7;
+
+   push @sections, { name => $name,
+ vma => $vma,
+ len => $len,
+ foff => $foff };
+}
+}
+
+print "Have $#sections sections\n";
+
+#
+# Try and parse the vmlinux symbol table.  If the vmlinux file has been created
+# from a vmlinuz file with extract-vmlinux then the symbol table will be empty.
+#
+open FD, "nm $vmlinux 2>/dev/null |" || die $vmlinux;
+@lines = <FD>;
+close(FD) || die $vmlinux;
+
+my %symbols = ();
+my $nr_symbols = 0;
+
+sub parse_symbols(@) {
+foreach my $line (@_) {
+   chomp($line);
+   if ($line =~ /([0-9a-f]+)\s([a-zA-Z])\s(\S+)/
+   ) {
+   my $addr = "0x" . $1;
+   my $type = $2;
+   my $name = $3;
+
+   $symbols{$name} = $addr;
+   $nr_symbols++;
+   }
+}
+}
+parse_symbols(@lines);
+
+if ($nr_symbols == 0 && $sysmap ne "") {
+print "No symbols in vmlinux, trying $sysmap\n";
+
+open FD, "<$sysmap" || die $sysmap;
+@lines = <FD>;
+close(FD) || die $sysmap;
+parse_symbols(@lines);
+}
+
+die "No symbols available\n"
+if ($nr_symbols == 0);
+
+print "Have $nr_symbols symbols\n";
+
+die "Can't find system certificate list"
+unless (exists($symbols{"__cert_list_start"}) &&
+   exists($symbols{"__cert_list_end"}));
+
+my $start = Math::BigInt->new($symbols{"__cert_list_start"});
+my $end = Math::BigInt->new($symbols{"__cert_list_end"});
+my $size = $end - $start;
+
+printf "Have %u bytes of certs at VMA 0x%x\n", $size, $start;
+
+my $s = undef;
+foreach my $sec (@sections) {
+my $s_name = $sec->{name};
+my $s_vma = $sec->{vma};
+my $s_len = $sec->{len};
+my $s_foff = $sec->{foff};
+my $s_vend = $s_vma + $s_len;
+
+next unless ($start >= $s_vma);
+next if ($start >= $s_vend);
+
+die "Cert object partially overflows section $s_name\n"
+   if ($end > $s_vend);
+
+die "Cert object in multiple sections: ", $s_name, " and ", $s->{name}, "\n"
+   if ($s);
+$s = $sec;
+}
+
+die "Cert object not inside a section\n"
+unless ($s);
+
+print "Certificate list in section ", $s->{name}, "\n";
+
+my $foff = $start - $s->{vma} + $s->{foff};
+
+printf "Certificate list at file offset 0x%x\n", $foff;
+
+open FD, "<$vmlinux" || die $vmlinux;
+binmode(FD);
+die $vmlinux if (!defined(sysseek(FD, $foff, SEEK_SET)));
+my $buf = "";
+my $len = sysread(FD, $buf, $size);
+die "$vmlinux" if (!defined($len)

[PATCH 09/10] X.509: Move the trust validation code out to its own file

2015-10-21 Thread David Howells

Move the X.509 trust validation code out to its own file so that it can be
generalised.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 crypto/asymmetric_keys/Makefile   |2 
 crypto/asymmetric_keys/public_key_trust.c |  192 +
 crypto/asymmetric_keys/x509_parser.h  |6 +
 crypto/asymmetric_keys/x509_public_key.c  |  167 -
 4 files changed, 199 insertions(+), 168 deletions(-)
 create mode 100644 crypto/asymmetric_keys/public_key_trust.c

diff --git a/crypto/asymmetric_keys/Makefile b/crypto/asymmetric_keys/Makefile
index cd1406f9b14a..bd07987c64e7 100644
--- a/crypto/asymmetric_keys/Makefile
+++ b/crypto/asymmetric_keys/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_PUBLIC_KEY_ALGO_RSA) += rsa.o
 #
 # X.509 Certificate handling
 #
-obj-$(CONFIG_X509_CERTIFICATE_PARSER) += x509_key_parser.o
+obj-$(CONFIG_X509_CERTIFICATE_PARSER) += x509_key_parser.o public_key_trust.o
 x509_key_parser-y := \
x509-asn1.o \
x509_akid-asn1.o \
diff --git a/crypto/asymmetric_keys/public_key_trust.c b/crypto/asymmetric_keys/public_key_trust.c
new file mode 100644
index ..753a413d479b
--- /dev/null
+++ b/crypto/asymmetric_keys/public_key_trust.c
@@ -0,0 +1,192 @@
+/* Instantiate a public key crypto key from an X.509 Certificate
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "X.509: "fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "asymmetric_keys.h"
+#include "public_key.h"
+#include "x509_parser.h"
+
+static bool use_builtin_keys;
+static struct asymmetric_key_id *ca_keyid;
+
+#ifndef MODULE
+static struct {
+   struct asymmetric_key_id id;
+   unsigned char data[10];
+} cakey;
+
+static int __init ca_keys_setup(char *str)
+{
+   if (!str)   /* default system keyring */
+   return 1;
+
+   if (strncmp(str, "id:", 3) == 0) {
+   struct asymmetric_key_id *p = &cakey.id;
+   size_t hexlen = (strlen(str) - 3) / 2;
+   int ret;
+
+   if (hexlen == 0 || hexlen > sizeof(cakey.data)) {
+   pr_err("Missing or invalid ca_keys id\n");
+   return 1;
+   }
+
+   ret = __asymmetric_key_hex_to_key_id(str + 3, p, hexlen);
+   if (ret < 0)
+   pr_err("Unparsable ca_keys id hex string\n");
+   else
+   ca_keyid = p;   /* owner key 'id:xx' */
+   } else if (strcmp(str, "builtin") == 0) {
+   use_builtin_keys = true;
+   }
+
+   return 1;
+}
+__setup("ca_keys=", ca_keys_setup);
+#endif
+
+/**
+ * x509_request_asymmetric_key - Request a key by X.509 certificate params.
+ * @keyring: The keys to search.
+ * @id: The issuer & serialNumber to look for or NULL.
+ * @skid: The subjectKeyIdentifier to look for or NULL.
+ * @partial: Use partial match if true, exact if false.
+ *
+ * Find a key in the given keyring by identifier.  The preferred identifier is
+ * the issuer + serialNumber and the fallback identifier is the
+ * subjectKeyIdentifier.  If both are given, the lookup is by the former, but
+ * the latter must also match.
+ */
+struct key *x509_request_asymmetric_key(struct key *keyring,
+   const struct asymmetric_key_id *id,
+   const struct asymmetric_key_id *skid,
+   bool partial)
+{
+   struct key *key;
+   key_ref_t ref;
+   const char *lookup;
+   char *req, *p;
+   int len;
+
+   if (id) {
+   lookup = id->data;
+   len = id->len;
+   } else {
+   lookup = skid->data;
+   len = skid->len;
+   }
+
+   /* Construct an identifier "id:<keyid>". */
+   p = req = kmalloc(2 + 1 + len * 2 + 1, GFP_KERNEL);
+   if (!req)
+   return ERR_PTR(-ENOMEM);
+
+   if (partial) {
+   *p++ = 'i';
+   *p++ = 'd';
+   } else {
+   *p++ = 'e';
+   *p++ = 'x';
+   }
+   *p++ = ':';
+   p = bin2hex(p, lookup, len);
+   *p = 0;
+
+   pr_debug("Look up: \"%s\"\n", req);
+
+   ref = keyring_search(make_key_ref(keyring, 1),
+&key_type_asymmetric, req);
+   if (IS_ERR(ref))
+   pr_debug("Request for key '%s' err %ld\n", r

[PATCH 07/10] X.509: Extract signature digest and make self-signed cert checks earlier

2015-10-21 Thread David Howells

Extract the signature digest for an X.509 certificate earlier, at the end
of x509_cert_parse() rather than leaving it to the callers thereof.

Further, immediately after that, check the signature on self-signed
certificates, also rather than in the callers of x509_cert_parse().

To do this, we need to determine whether or not the X.509 cert requires
crypto that we don't support before we do the above two steps.

We note in the x509_certificate struct the following bits of information:

 (1) Whether the signature is self-signed (even if we can't check the
 signature due to missing crypto).

 (2) Whether the key held in the certificate needs unsupported crypto to be
 used.  We may get a PKCS#7 message with X.509 certs that we can't make
 use of - we just ignore them and give ENOPKG at the end if we couldn't
 verify anything and at least one of these unusable certs is in the
 chain of trust.

 (3) Whether the signature held in the certificate needs unsupported crypto
 to be checked.  We can still use the key held in this certificate,
 even if we can't check the signature on it - if it is held in the
 system trusted keyring, for instance.  We just can't add it to a ring
 of trusted keys or follow it further up the chain of trust.

Making these checks earlier allows x509_check_signature() to be removed and
replaced with direct calls to public_key_verify_signature().
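
How the three noted bits might be consulted is sketched below as a userspace
model.  The flag names follow those used in the diff; cert_flags_model and
the two helper functions are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The three bits of information noted per certificate. */
struct cert_flags_model {
	bool self_signed;	/* (1) signature is self-signed */
	bool unsupported_key;	/* (2) key needs crypto we lack */
	bool unsupported_sig;	/* (3) signature needs crypto we lack */
};

/* Can the key held in the certificate be used to check signatures? */
static bool key_is_usable(const struct cert_flags_model *c)
{
	return !c->unsupported_key;
}

/* Can the certificate's own signature be checked, so that it may be added
 * to a ring of trusted keys or followed further up the chain of trust? */
static bool sig_is_checkable(const struct cert_flags_model *c)
{
	return !c->unsupported_sig;
}
```

The two questions are independent: a certificate whose signature cannot be
checked may still hold a perfectly usable key, as point (3) describes.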

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 crypto/asymmetric_keys/pkcs7_verify.c |   38 ++--
 crypto/asymmetric_keys/x509_cert_parser.c |   10 ++
 crypto/asymmetric_keys/x509_parser.h  |7 +
 crypto/asymmetric_keys/x509_public_key.c  |  139 -
 4 files changed, 121 insertions(+), 73 deletions(-)

diff --git a/crypto/asymmetric_keys/pkcs7_verify.c b/crypto/asymmetric_keys/pkcs7_verify.c
index e225dccdf559..1dede0199673 100644
--- a/crypto/asymmetric_keys/pkcs7_verify.c
+++ b/crypto/asymmetric_keys/pkcs7_verify.c
@@ -190,9 +190,8 @@ static int pkcs7_verify_sig_chain(struct pkcs7_message *pkcs7,
 x509->subject,
 x509->raw_serial_size, x509->raw_serial);
x509->seen = true;
-   ret = x509_get_sig_params(x509);
-   if (ret < 0)
-   goto maybe_missing_crypto_in_x509;
+   if (x509->unsupported_key)
+   goto unsupported_crypto_in_x509;
 
pr_debug("- issuer %s\n", x509->issuer);
sig = x509->sig;
@@ -203,22 +202,14 @@ static int pkcs7_verify_sig_chain(struct pkcs7_message *pkcs7,
pr_debug("- authkeyid.skid %*phN\n",
 sig->auth_ids[1]->len, sig->auth_ids[1]->data);
 
-   if ((!x509->sig->auth_ids[0] && !x509->sig->auth_ids[1]) ||
-   strcmp(x509->subject, x509->issuer) == 0) {
+   if (x509->self_signed) {
/* If there's no authority certificate specified, then
 * the certificate must be self-signed and is the root
 * of the chain.  Likewise if the cert is its own
 * authority.
 */
-   pr_debug("- no auth?\n");
-   if (x509->raw_subject_size != x509->raw_issuer_size ||
-   memcmp(x509->raw_subject, x509->raw_issuer,
-  x509->raw_issuer_size) != 0)
-   return 0;
-
-   ret = x509_check_signature(x509->pub, x509);
-   if (ret < 0)
-   goto maybe_missing_crypto_in_x509;
+   if (x509->unsupported_sig)
+   goto unsupported_crypto_in_x509;
x509->signer = x509;
pr_debug("- self-signed\n");
return 0;
@@ -270,7 +261,7 @@ static int pkcs7_verify_sig_chain(struct pkcs7_message *pkcs7,
sinfo->index);
return 0;
}
-   ret = x509_check_signature(p->pub, x509);
+   ret = public_key_verify_signature(p->pub, p->sig);
if (ret < 0)
return ret;
x509->signer = p;
@@ -282,16 +273,14 @@ static int pkcs7_verify_sig_chain(struct pkcs7_message *pkcs7,
might_sleep();
}
 
-maybe_missing_crypto_in_x509:
+unsupported_crypto_in_x509:
/* Just prune the certificate chain at this point if we lack some
 * crypto module to go further.  Note, however, we don't want to set
-* sinfo->missing_crypto as the signed info block may still be
+* sinfo-&

[PATCH 08/10] PKCS#7: Make the signature a pointer rather than embedding it

2015-10-21 Thread David Howells

Point to the public_key_signature struct from the pkcs7_signed_info struct
rather than embedding it.  This makes it easier to have it take an
arbitrary number of MPIs in future.

We also save a copy of the digest in the signature without sharing the
memory with the crypto layer metadata.  This means we can use
public_key_free() to get rid of the signature record.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 crypto/asymmetric_keys/pkcs7_parser.c |   38 +++-
 crypto/asymmetric_keys/pkcs7_parser.h |   10 +++---
 crypto/asymmetric_keys/pkcs7_trust.c  |4 +--
 crypto/asymmetric_keys/pkcs7_verify.c |   52 +
 4 files changed, 56 insertions(+), 48 deletions(-)

diff --git a/crypto/asymmetric_keys/pkcs7_parser.c b/crypto/asymmetric_keys/pkcs7_parser.c
index 7b69783cff99..8454ae5b5aa8 100644
--- a/crypto/asymmetric_keys/pkcs7_parser.c
+++ b/crypto/asymmetric_keys/pkcs7_parser.c
@@ -44,9 +44,7 @@ struct pkcs7_parse_context {
 static void pkcs7_free_signed_info(struct pkcs7_signed_info *sinfo)
 {
if (sinfo) {
-   mpi_free(sinfo->sig.mpi[0]);
-   kfree(sinfo->sig.digest);
-   kfree(sinfo->signing_cert_id);
+   public_key_free(NULL, sinfo->sig);
kfree(sinfo);
}
 }
@@ -125,6 +123,10 @@ struct pkcs7_message *pkcs7_parse_message(const void *data, size_t datalen)
ctx->sinfo = kzalloc(sizeof(struct pkcs7_signed_info), GFP_KERNEL);
if (!ctx->sinfo)
goto out_no_sinfo;
+   ctx->sinfo->sig = kzalloc(sizeof(struct public_key_signature),
+ GFP_KERNEL);
+   if (!ctx->sinfo->sig)
+   goto out_no_sig;
 
ctx->data = (unsigned long)data;
ctx->ppcerts = &ctx->certs;
@@ -150,6 +152,7 @@ out:
ctx->certs = cert->next;
x509_free_certificate(cert);
}
+out_no_sig:
pkcs7_free_signed_info(ctx->sinfo);
 out_no_sinfo:
pkcs7_free_message(ctx->msg);
@@ -219,25 +222,25 @@ int pkcs7_sig_note_digest_algo(void *context, size_t hdrlen,
 
switch (ctx->last_oid) {
case OID_md4:
-   ctx->sinfo->sig.pkey_hash_algo = HASH_ALGO_MD4;
+   ctx->sinfo->sig->pkey_hash_algo = HASH_ALGO_MD4;
break;
case OID_md5:
-   ctx->sinfo->sig.pkey_hash_algo = HASH_ALGO_MD5;
+   ctx->sinfo->sig->pkey_hash_algo = HASH_ALGO_MD5;
break;
case OID_sha1:
-   ctx->sinfo->sig.pkey_hash_algo = HASH_ALGO_SHA1;
+   ctx->sinfo->sig->pkey_hash_algo = HASH_ALGO_SHA1;
break;
case OID_sha256:
-   ctx->sinfo->sig.pkey_hash_algo = HASH_ALGO_SHA256;
+   ctx->sinfo->sig->pkey_hash_algo = HASH_ALGO_SHA256;
break;
case OID_sha384:
-   ctx->sinfo->sig.pkey_hash_algo = HASH_ALGO_SHA384;
+   ctx->sinfo->sig->pkey_hash_algo = HASH_ALGO_SHA384;
break;
case OID_sha512:
-   ctx->sinfo->sig.pkey_hash_algo = HASH_ALGO_SHA512;
+   ctx->sinfo->sig->pkey_hash_algo = HASH_ALGO_SHA512;
break;
case OID_sha224:
-   ctx->sinfo->sig.pkey_hash_algo = HASH_ALGO_SHA224;
+   ctx->sinfo->sig->pkey_hash_algo = HASH_ALGO_SHA224;
default:
printk("Unsupported digest algo: %u\n", ctx->last_oid);
return -ENOPKG;
@@ -256,7 +259,7 @@ int pkcs7_sig_note_pkey_algo(void *context, size_t hdrlen,
 
switch (ctx->last_oid) {
case OID_rsaEncryption:
-   ctx->sinfo->sig.pkey_algo = PKEY_ALGO_RSA;
+   ctx->sinfo->sig->pkey_algo = PKEY_ALGO_RSA;
break;
default:
printk("Unsupported pkey algo: %u\n", ctx->last_oid);
@@ -617,16 +620,17 @@ int pkcs7_sig_note_signature(void *context, size_t hdrlen,
 const void *value, size_t vlen)
 {
struct pkcs7_parse_context *ctx = context;
+   struct public_key_signature *sig = ctx->sinfo->sig;
MPI mpi;
 
-   BUG_ON(ctx->sinfo->sig.pkey_algo != PKEY_ALGO_RSA);
+   BUG_ON(sig->pkey_algo != PKEY_ALGO_RSA);
 
mpi = mpi_read_raw_data(value, vlen);
if (!mpi)
return -ENOMEM;
 
-   ctx->sinfo->sig.mpi[0] = mpi;
-   ctx->sinfo->sig.nr_mpi = 1;
+   sig->mpi[0] = mpi;
+   sig->nr_mpi = 1;
return 0;
 }
 
@@ -662,12 +666,16 @@ int pkcs7_note_signed_info(void *context, size_t hdrlen,
 
pr_devel("SINFO KID: %u [%*phN]\n", kid->len, kid->len, kid->data);
 
-   sinfo->signing_cert_id = kid;
+   si

Re: [PATCH v4 2/3] Create IMA machine owner keys (MOK) and blacklist keyrings;

2015-10-21 Thread David Howells

Petko Manolov  wrote:

> > > As far as i know there is no concept of write-once to a keyring in the
> > > kernel.  David will correct me if i am wrong.  I wonder how hard would
> > > it be to add such functionality, in case it is missing?
> > 
> > Not hard, particularly if it's only an attribute that the kernel can set.
> 
> Definitely kernel-only.  The other way does not appeal to me in terms of 
> security.

Nor me in terms of letting userspace lock keys into the kernel arbitrarily.

David

Re: [PATCH v4 2/3] Create IMA machine owner keys (MOK) and blacklist keyrings;

2015-10-21 Thread David Howells

Mimi Zohar  wrote:

> > I need to think about this.  Should -EKEYREVOKED be the same as -ENOKEY in
> > this case?  I guess the end result is pretty much the same from IMA view
> > point, but there may be a requirement to list all revoked keys...
> 
> When checking the blacklist, getting -EKEYREVOKED is definitely
> different than -ENOKEY.

Actually, I misspoke earlier.  Revoked keys are only skipped by the search if
a live key is found.  Should all the keys in the blacklist just be revoked so
that the search of the list returns either -ENOKEY (no key there) or
-EKEYREVOKED (the key is blacklisted)?  That might be getting too
over-complicated though.
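
The two outcomes discussed above can be modelled in user space.  The following is only a sketch, not kernel code: struct toy_key, toy_keyring_search() and toy_is_blacklisted() are hypothetical names, and a keyring is reduced to a flat array.  It shows a search that lets a live key win, reports -EKEYREVOKED if only revoked matches exist and -ENOKEY if nothing matches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a key held in a keyring. */
struct toy_key {
	const char *desc;	/* key description */
	bool revoked;		/* true if the key has been revoked */
};

/*
 * Search a toy keyring for a description.
 * Returns 0 if a live key matches, -EKEYREVOKED if only revoked keys
 * match, and -ENOKEY if nothing matches at all.
 */
static int toy_keyring_search(const struct toy_key *keys, size_t n,
			      const char *desc)
{
	int ret = -ENOKEY;
	size_t i;

	assert(desc != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(keys[i].desc, desc) != 0)
			continue;
		if (!keys[i].revoked)
			return 0;	/* a live key wins over revoked ones */
		ret = -EKEYREVOKED;	/* remember it, but keep looking */
	}
	return ret;
}

/* If every blacklist entry is revoked, -EKEYREVOKED means "blacklisted". */
static bool toy_is_blacklisted(const struct toy_key *blacklist, size_t n,
			       const char *desc)
{
	return toy_keyring_search(blacklist, n, desc) == -EKEYREVOKED;
}
```

With a blacklist made up entirely of revoked entries, a caller would see only those two error codes, which is the distinction the paragraph above is asking about.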

David

Re: GPF in keyring_destroy

2015-10-19 Thread David Howells

Dmitry Vyukov  wrote:

> > Does the attached patch fix it for you?
> 
> Yes, it fixes the crash for me.

Can I put you down as a Tested-by?

David

[PATCH 2/2] KEYS: Don't permit request_key() to construct a new keyring

2015-10-19 Thread David Howells

If request_key() is used to find a keyring, only do the search part - don't
do the construction part if the keyring was not found by the search.  We
don't really want keyrings in the negative instantiated state since the
rejected/negative instantiation error value in the payload is unioned with
keyring metadata.

Now the kernel gives an error:

request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted)

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 security/keys/request_key.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index 486ef6fa393b..0d6253124278 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -440,6 +440,9 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
 
kenter("");
 
+   if (ctx->index_key.type == &key_type_keyring)
+   return ERR_PTR(-EPERM);
+   
user = key_user_lookup(current_fsuid());
if (!user)
return ERR_PTR(-ENOMEM);
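
For illustration only, the resulting control flow can be modelled in user space.  This is a sketch under stated assumptions, not the kernel implementation: toy_request_key() and its boolean parameters are hypothetical, and the upcall is reduced to a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical user-space model of the request_key() flow after this patch.
 * found_by_search says whether the search step located a matching key;
 * upcall_succeeds stands in for the outcome of the construction upcall.
 * Returns 0 if a key is available, or a negative errno.
 */
static int toy_request_key(const char *type, bool found_by_search,
			   bool upcall_succeeds)
{
	assert(type != NULL);

	if (found_by_search)
		return 0;	/* search hit: no construction needed */

	/* Construction step: refuse outright for keyrings. */
	if (strcmp(type, "keyring") == 0)
		return -EPERM;

	/* Other key types may still be constructed via an upcall. */
	return upcall_succeeds ? 0 : -ENOKEY;
}
```

In this model a keyring can only ever be found, never constructed, so it cannot end up in the negative instantiated state.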


[PATCH 1/2] KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring

2015-10-19 Thread David Howells

The following sequence of commands:

i=`keyctl add user a a @s`
keyctl request2 keyring foo bar @t
keyctl unlink $i @s

tries to invoke an upcall to instantiate a keyring if one doesn't already
exist by that name within the user's keyring set.  However, if the upcall
fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
other error code.  When the key is garbage collected, the key destroy
function is called unconditionally and keyring_destroy() uses list_empty()
on keyring->type_data.link - which is in a union with reject_error.
Subsequently, the kernel tries to unlink the keyring from the keyring names
list - which oopses like this:

BUG: unable to handle kernel paging request at ff8a
IP: [] keyring_destroy+0x3d/0x88
...
Workqueue: events key_garbage_collector
...
RIP: 0010:[] keyring_destroy+0x3d/0x88
RSP: 0018:88003e2f3d30  EFLAGS: 00010203
RAX: ff82 RBX: 88003bf1a900 RCX: 
RDX:  RSI: 3bfc6901 RDI: 81a73a40
RBP: 88003e2f3d38 R08: 0152 R09: 
R10: 88003e2f3c18 R11: 865b R12: 88003bf1a900
R13:  R14: 88003bf1a908 R15: 88003e2f4000
...
CR2: ff8a CR3: 3e3ec000 CR4: 06f0
...
Call Trace:
 [] key_gc_unused_keys.constprop.1+0x5d/0x10f
 [] key_garbage_collector+0x1fa/0x351
 [] process_one_work+0x28e/0x547
 [] worker_thread+0x26e/0x361
 [] ? rescuer_thread+0x2a8/0x2a8
 [] kthread+0xf3/0xfb
 [] ? kthread_create_on_node+0x1c2/0x1c2
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x1c2/0x1c2

Note the value in RAX.  This is a 32-bit representation of -ENOKEY.

The solution is to only call ->destroy() if the key was successfully
instantiated.

Reported-by: Dmitry Vyukov <dvyu...@google.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
Tested-by: Dmitry Vyukov <dvyu...@google.com>
---

 security/keys/gc.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/keys/gc.c b/security/keys/gc.c
index 39eac1fd5706..addf060399e0 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -134,8 +134,10 @@ static noinline void key_gc_unused_keys(struct list_head *keys)
kdebug("- %u", key->serial);
key_check(key);
 
-   /* Throw away the key data */
-   if (key->type->destroy)
+   /* Throw away the key data if the key is instantiated */
+   if (test_bit(KEY_FLAG_INSTANTIATED, &key->flags) &&
+   !test_bit(KEY_FLAG_NEGATIVE, &key->flags) &&
+   key->type->destroy)
key->type->destroy(key);
 
security_key_free(key);
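
The aliasing problem and the guard added above can be shown with a small user-space model.  This is a sketch only: every toy_* name is hypothetical, the union mirrors the link/reject_error overlap described in the commit message, and the 0xffffff82 value assumes ENOKEY is 126, as it is on x86 Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal doubly-linked list node, standing in for struct list_head. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Hypothetical model of a keyring: link and reject_error share storage. */
struct toy_keyring {
	bool instantiated;
	bool negative;
	union {
		struct toy_list_head link;	/* valid only if positively instantiated */
		int reject_error;		/* valid only if negatively instantiated */
	} type_data;
	int destroy_calls;			/* counts calls, for demonstration */
};

/* The real destroy op would inspect type_data.link at this point. */
static void toy_keyring_destroy(struct toy_keyring *k)
{
	k->destroy_calls++;
}

/* Guarded garbage collection: skip destroy unless link is meaningful. */
static void toy_gc_key(struct toy_keyring *k)
{
	assert(k != NULL);
	if (k->instantiated && !k->negative)
		toy_keyring_destroy(k);
}

/* View a negative errno as the 32-bit pattern seen in a register dump. */
static uint32_t toy_errno_bits(int err)
{
	return (uint32_t)err;
}
```

A negatively instantiated keyring holding -ENOKEY in reject_error is skipped by toy_gc_key(), so the bytes that would otherwise be misread as list pointers are never touched.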


Re: [PATCH v2 0/4] Basic trusted keys support for TPM 2.0

2015-10-16 Thread David Howells

Hi Jarkko,

For some reason I don't see patch 1.

David

Re: GPF in keyring_destroy

2015-10-15 Thread David Howells

Dmitry Vyukov  wrote:

> RAX: ff82

This is the value that matters.  It would appear to be -ENOKEY and would be in
key->type_data.reject_error, I think.

David

Re: [PATCH] keys: correctly check failed allocation for kmemdup

2015-10-15 Thread David Howells

Insu Yun  wrote:

> Thanks David. Then it is not a bug.
> It's a pure question. 
> Why use different name for allocation and check?
> For me, it is quite confusing. 

Either I didn't notice at the time, or the shorter variant is the original.

If you want to give me a patch making it consistent, feel free.

David

Re: GPF in keyring_destroy

2015-10-15 Thread David Howells

Does the attached patch fix it for you?

David
---
commit a7609e0bb3973d6ee3c9f1ecd0b6a382d99d6248
Author: David Howells <dhowe...@redhat.com>
Date:   Thu Oct 15 17:21:37 2015 +0100

KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring

The following sequence of commands:

i=`keyctl add user a a @s`
keyctl request2 keyring foo bar @t
keyctl unlink $i @s

tries to invoke an upcall to instantiate a keyring if one doesn't already
exist by that name within the user's keyring set.  However, if the upcall
fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
other error code.  When the key is garbage collected, the key destroy
function is called unconditionally and keyring_destroy() uses list_empty()
on keyring->type_data.link - which is in a union with reject_error.
Subsequently, the kernel tries to unlink the keyring from the keyring names
list - which oopses like this:

BUG: unable to handle kernel paging request at ff8a
IP: [] keyring_destroy+0x3d/0x88
...
Workqueue: events key_garbage_collector
...
RIP: 0010:[] keyring_destroy+0x3d/0x88
RSP: 0018:88003e2f3d30  EFLAGS: 00010203
RAX: ff82 RBX: 88003bf1a900 RCX: 
RDX:  RSI: 3bfc6901 RDI: 81a73a40
RBP: 88003e2f3d38 R08: 0152 R09: 
R10: 88003e2f3c18 R11: 865b R12: 88003bf1a900
R13:  R14: 88003bf1a908 R15: 88003e2f4000
...
CR2: ff8a CR3: 3e3ec000 CR4: 06f0
...
Call Trace:
 [] key_gc_unused_keys.constprop.1+0x5d/0x10f
 [] key_garbage_collector+0x1fa/0x351
 [] process_one_work+0x28e/0x547
 [] worker_thread+0x26e/0x361
 [] ? rescuer_thread+0x2a8/0x2a8
 [] kthread+0xf3/0xfb
 [] ? kthread_create_on_node+0x1c2/0x1c2
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x1c2/0x1c2

Note the value in RAX.  This is a 32-bit representation of -ENOKEY.

The solution is to only call ->destroy() if the key was successfully
instantiated.

Reported-by: Dmitry Vyukov <dvyu...@google.com>
    Signed-off-by: David Howells <dhowe...@redhat.com>

diff --git a/security/keys/gc.c b/security/keys/gc.c
index 39eac1fd5706..addf060399e0 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -134,8 +134,10 @@ static noinline void key_gc_unused_keys(struct list_head *keys)
kdebug("- %u", key->serial);
key_check(key);
 
-   /* Throw away the key data */
-   if (key->type->destroy)
+   /* Throw away the key data if the key is instantiated */
+   if (test_bit(KEY_FLAG_INSTANTIATED, &key->flags) &&
+   !test_bit(KEY_FLAG_NEGATIVE, &key->flags) &&
+   key->type->destroy)
key->type->destroy(key);
 
security_key_free(key);

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-26 Thread David Howells

Daniel Phillips wrote:

> I need to respond to this in pieces... first the bit that is bugging
> me:
>
> * two new page flags
> >
> > I need to keep track of two bits of per-cached-page information:
> >
> >  (1) This page is known by the cache, and that the cache must be informed if
> >      the page is going to go away.
>
> I still do not understand the life cycle of this bit.  What does the
> cache do when it learns the page has gone away?

That's up to the cache.  CacheFS, for example, unpins some resources when all
the pages managed by a pointer block are taken away from it.  The cache may
also reserve a block on disk to back this page, and that reservation may then
be discarded by the netfs uncaching the page.

The cache may also speculatively take copies of the page if the machine is
idle.

Documentation/filesystems/caching/netfs-api.txt describes the caching API as a
process, including the presentation of netfs pages to the cache and their
uncaching.

> How is it informed?

[Documentation/filesystems/caching/netfs-api.txt]
==============
PAGE UNCACHING
==============

To uncache a page, this function should be called:

void fscache_uncache_page(struct fscache_cookie *cookie,
  struct page *page);

This function permits the cache to release any in-memory representation it
might be holding for this netfs page.  This function must be called once for
each page on which the read or write page functions above have been called to
make sure the cache's in-memory tracking information gets torn down.

Note that pages can't be explicitly deleted from the data file.  The whole
data file must be retired (see the relinquish cookie function below).

Furthermore, note that this does not cancel the asynchronous read or write
operation started by the read/alloc and write functions.
[/]

> Who owns the page cache in which such a page lives, the nfs client?
> Filesystem that hosts the page?  A third page cache owned by the
> cache itself?  (See my basic confusion about how many page cache
> levels you have, below.)

[Documentation/filesystems/caching/fscache.txt]
 (7) Data I/O is done direct to and from the netfs's pages.  The netfs
 indicates that page A is at index B of the data-file represented by cookie
 C, and that it should be read or written.  The cache backend may or may
 not start I/O on that page, but if it does, a netfs callback will be
 invoked to indicate completion.  The I/O may be either synchronous or
 asynchronous.
[/]

I should perhaps make the documentation more explicit: the pages passed to the
routines defined in include/linux/fscache.h are netfs pages, normally belonging
to the pagecache of the appropriate netfs inode.  This is, however, mentioned in
the function banner comments in fscache.h.

> Suppose one were to take a mundane approach to the persistent cache
> problem instead of layering filesystems.  What you would do then is
> change NFS's ->write_page and variants to fiddle the persistent
> cache

It is a requirement laid down by the Linux NFS fs maintainers that the writes
to the cache be asynchronous, even if the writes to NFS aren't.

Note further that NFS's write_page() != writing to the cache.  Writing to the
cache is typically done by NFS's readpages().

Besides, at the moment, caching is suppressed for any NFS file opened for
writing due to coherency issues.  This is something to be revisited later.

> as well as the network, instead of just the network as now.

Not as now.  See above.

> This fiddling could even consist of ->write calls to another
> filesystem, though working directly with the bio interface would
> yield the fastest, and therefore to my mind, best result.

You can't necessarily access the BIO interface, and even if you can, the cache
is still a filesystem.

Essentially, what cachefiles does is to do what you say: to perform ->write
calls on another filesystem.

FS-Cache also protects the netfs against (a) there being no cache, (b) the
cache suffering a fatal I/O error and (c) the cache being removed; and protects
the cache against (d) the netfs uncaching pages that the cache is using and (e)
conflicting operations from the netfs, some of which may be queued for
asynchronous processing.

FS-Cache also groups asynchronous netfs store requests together, which
hopefully, one day, I'll be able to pass on to the backing fs.

> In any case, you find out how to write the page to backing store by
> asking the filesystem, which in the naive approach would be nfs
> augmented with caching library calls.

NFS and AFS and CIFS and ISOFS, but yes, that's what fscache is, if you like, a
caching library.

> The filesystem keeps its own metadata around to know how to map the page to
> disk.  So again naively, this metadata could tell the nfs client that the
> page is not mapped to disk at all.

The netfs should _not_ know about the metadata of a backing fs.  Firstly, there
are many different potential backing filesystems, and secondly if

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-25 Thread David Howells


Daniel Phillips wrote:

> This factor of four (even worse on XFS, not quite as bad on Ext3) is
> worth ruminating upon.  Is all of the difference explained by avoiding
> seeks on the server, which has the files in memory?

Here are some more stats for you to consider:

 (1) Copy the data across the network to a fresh Ext3 fs on the same partition
 I was using for the cache:

[EMAIL PROTECTED] ~]# time cp -a /warthog/aaa /var/fscache
real    0m39.052s
user    0m0.368s
sys     0m15.229s

 (2) Reboot and read back the files just written into Ext3 on the local disk:

[EMAIL PROTECTED] ~]# time tar cf - /var/fscache/aaa >/dev/zero
real    0m40.574s
user    0m0.164s
sys     0m3.512s

 (3) Run through the cache population process, and then run a tar directly on
 cachefiles's cache directly after a reboot:

[EMAIL PROTECTED] ~]# time tar cf - /var/fscache/cache >/dev/zero
real    4m53.104s
user    0m0.192s
sys     0m4.240s

So I guess there's a problem in cachefiles's efficiency - possibly due to the
fact that it tries to be fully asynchronous.

In case (1) this is very similar to the time for a read through a completely
cold cache (37.497s).

In case (2) this is comparable to cachefiles with a cache warmed prior to a
reboot (1m54.350s); in this case, however, cachefiles is doing some extra work:

 (a) It's doing a lookup on the server for each file, in addition to the
 lookups on the disk.  However, just doing a tar from plain NFS, the
 command completes in 22.330s.

 (b) It's reading an xattr per object for cache coherency management.

 (c) As the cache knows nothing of directories, files, etc., it lays its
 directory subtree out in a way that suits it.  File lookup keys are
 turned into filenames.  This may result in a less efficient arrangement
 in the cache than the original data, especially as directories may become
 very large, so Ext3 may be doing some extra work.

In case (3), this perhaps suggests that cachefiles's directory layout may be
part of the problem.  Running the following:

ls -ldSr `find . -type d`

in /var/fscache/cache shows that the directories are either 4096 bytes in size
(158 instances) or 12288 bytes in size (105 instances), for a total of 263
directories.  There are 19255 files.

Running that ls command in /warthog/aaa shows 1185 directories, all but three
of them 4096 bytes in size; two are 12288 bytes and one is 20480 bytes in size
(include/linux/ unsurprisingly).  There are 19258 files, three of which are
hardlinks to other files in the tree.

> This could be easily tested by running a test against a server that is the
> same as the client, and does not have the files in memory.  If local access
> is still slower than network then there is a real issue with cache
> efficiency.

My server is also my desktop machine.  The only way to guarantee that the
memory is scrubbed is to reboot it:-(  I'll look at setting up one of my other
machines as an NFS server.

David

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-22 Thread David Howells

Daniel Phillips wrote:

> > The way the client works is like this:
>
> Thanks for the excellent ascii art, that cleared up the confusion right
> away.

You know what they say about pictures... :-)

> > What are you trying to do exactly?  Are you actually playing with it, or
> > just looking at the numbers I've produced?
>
> Trying to see if you are offering enough of a win to justify testing it,
> and if that works out, then going shopping for a bin of rotten vegetables
> to throw at your design, which I hope you will perceive as useful.

One thing that you have to remember: my test setup is pretty much the
worst-case for being appropriate for showing the need for caching to improve
performance.  There's a single client and a single server, they've got GigE
networking between them that has very little other load, and the server has
sufficient memory to hold the entire test data set.

> From the numbers you have posted I think you are missing some basic
> efficiencies that could take this design from the sorta-ok zone to wow!

Not really, it's just that this lashup could be considered designed to show
local caching in the worst light.

> But looking up the object in the cache should be nearly free - much less
> than a microsecond per block.

The problem is that you have to do a database lookup of some sort, possibly
involving several synchronous disk operations.

CacheFiles does a disk lookup by taking the key given to it by NFS, turning it
into a set of file or directory names, and doing a short pathwalk to the target
cache file.  Throwing in extra indices won't necessarily help.  What matters is
how quick the backing filesystem is at doing lookups.  As it turns out, Ext3 is
a fair bit better than BTRFS when the disk cache is cold.
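
As a rough illustration of turning a key into a short path, consider the sketch below.  It is not the encoding CacheFiles actually uses: toy_hash(), toy_key_to_path(), the "@xx/" bucket directory and the hex file name are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical 8-bit hash used to pick one of 256 bucket directories. */
static unsigned int toy_hash(const unsigned char *key, size_t len)
{
	unsigned int h = 0;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 31 + key[i];
	return h & 0xff;
}

/*
 * Build "@<bucket>/<key as hex>" in buf.
 * Returns 0 on success or -1 if buf is too small.
 */
static int toy_key_to_path(const unsigned char *key, size_t len,
			   char *buf, size_t buflen)
{
	size_t off, i;
	int n;

	assert(buf != NULL);
	n = snprintf(buf, buflen, "@%02x/", toy_hash(key, len));
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	off = (size_t)n;

	for (i = 0; i < len; i++) {
		if (off + 3 > buflen)	/* two hex digits plus the NUL */
			return -1;
		snprintf(buf + off, 3, "%02x", key[i]);
		off += 2;
	}
	return 0;
}
```

Each path component produced this way costs one lookup on the backing filesystem, which is why the speed of that filesystem's lookups dominates.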

> > The metadata problem is quite a tricky one since it increases with the
> > number of files you're dealing with.  As things stand in my patches, when
> > NFS, for example, wants to access a new inode, it first has to go to the
> > server to lookup the NFS file handle, and only then can it go to the cache
> > to find out if there's a matching object in the case.
>
> So without the persistent cache it can omit the LOOKUP and just send the
> filehandle as part of the READ?

What 'it'?  Note that to get the filehandle, you have to do a LOOKUP op.  With
the cache, we could actually cache the results of lookups that we've done,
however, we don't know that the results are still valid without going to the
server:-/

AFS has a way around that - it versions its vnode (inode) IDs.

> > The reason my client going to my server is so quick is that the server has
> > the dcache and the pagecache preloaded, so that across-network lookup
> > operations are really, really quick, as compared to the synchronous
> > slogging of the local disk to find the cache object.
>
> Doesn't that just mean you have to preload the lookup table for the
> persistent cache so you can determine whether you are caching the data
> for a filehandle without going to disk?

Where lookup table == dcache.  That would be good yes.  cachefilesd
prescans all the files in the cache, which ought to do just that, but it
doesn't seem to be very effective.  I'm not sure why.

> > I can probably improve this a little by pre-loading the subindex
> > directories (hash tables) that I use to reduce the directory size in the
> > cache, but I don't know by how much.
>
> Ah I should have read ahead.  I think the correct answer is a lot.

Quite possibly.  It'll allow me to dispense with at least one fs lookup call
per cache object request call.

> Your big can-t-get-there-from-here is the round trip to the server to
> determine whether you should read from the local cache.  Got any ideas?

I'm not sure what you mean.  Your statement should probably read "... to
determine _what_ you should read from the local cache."

> And where is the Trond-meister in all of this?

Keeping quiet as far as I can tell.

David

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-22 Thread David Howells

Chris Mason wrote:

> > The interesting case is where the disk cache is warm, but the pagecache is
> > cold (ie: just after a reboot after filling the caches).  Here, for the two
> > big files case, BTRFS appears quite a bit better than Ext3, showing a 21%
> > reduction in time for the smaller case and a 13% reduction for the larger
> > case.
>
> I'm afraid I don't have a good handle on the filesystem operations that
> result from this workload.  Are we reading from the FS to fill the NFS page
> cache?

I'm not sure what you're asking.

When the cache is cold, we determine that we can't read from the cache very
quickly.  We then read data from the server and, in the background, create the
metadata in the cache and store the data to it (by copying netfs pages to
backingfs pages).

When the cache is warm, we read the data from the cache, copying the data from
the backingfs pages to the netfs pages.  We use bmap() to ascertain that there
is data to be read, otherwise we detect a hole and fall back to reading from
the server.
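
The warm-cache decision can be sketched in user space as follows.  This is an illustrative model only: the toy_* names are hypothetical, and a per-block boolean map stands in for what bmap() reports about the backing file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_source { TOY_FROM_CACHE, TOY_FROM_SERVER };

/* Hypothetical cache file: one flag per block, true if data is present. */
struct toy_cache_file {
	const bool *block_mapped;
	size_t nr_blocks;
};

/* Stand-in for bmap(): is there a block backing this page index? */
static bool toy_bmap(const struct toy_cache_file *f, size_t index)
{
	assert(f != NULL);
	return index < f->nr_blocks && f->block_mapped[index];
}

/* Read from the cache if the block is there; a hole means ask the server. */
static enum toy_source toy_read_page(const struct toy_cache_file *f,
				     size_t index)
{
	if (f != NULL && toy_bmap(f, index))
		return TOY_FROM_CACHE;
	return TOY_FROM_SERVER;
}
```

Passing a NULL file models the "no cache object" case, which falls back to the server in the same way as a hole.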

Looking up a cache object involves a sequence of lookup() ops and getxattr() ops
on the backingfs.  Should an object not exist, we defer creation of that
object to a background thread and do lookups(), mkdirs() and setxattrs() and a
create() to manufacture the object.

We read data from an object by calling readpages() on the backingfs to bring
the data into the pagecache.  We monitor the PG_lock bits to find out when
each page is read or has completed with an error.

Writing pages to the cache is done completely in the background.
PG_fscache_write is set on a page when it is handed to fscache for storage,
then at some point a background thread wakes up and calls write_one_page() in
the backingfs to write that page to the cache file.  At the moment, this
copies the data into a backingfs page which is then marked PG_dirty, and the
VM writes it out in the usual way.

> > More surprising is that BTRFS performed significantly worse (15% increase
> > in time) in the case where the cache on disk was fully populated and then
> > the machine had been rebooted to clear the pagecaches.
>
> Which FS operations are included here?  Finding all the files or just an
> unmount?  Btrfs defrags metadata in the background, and unmount has to wait
> for that defrag to finish.

BTRFS might not be doing any writing at all here - apart from local atimes
(used by cache culling), that is.

What it does have to do is lots of lookups, reads and getxattrs, all of which
are synchronous.

David

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-22 Thread David Howells

David Howells wrote:

> > > Have you got before/after benchmark results?
> >
> > See attached.
>
> Attached here are results using BTRFS (patched so that it'll work at all)
> rather than Ext3 on the client on the partition backing the cache.

And here are XFS results.

Tuning XFS makes a *really* big difference for the lots of small/medium files
being tarred case.  However, in general BTRFS is much better.

David
---


=
FEW BIG FILES TEST ON XFS
=

Completely cold caches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile /dev/null
real0m2.286s
user0m0.000s
sys 0m1.828s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile /dev/null
real0m4.228s
user0m0.000s
sys 0m1.360s

Warm NFS pagecache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile /dev/null
real0m0.058s
user0m0.000s
sys 0m0.060s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile /dev/null
real0m0.122s
user0m0.000s
sys 0m0.120s

Warm XFS pagecache, cold NFS pagecache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile /dev/null
real0m0.181s
user0m0.000s
sys 0m0.180s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile /dev/null
real0m1.034s
user0m0.000s
sys 0m0.404s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile /dev/null
real0m1.540s
user0m0.000s
sys 0m0.256s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile /dev/null
real0m3.003s
user0m0.000s
sys 0m0.532s


==========================================
MANY SMALL/MEDIUM FILE READING TEST ON XFS
==========================================

Completely cold caches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    4m56.827s
user    0m0.180s
sys     0m6.668s

Warm NFS pagecache:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    0m15.084s
user    0m0.212s
sys     0m5.008s

Warm XFS pagecache, cold NFS pagecache:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    0m13.547s
user    0m0.220s
sys     0m5.652s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    4m36.316s
user    0m0.148s
sys     0m4.440s


=======================================================
MANY SMALL/MEDIUM FILE READING TEST ON AN OPTIMISED XFS
=======================================================

mkfs.xfs -d agcount=4 -l size=128m,version=2 /dev/sda6


Completely cold caches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    3m44.033s
user    0m0.248s
sys     0m6.632s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    3m8.582s
user    0m0.108s
sys     0m3.420s

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-22 Thread David Howells

Chris Mason [EMAIL PROTECTED] wrote:

> Thanks for trying this, of course I'll ask you to try again with the latest
> v0.13 code, it has a number of optimizations especially for CPU usage.

Here you go.  The numbers are very similar.

David

=================================
FEW BIG FILES TEST ON BTRFS v0.13
=================================

Completely cold caches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m2.202s
user    0m0.000s
sys     0m1.716s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m4.212s
user    0m0.000s
sys     0m0.896s

Warm BTRFS pagecache, cold NFS pagecache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m0.197s
user    0m0.000s
sys     0m0.192s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m0.376s
user    0m0.000s
sys     0m0.372s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m1.543s
user    0m0.004s
sys     0m1.448s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m3.111s
user    0m0.000s
sys     0m2.856s


==================================================
MANY SMALL/MEDIUM FILE READING TEST ON BTRFS v0.13
==================================================

Completely cold caches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    0m31.575s
user    0m0.176s
sys     0m6.316s

Warm BTRFS pagecache, cold NFS pagecache:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    0m16.081s
user    0m0.164s
sys     0m5.528s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    2m15.245s
user    0m0.064s
sys     0m2.808s


Re: [PATCH 00/37] Permit filesystem local caching

2008-02-22 Thread David Howells

Daniel Phillips [EMAIL PROTECTED] wrote:

> I am eventually going to suggest cutting the backing filesystem entirely out
> of the picture,

You still need a database to manage the cache.  A filesystem such as Ext3
makes a very handy database for four reasons:

 (1) It exists and works.

 (2) It has a well defined interface within the kernel.

 (3) I can place my cache on, say, my root partition on my laptop.  I don't
 have to dedicate a partition to the cache.

 (4) Userspace cache management tools (such as cachefilesd) have an already
 existing interface to use: rmdir, unlink, open, getdents, etc..

I do have a cache-on-blockdev thing, but it's basically a wandering tree
filesystem inside.  It is, or was, much faster than ext3 on a clean cache, but
it degrades horribly over time because my free space reclamation sucks - it
gradually randomises the block allocation sequence over time.

So, what would you suggest instead of a backing filesystem?

> I really do not like idea of force fitting this cache into a generic
> vfs model.  Sun was collectively smoking some serious crack when they
> cooked that one up.  But there is also the ageless principle "isness is
> more important than niceness".

What do you mean?  I'm not doing it like Sun.  The cache is a side path from
the netfs.  It should be transparent to the user, the VFS and the server.

The only place it might not be transparent is that you might have to
instruct the netfs mount to use the cache.  I'd prefer to do it some other way
than passing parameters to mount, though, as (1) this causes fun with NIS
distributed automounter maps, and (2) people are asking for a finer grain of
control than per-mountpoint.  Unfortunately, I can't seem to find a way to do
it that's acceptable to Al.

> Which would require a change to NFS, not an option because you hope to
> work with standard servers?  Of course with years to think about this,
> the required protocol changes were put into v4.  Not.

I don't think there's much I can do about NFS.  It requires the filesystem
from which the NFS server is dealing to have inode uniquifiers, which are then
incorporated into the file handle.  I don't think the NFS protocol itself
needs to change to support this.

> Have you completely exhausted optimization ideas for the file handle
> lookup?

No, but there aren't many.  CacheFiles doesn't actually do very much, and it's
hard to reduce that not very much.  The most obvious thing is to prepopulate
the dcache, but that's at the expense of memory usage.

Actually, if I cache the name => FH mapping I used last time, I can make a
start on looking up in the cache whilst simultaneously accessing the server.
If what's on the server has changed, I can ditch the speculative cache lookup
I was making and start a new cache lookup.

However, storing directory entries has penalties of its own, though it'll be
necessary if we want to do disconnected operation.

> > Where lookup table == dcache.  That would be good yes.  cachefilesd
> > prescans all the files in the cache, which ought to do just that, but it
> > doesn't seem to be very effective.  I'm not sure why.
>
> RCU?  Anyway, it is something to be tracked down and put right.

cachefilesd runs in userspace.  It's possible it isn't doing enough to preload
all the metadata.

> What I tried to say.  So still... got any ideas?  That extra synchronous
> network round trip is a killer.  Can it be made streaming/async to keep
> throughput healthy?

That's a per-netfs thing.  With the test rig I've got, it's going to the
on-disk cache that's the killer.  Going over the network is much faster.

See the results I posted.  For the tarball load, and using Ext3 to back the
cache:

        Cold NFS cache, no disk cache:          0m22.734s
        Warm on-disk cache, cold pagecaches:    1m54.350s

The problem is that reading with tar is a worst-case workload for this.
Everything it does is pretty much completely synchronous.

One thing that might help is if things like tar and find can be made to use
fadvise() on directories to hint to the filesystem (NFS, AFS, whatever) that
it's going to access every file in those directories.

Certainly AFS could make use of that: the directory is read as a file, and the
netfs then parses the file to get a list of vnode IDs that that directory
points to.  It could then do bulk status fetch operations to instantiate the
inodes 50 at a time.
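
As a rough userspace illustration of the hint suggested above, the sketch
below opens a directory and calls posix_fadvise() on it with
POSIX_FADV_WILLNEED.  The helper name hint_read_whole_dir() is made up for
this example, and whether any filesystem acts on such a hint for a directory
is exactly the open question in this thread; the kernel may ignore it or
return an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: tell the kernel we expect to read everything under
 * the directory at 'path'.
 * Returns 0 if the hint was accepted, otherwise a positive errno value
 * (from open() or from posix_fadvise(), which returns the error directly).
 */
int hint_read_whole_dir(const char *path)
{
	int fd, err;

	fd = open(path, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return errno;

	/* offset 0, length 0 means "the whole object" */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
	close(fd);
	return err;
}
```

A tool such as tar or find could call this once per directory before walking
it and simply carry on if the call fails, since the hint is advisory.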

I don't know whether NFS could use it.  Someone like Trond or SteveD or Chuck
would have to answer that.

David

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-21 Thread David Howells

Daniel Phillips [EMAIL PROTECTED] wrote:

> > These patches add local caching for network filesystems such as NFS.
>
> Have you got before/after benchmark results?

I need to get a new hard drive for my test machine before I can go and get
some more up to date benchmark results.  It does seem, however, that the I/O
error handling capabilities of FS-Cache work properly:-)

David

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-21 Thread David Howells

Daniel Phillips [EMAIL PROTECTED] wrote:

> Have you got before/after benchmark results?

See attached.

These show a couple of things:

 (1) Dealing with lots of metadata slows things down a lot.  Note the result of
     looking up and reading lots of small files with tar (the last result).  The
     NFS client has to both consult the NFS server *and* the cache.  Not only
     that, but any asynchronicity the cache may like to do is rendered
     ineffective by the fact tar wants to do a read on a file pretty much
     directly after opening it.

 (2) Getting metadata from the local disk fs is slower than pulling it across
     an unshared gigabit ethernet from a server that already has it in memory.

These points don't mean that fscache is no use, just that you have to consider
carefully whether it's of use to *you* given your particular situation, and
that depends on various factors.

Note that currently FS-Caching is disabled for individual NFS files opened for
writing as there's no way to handle the coherency problems thereby introduced.

David
---

                          ===========================
                          FS-CACHE FOR NFS BENCHMARKS
                          ===========================

 (*) The NFS client has a 1.86GHz Core2 Duo CPU and 1GB of RAM.

 (*) The NFS client has a Seagate ST380211AS 80GB 7200rpm SATA disk on an
 interface running in AHCI mode.  The chipset is an Intel G965.

 (*) A partition of approx 4.5GB is committed to caching, and is formatted as
 Ext3 with a blocksize of 4096 and directory indices.

 (*) The NFS client is using SELinux.

 (*) The NFS server is running an in-kernel NFSd, and has a 2.66GHz Core2 Duo
 CPU and 6GB of RAM.  The chipset is an Intel P965.

 (*) The NFS client is connected to the NFS server by Gigabit Ethernet.

 (*) The NFS mount is made with defaults for all options not relating to the
 cache:

warthog:/warthog /warthog nfs
rw,vers=3,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,
retrans=2,sec=sys,fsc,addr=w.x.y.z 0 0


==================
FEW BIG FILES TEST
==================

Where:

 (*) The NFS server has two files:

[EMAIL PROTECTED] ~]# ls -l /warthog/bigfile
-rw-rw-r-- 1 4043 4043 104857600 2006-11-30 09:39 /warthog/bigfile
[EMAIL PROTECTED] ~]# ls -l /warthog/biggerfile 
-rw-rw-r-- 1 4043 4041 209715200 2006-03-21 13:56 /warthog/biggerfile

 Both of which are in memory on the server in all cases.


No patches, cold NFS cache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m1.909s
user    0m0.000s
sys     0m0.520s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m3.750s
user    0m0.000s
sys     0m0.904s

CONFIG_FSCACHE=n, cold NFS cache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m2.003s
user    0m0.000s
sys     0m0.124s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m4.100s
user    0m0.004s
sys     0m0.488s

Cold NFS cache, no disk cache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m2.084s
user    0m0.000s
sys     0m0.136s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m4.020s
user    0m0.000s
sys     0m0.720s

Completely cold caches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m2.412s
user    0m0.000s
sys     0m0.892s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m4.449s
user    0m0.000s
sys     0m2.300s

Warm NFS pagecache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m0.067s
user    0m0.000s
sys     0m0.064s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m0.133s
user    0m0.000s
sys     0m0.136s

Warm Ext3 pagecache, cold NFS pagecache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m0.173s
user    0m0.000s
sys     0m0.172s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m0.316s
user    0m0.000s
sys     0m0.316s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m1.955s
user    0m0.000s
sys     0m0.244s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m3.596s
user    0m0.000s
sys     0m0.460s


===================================
MANY SMALL/MEDIUM FILE READING TEST
===================================

Where:

 (*) The NFS server has an old kernel tree:

[EMAIL PROTECTED] ~]# du -s /warthog/aaa
347340  /warthog/aaa
[EMAIL

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-21 Thread David Howells

David Howells [EMAIL PROTECTED] wrote:

> > Have you got before/after benchmark results?
>
> See attached.

Attached here are results using BTRFS (patched so that it'll work at all)
rather than Ext3 on the client on the partition backing the cache.

Note that I didn't bother redoing the tests that didn't involve a cache as the
choice of filesystem backing the cache should have no bearing on the result.

Generally, completely cold caches shouldn't show much variation as all the
writing can be done completely asynchronously, provided the client doesn't
fill its RAM.

The interesting case is where the disk cache is warm, but the pagecache is
cold (ie: just after a reboot after filling the caches).  Here, for the two
big files case, BTRFS appears quite a bit better than Ext3, showing a 21%
reduction in time for the smaller case and a 13% reduction for the larger
case.
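
For reference, the two percentages for the big-file case can be recomputed
from the "Warm on-disk cache, cold pagecaches" timings posted in this thread
(Ext3: 1.955s and 3.596s; BTRFS: 1.540s and 3.132s).  The helper names below
are only for this sketch.

```c
#include <stdio.h>

/* Percentage by which 'after' is smaller than 'before' (negative = slower). */
double pct_reduction(double before, double after)
{
	return 100.0 * (before - after) / before;
}

/* Print the Ext3 -> BTRFS change for the two big-file reads. */
void print_big_file_comparison(void)
{
	printf("bigfile:    %.1f%% less time\n", pct_reduction(1.955, 1.540));
	printf("biggerfile: %.1f%% less time\n", pct_reduction(3.596, 3.132));
}
```

These come out at roughly 21% and 13%, matching the figures quoted above.
Since each test was only run once, the decimals should not be read as
meaningful.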

For the many small/medium files case, BTRFS performed significantly better
(15% reduction in time) in the case where the caches were completely cold.
I'm not sure why, though - perhaps because it doesn't execute a write_begin()
stage during the write_one_page() call and thus doesn't go allocating disk
blocks to back the data, but instead allocates them later.

More surprising is that BTRFS performed significantly worse (15% increase in
time) in the case where the cache on disk was fully populated and then the
machine had been rebooted to clear the pagecaches.

It's important to note that I've only run each test once apiece, so the
numbers should be taken with a modicum of salt (bad statistics and all that).

David
---
===========================
FEW BIG FILES TEST ON BTRFS
===========================

Completely cold caches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m2.124s
user    0m0.000s
sys     0m1.260s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m4.538s
user    0m0.000s
sys     0m2.624s

Warm NFS pagecache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m0.061s
user    0m0.000s
sys     0m0.064s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m0.118s
user    0m0.000s
sys     0m0.116s

Warm BTRFS pagecache, cold NFS pagecache:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m0.189s
user    0m0.000s
sys     0m0.188s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m0.369s
user    0m0.000s
sys     0m0.368s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
real    0m1.540s
user    0m0.000s
sys     0m1.440s
[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
real    0m3.132s
user    0m0.000s
sys     0m1.724s



============================================
MANY SMALL/MEDIUM FILE READING TEST ON BTRFS
============================================

Completely cold caches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    0m31.838s
user    0m0.192s
sys     0m6.076s

Warm NFS pagecache:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    0m14.841s
user    0m0.148s
sys     0m4.988s

Warm BTRFS pagecache, cold NFS pagecache:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    0m16.773s
user    0m0.148s
sys     0m5.512s

Warm on-disk cache, cold pagecaches:

[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
real    2m12.527s
user    0m0.080s
sys     0m2.908s


Re: [PATCH 00/37] Permit filesystem local caching

2008-02-21 Thread David Howells

Daniel Phillips [EMAIL PROTECTED] wrote:

> When you say Ext3 cache vs NFS cache is the first on the server and the
> second on the client?

The filesystem on the server is pretty much irrelevant as long as (a) it
doesn't change, and (b) all the data is in memory on the server anyway.

The way the client works is like this:

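+---------+
|         |
|   NFS   |--+
|         |  |
+---------+  |   +----------+
             |   |          |
+---------+  +-->|          |
|         |      |          |
|   AFS   |----->| FS-Cache |
|         |      |          |--+
+---------+  +-->|          |  |
             |   |          |  |   +--------------+   +--------------+
+---------+  |   +----------+  |   |              |   |              |
|         |  |                 +-->|  CacheFiles  |-->|  Ext3        |
|  ISOFS  |--+                     |  /var/cache  |   |  /dev/sda6   |
|         |                        +--------------+   +--------------+
+---------+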


 (1) NFS, say, asks FS-Cache to store/retrieve data for it;

 (2) FS-Cache asks the cache backend, in this case CacheFiles, to honour the
     operation;

 (3) CacheFiles 'opens' a file in a mounted filesystem, say Ext3, and does read
     and write operations of a sort on it;

 (4) Ext3 decides how the cache data is laid out on disk - CacheFiles just
     attempts to use one sparse file per netfs inode.
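
To make the "one sparse file per netfs inode" idea in step (4) concrete, here
is a small userspace sketch.  It is not CacheFiles code: the function names
and the fixed 4096-byte page size are assumptions made for the illustration.
It shows how a page can be stored at its natural offset while everything
before it is left as a hole for the backing filesystem to deal with.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CACHE_PAGE_SIZE 4096	/* assumed page size for this sketch */

/*
 * Store one page at page index 'page' in the backing file.  Pages never
 * stored remain holes and, on filesystems with sparse file support, take
 * up no disk blocks.  Returns 0 on success, -1 on failure.
 */
int store_page(int fd, off_t page, const void *data)
{
	ssize_t n = pwrite(fd, data, CACHE_PAGE_SIZE, page * CACHE_PAGE_SIZE);

	return n == CACHE_PAGE_SIZE ? 0 : -1;
}

/*
 * Read one page back.  A hole inside the file reads as zeros.
 * Returns 0 on success, -1 on failure or short read.
 */
int load_page(int fd, off_t page, void *data)
{
	ssize_t n = pread(fd, data, CACHE_PAGE_SIZE, page * CACHE_PAGE_SIZE);

	return n == CACHE_PAGE_SIZE ? 0 : -1;
}
```

Note that a hole reads back as zeros, so a real cache needs some other way to
tell "not cached yet" from "cached page of zeros"; this sketch does not
attempt that.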

> I am trying to spot the numbers that show the sweet spot for this
> optimization, without much success so far.

What are you trying to do exactly?  Are you actually playing with it, or just
looking at the numbers I've produced?

> Who is supposed to win big?  Is this mainly about reducing the load on
> the server, or is the client supposed to win even with a lightly loaded
> server?

These are difficult questions to answer.  The obvious answer to both is "it
depends", and the real answer to both is "it's a compromise".

Inserting a cache adds overhead: you have to look in the cache to see if your
objects are mirrored there, and then you have to look in the cache to see if
the data you want is stored there; and then you might have to go to the server
anyway and then schedule a copy to be stored in the cache.
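
The sequence just described (look in the cache, fall back to the server,
then keep a copy) can be modelled with a toy in-memory cache.  Everything
here, including the toy_ names, the tiny page size and the fake server, is
invented for the sketch and is not taken from the patches.

```c
#include <stdbool.h>
#include <string.h>

#define NPAGES  8
#define PAGE_SZ 16	/* tiny pages keep the sketch readable */

struct toy_cache {
	bool present[NPAGES];		/* is this page stored in the cache? */
	char data[NPAGES][PAGE_SZ];
	int  server_reads;		/* times we had to go to the server */
};

/* Stand-in for a network read: fills the page with a recognisable byte. */
static void toy_server_read(int page, char *buf)
{
	memset(buf, 'A' + page, PAGE_SZ);
}

/*
 * Read one page: consult the cache first; on a miss go to the server and
 * keep a copy so the next read of the same page is served locally.
 * Returns true if the cache satisfied the read.
 */
bool toy_read_page(struct toy_cache *c, int page, char *buf)
{
	if (c->present[page]) {
		memcpy(buf, c->data[page], PAGE_SZ);
		return true;
	}

	toy_server_read(page, buf);
	c->server_reads++;

	/* the text above describes scheduling this copy, not doing it inline */
	memcpy(c->data[page], buf, PAGE_SZ);
	c->present[page] = true;
	return false;
}
```

Even in this toy form the cost structure is visible: the miss path does
strictly more work than an uncached read, so the cache only pays off when
the hit path is cheaper than going to the server.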

The characteristics of this type of cache depend on a number of things: the
filesystem backing it being the most obvious variable, but also how fragmented
it is and the properties of the disk drive or drives it is on.

Whether it's worth having a cache depends on the characteristics of the network
versus the characteristics of the cache.  Latency of the cache vs latency of
the network, for example.  Network loading is another: having a cache on each
of several clients sharing a server can reduce network traffic by avoiding the
read requests to the server.  NFS has a characteristic that it keeps spamming
the server with file status requests, so even if you take the read requests out
of the load, an NFS client still generates quite a lot of network traffic to
the server - but the reduction is still useful.

The metadata problem is quite a tricky one since it increases with the number
of files you're dealing with.  As things stand in my patches, when NFS, for
example, wants to access a new inode, it first has to go to the server to
lookup the NFS file handle, and only then can it go to the cache to find out if
there's a matching object in the cache.  Worse, the cache must then perform
several synchronous disk bound metadata operations before it can be possible to
read from the cache.  Worse still, this means that a read on the network file
cannot proceed until (a) we've been to the server *plus* (b) we've been to the
disk.

The reason my client going to my server is so quick is that the server has the
dcache and the pagecache preloaded, so that across-network lookup operations
are really, really quick, as compared to the synchronous slogging of the local
disk to find the cache object.

I can probably improve this a little by pre-loading the subindex directories
(hash tables) that I use to reduce the directory size in the cache, but I don't
know by how much.


Anyway, to answer your questions:

 (1) It may help with heavily loaded networks with lots of read-only traffic.

 (2) It may help with slow connections (like doing NFS between the UK and
 Australia).

 (3) It could be used to do offline/disconnected operation.

David

[PATCH 00/37] Permit filesystem local caching

2008-02-20 Thread David Howells



These patches add local caching for network filesystems such as NFS.

The patches can roughly be broken down into a number of sets:

  (*) 01-keys-inc-payload.diff
  (*) 02-keys-search-keyring.diff
  (*) 03-keys-callout-blob.diff

  Three patches to the keyring code made to help the CIFS people.
  Included because of patches 05-08.

  (*) 04-keys-get-label.diff

  A patch to allow the security label of a key to be retrieved.
  Included because of patches 05-08.

  (*) 05-security-current-fsugid.diff
  (*) 06-security-separate-task-bits.diff
  (*) 07-security-subjective.diff
  (*) 08-security-kernel_service-class.diff
  (*) 09-security-kernel-service.diff
  (*) 10-security-nfsd.diff

  Patches to permit the subjective security of a task to be overridden.
  All the security details in task_struct are decanted into a new struct
  that task_struct then has two pointers to: one that defines the
  objective security of that task (how other tasks may affect it) and one
  that defines the subjective security (how it may affect other objects).

  Note that I have dropped the idea of struct cred for the moment.  With
  the amount of stuff that was excluded from it, it wasn't actually any
  use to me.  However, it can be added later.

  Required for cachefiles.

  (*) 11-release-page.diff
  (*) 12-fscache-page-flags.diff
  (*) 13-add_wait_queue_tail.diff
  (*) 14-fscache.diff

  Patches to provide a local caching facility for network filesystems.

  (*) 15-cachefiles-ia64.diff
  (*) 16-cachefiles-ext3-f_mapping.diff
  (*) 17-cachefiles-write.diff
  (*) 18-cachefiles-monitor.diff
  (*) 19-cachefiles-export.diff
  (*) 20-cachefiles.diff

  Patches to provide a local cache in a directory of an already mounted
  filesystem.

  (*) 21-nfs-comment.diff
  (*) 22-nfs-fscache-option.diff
  (*) 23-nfs-fscache-kconfig.diff
  (*) 24-nfs-fscache-top-index.diff
  (*) 25-nfs-fscache-server-obj.diff
  (*) 26-nfs-fscache-super-obj.diff
  (*) 27-nfs-fscache-inode-obj.diff
  (*) 28-nfs-fscache-use-inode.diff
  (*) 29-nfs-fscache-invalidate-pages.diff
  (*) 30-nfs-fscache-iostats.diff
  (*) 31-nfs-fscache-page-management.diff
  (*) 32-nfs-fscache-read-context.diff
  (*) 33-nfs-fscache-read-fallback.diff
  (*) 34-nfs-fscache-read-from-cache.diff
  (*) 35-nfs-fscache-store-to-cache.diff
  (*) 36-nfs-fscache-mount.diff
  (*) 37-nfs-fscache-display.diff

  Patches to provide NFS with local caching.

  A couple of questions on the NFS iostat changes: (1) Should I update the
  iostat version number; (2) is it permitted to have conditional iostats?


I've brought the patchset up to date with respect to the 2.6.25-rc1 merge
window, in particular altering Smack to handle the split in objective and
subjective security in the task_struct.

--
A tarball of the patches is available at:


http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-30.tar.bz2


To use this version of CacheFiles, the cachefilesd-0.9 is also required.  It
is available as an SRPM:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
http://people.redhat.com/~dhowells/fscache/cachefilesd.if
http://people.redhat.com/~dhowells/fscache/cachefilesd.te
http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David

[PATCH 10/37] Security: Make NFSD work with detached security

2008-02-20 Thread David Howells

Make NFSD work with detached security, using the patches that excise the
security information from task_struct to struct task_security as a base.

Each time NFSD wants a new security descriptor (to do NFS4 recovery or just to
do NFS operations), a task_security record is derived from NFSD's *objective*
security, modified and then applied as the *subjective* security.  This means
(a) the changes are not visible to anyone looking at NFSD through /proc, (b)
there is no leakage between two consecutive ops with different security
configurations.
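
The derive/modify/apply pattern described above can be modelled in userspace
as follows.  This is only a sketch: struct toy_security and the toy_
functions are hypothetical stand-ins for the patch's struct task_security
and its helpers, reduced to a refcount and two IDs.

```c
#include <stdlib.h>

/* Hypothetical, much-reduced stand-in for struct task_security. */
struct toy_security {
	int      refcount;
	unsigned fsuid, fsgid;
};

struct toy_task {
	struct toy_security *sec;	/* objective: how others see the task */
	struct toy_security *act_as;	/* subjective: how the task acts */
};

/* Drop one reference, freeing the record when the last one goes. */
static void toy_put(struct toy_security *s)
{
	if (s && --s->refcount == 0)
		free(s);
}

/*
 * Derive a fresh record from the objective security, adjust it, and install
 * it as the subjective security.  The objective record is never modified.
 * Returns 0 on success, -1 if allocation fails.
 */
int toy_act_as(struct toy_task *t, unsigned fsuid, unsigned fsgid)
{
	struct toy_security *new_sec = malloc(sizeof(*new_sec));
	struct toy_security *old;

	if (!new_sec)
		return -1;

	*new_sec = *t->sec;		/* start from the objective security */
	new_sec->refcount = 1;
	new_sec->fsuid = fsuid;
	new_sec->fsgid = fsgid;

	old = t->act_as;
	t->act_as = new_sec;
	toy_put(old);			/* previous op's settings cannot leak */
	return 0;
}
```

Because every operation starts again from the objective record, nothing set
up for one request survives into the next, which is the "no leakage" property
claimed in (b).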

Consideration should probably be given to caching the task_security record on
the basis that there'll probably be several ops that will want to use any
particular security configuration.

Furthermore, nfs4recover.c perhaps ought to set an appropriate LSM context on
the record pointed to by rec_security so that the disk is accessed
appropriately (see set_security_override[_from_ctx]()).

NOTE!  This patch must be rolled in to one of the earlier security patches to
make it compile fully.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfsd/auth.c|   37 +++-
 fs/nfsd/nfs4recover.c |   64 +++--
 2 files changed, 65 insertions(+), 36 deletions(-)


diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
index 5586157..ebdc562 100644
--- a/fs/nfsd/auth.c
+++ b/fs/nfsd/auth.c
@@ -6,6 +6,7 @@
 
 #include <linux/types.h>
 #include <linux/sched.h>
+#include <linux/cred.h>
 #include <linux/sunrpc/svc.h>
 #include <linux/sunrpc/svcauth.h>
 #include <linux/nfsd/nfsd.h>
@@ -26,12 +27,17 @@ int nfsexp_flags(struct svc_rqst *rqstp, struct svc_export *exp)
 
 int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
 {
-	struct task_security *act_as = current->act_as;
+	struct task_security *sec, *old;
 	struct svc_cred cred = rqstp->rq_cred;
 	int i;
 	int flags = nfsexp_flags(rqstp, exp);
 	int ret;
 
+	/* derive the new security record from nfsd's objective security */
+	sec = get_kernel_security(current);
+	if (!sec)
+		return -ENOMEM;
+
 	if (flags & NFSEXP_ALLSQUASH) {
 		cred.cr_uid = exp->ex_anon_uid;
 		cred.cr_gid = exp->ex_anon_gid;
@@ -55,26 +61,33 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
 		get_group_info(cred.cr_group_info);
 
 	if (cred.cr_uid != (uid_t) -1)
-		act_as->fsuid = cred.cr_uid;
+		sec->fsuid = cred.cr_uid;
 	else
-		act_as->fsuid = exp->ex_anon_uid;
+		sec->fsuid = exp->ex_anon_uid;
 	if (cred.cr_gid != (gid_t) -1)
-		act_as->fsgid = cred.cr_gid;
+		sec->fsgid = cred.cr_gid;
 	else
-		act_as->fsgid = exp->ex_anon_gid;
+		sec->fsgid = exp->ex_anon_gid;
 
-	if (!cred.cr_group_info)
+	if (!cred.cr_group_info) {
+		put_task_security(sec);
 		return -ENOMEM;
-	ret = set_groups(act_as, cred.cr_group_info);
+	}
+	ret = set_groups(sec, cred.cr_group_info);
 	put_group_info(cred.cr_group_info);
 	if ((cred.cr_uid)) {
-		act_as->cap_effective =
-			cap_drop_nfsd_set(act_as->cap_effective);
+		sec->cap_effective =
+			cap_drop_nfsd_set(sec->cap_effective);
 	} else {
-		act_as->cap_effective =
-			cap_raise_nfsd_set(act_as->cap_effective,
-					   act_as->cap_permitted);
+		sec->cap_effective =
+			cap_raise_nfsd_set(sec->cap_effective,
+					   sec->cap_permitted);
 	}
+
+	/* set the new security as nfsd's subjective security */
+	old = current->act_as;
+	current->act_as = sec;
+	put_task_security(old);
 	return ret;
 }
 
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index afddc9b..c86aa92 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -46,27 +46,37 @@
 #include <linux/scatterlist.h>
 #include <linux/crypto.h>
 #include <linux/sched.h>
+#include <linux/cred.h>
 
 #define NFSDDBG_FACILITY                NFSDDBG_PROC
 
 /* Globals */
 static struct nameidata rec_dir;
 static int rec_dir_init = 0;
+static struct task_security *rec_security;
 
+/*
+ * switch the special recovery access security in on the current task's
+ * subjective security
+ */
 static void
-nfs4_save_user(uid_t *saveuid, gid_t *savegid)
+nfs4_begin_secure(struct task_security **saved_sec)
 {
-	*saveuid = current->act_as->fsuid;
-	*savegid = current->act_as->fsgid;
-	current->act_as->fsuid = 0;
-	current->act_as->fsgid = 0;
+	*saved_sec = current->act_as;
+	current->act_as = get_task_security(rec_security);
 }
 
+/*
+ * return the current task's subjective security to its former glory
+ */
 static void
-nfs4_reset_user(uid_t saveuid, gid_t

[PATCH 05/37] Security: Change current->fs[ug]id to current_fs[ug]id()

2008-02-20 Thread David Howells

Change current->fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be
separated from the task_struct.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/perfmon.c|4 ++--
 arch/powerpc/platforms/cell/spufs/inode.c |4 ++--
 drivers/isdn/capi/capifs.c|4 ++--
 drivers/usb/core/inode.c  |4 ++--
 fs/9p/fid.c   |2 +-
 fs/9p/vfs_inode.c |4 ++--
 fs/9p/vfs_super.c |4 ++--
 fs/affs/inode.c   |4 ++--
 fs/anon_inodes.c  |4 ++--
 fs/attr.c |4 ++--
 fs/bfs/dir.c  |4 ++--
 fs/cifs/cifsproto.h   |2 +-
 fs/cifs/dir.c |   12 ++--
 fs/cifs/inode.c   |8 
 fs/cifs/misc.c|4 ++--
 fs/coda/cache.c   |6 +++---
 fs/coda/upcall.c  |4 ++--
 fs/devpts/inode.c |4 ++--
 fs/dquot.c|2 +-
 fs/exec.c |4 ++--
 fs/ext2/balloc.c  |2 +-
 fs/ext2/ialloc.c  |4 ++--
 fs/ext2/ioctl.c   |2 +-
 fs/ext3/balloc.c  |2 +-
 fs/ext3/ialloc.c  |4 ++--
 fs/ext4/balloc.c  |2 +-
 fs/ext4/ialloc.c  |4 ++--
 fs/fuse/dev.c |4 ++--
 fs/gfs2/inode.c   |   10 +-
 fs/hfs/inode.c|4 ++--
 fs/hfsplus/inode.c|4 ++--
 fs/hpfs/namei.c   |   24 
 fs/hugetlbfs/inode.c  |   16 
 fs/jffs2/fs.c |4 ++--
 fs/jfs/jfs_inode.c|4 ++--
 fs/locks.c|2 +-
 fs/minix/bitmap.c |4 ++--
 fs/namei.c|8 
 fs/nfsd/vfs.c |6 +++---
 fs/ocfs2/dlm/dlmfs.c  |8 
 fs/ocfs2/namei.c  |4 ++--
 fs/pipe.c |4 ++--
 fs/posix_acl.c|4 ++--
 fs/ramfs/inode.c  |4 ++--
 fs/reiserfs/namei.c   |4 ++--
 fs/sysv/ialloc.c  |4 ++--
 fs/udf/ialloc.c   |4 ++--
 fs/udf/namei.c|2 +-
 fs/ufs/ialloc.c   |4 ++--
 fs/xfs/linux-2.6/xfs_linux.h  |4 ++--
 fs/xfs/xfs_acl.c  |6 +++---
 fs/xfs/xfs_attr.c |2 +-
 fs/xfs/xfs_inode.c|4 ++--
 fs/xfs/xfs_vnodeops.c |8 
 include/linux/fs.h|2 +-
 include/linux/sched.h |3 +++
 ipc/mqueue.c  |4 ++--
 kernel/cgroup.c   |4 ++--
 mm/shmem.c|8 
 net/9p/client.c   |2 +-
 net/socket.c  |4 ++--
 net/sunrpc/auth.c |8 
 security/commoncap.c  |4 ++--
 security/keys/key.c   |2 +-
 security/keys/keyctl.c|2 +-
 security/keys/request_key.c   |   10 +-
 security/keys/request_key_auth.c  |2 +-
 67 files changed, 161 insertions(+), 158 deletions(-)


diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index f6b9971..4b229f2 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2191,8 +2191,8 @@ pfm_alloc_fd(struct file **cfile)
 	DPRINT(("new inode ino=%ld @%p\n", inode->i_ino, inode));
 
 	inode->i_mode = S_IFCHR|S_IRUGO;
-	inode->i_uid  = current->fsuid;
-	inode->i_gid  = current->fsgid;
+	inode->i_uid  = current_fsuid();
+	inode->i_gid  = current_fsgid();
 
 	sprintf(name, "[%lu]", inode->i_ino);
 	this.name = name;
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 6d1228c..a789ecf 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -86,8 +86,8 @@ spufs_new_inode(struct super_block *sb, int mode)
 		goto out;
 
 	inode->i_mode = mode;
-	inode->i_uid = current->fsuid;
-	inode->i_gid = current->fsgid;
+	inode->i_uid = current_fsuid();
+	inode->i_gid

[PATCH 24/37] NFS: Register NFS for caching and retrieve the top-level index

2008-02-20 Thread David Howells

Register NFS for caching and retrieve the top-level cache index object cookie.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile|1 +
 fs/nfs/fscache-index.c |   53 
 fs/nfs/fscache.h   |   35 
 fs/nfs/inode.c |8 +++
 4 files changed, 97 insertions(+), 0 deletions(-)
 create mode 100644 fs/nfs/fscache-index.c
 create mode 100644 fs/nfs/fscache.h


diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index df0f41e..6d7176d 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,3 +16,4 @@ nfs-$(CONFIG_NFS_V4)  += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache-index.o
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
new file mode 100644
index 000..225ed5d
--- /dev/null
+++ b/fs/nfs/fscache-index.c
@@ -0,0 +1,53 @@
+/* NFS FS-Cache index structure definition
+ *
+ * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+#include <linux/in6.h>
+
+#include "internal.h"
+#include "fscache.h"
+
+#define NFSDBG_FACILITY        NFSDBG_FSCACHE
+
+static const struct fscache_netfs_operations nfs_cache_ops = {
+};
+
+/*
+ * Define the NFS filesystem for FS-Cache.  Upon registration FS-Cache sticks
+ * the cookie for the top-level index object for NFS into this structure.  The
+ * top-level index can then have other cache objects inserted into it.
+ */
+struct fscache_netfs nfs_cache_netfs = {
+   .name       = "nfs",
+   .version    = 0,
+   .ops        = &nfs_cache_ops,
+};
+
+/*
+ * Register NFS for caching
+ */
+int nfs_fscache_register(void)
+{
+   return fscache_register_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * Unregister NFS for caching
+ */
+void nfs_fscache_unregister(void)
+{
+   fscache_unregister_netfs(&nfs_cache_netfs);
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
new file mode 100644
index 0000000..75e5a03
--- /dev/null
+++ b/fs/nfs/fscache.h
@@ -0,0 +1,35 @@
+/* NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#include <linux/fscache.h>
+
+/*
+ * fscache-index.c
+ */
+extern struct fscache_netfs nfs_cache_netfs;
+
+extern int nfs_fscache_register(void);
+extern void nfs_fscache_unregister(void);
+
+#else /* CONFIG_NFS_FSCACHE */
+static inline int nfs_fscache_register(void) { return 0; }
+static inline void nfs_fscache_unregister(void) {}
+
+#endif /* CONFIG_NFS_FSCACHE */
+#endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 966a885..7254d5c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -46,6 +46,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY        NFSDBG_VFS
 
@@ -1222,6 +1223,10 @@ static int __init init_nfs_fs(void)
 {
int err;
 
+   err = nfs_fscache_register();
+   if (err < 0)
+   goto out6;
+
err = nfs_fs_proc_init();
if (err)
goto out5;
@@ -1268,6 +1273,8 @@ out3:
 out4:
nfs_fs_proc_exit();
 out5:
+   nfs_fscache_unregister();
+out6:
return err;
 }
 
@@ -1278,6 +1285,7 @@ static void __exit exit_nfs_fs(void)
nfs_destroy_readpagecache();
nfs_destroy_inodecache();
nfs_destroy_nfspagecache();
+   nfs_fscache_unregister();
 #ifdef CONFIG_PROC_FS
rpc_proc_unregister("nfs");
 #endif


[PATCH 02/37] KEYS: Check starting keyring as part of search

2008-02-20 Thread David Howells

Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.

The scenario:  User in process A does things that cause things to be
created in its process session keyring.  The user then does an su to
another user and starts a new process, B.  The two processes now
share the same process session keyring.

Process B does an NFS access which results in an upcall to gssd.
When gssd attempts to instantiate the context key (to be linked
into the process session keyring), it is denied access even though it
has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
      lookup_user_key()                            (the default: case)
         search_process_keyrings(current)
            search_process_keyrings(rka->context)   (recursive call)
               keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the
top-level keyring it is given, but that top-level keyring is neither
fully validated nor checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for
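
The two steps above can be sketched in userspace C as follows. This is only
an illustration of the decision order, not the kernel code: struct toy_key,
the FLAG_* bits and check_top_keyring() are hypothetical stand-ins, and the
"matches" argument stands in for the result of the type's match() callback.

```c
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the kernel's key state. */
enum { FLAG_REVOKED = 1 << 0, FLAG_NEGATIVE = 1 << 1 };

struct toy_key {
    int type;               /* stand-in for key->type */
    unsigned long flags;    /* FLAG_* bits */
    time_t expiry;          /* 0 means "never expires" */
};

/*
 * Returns 1 if the top-level keyring is itself the thing being searched for,
 * 0 if it is valid and its contents should be searched, or a negative errno
 * if it must not be used at all.
 */
static int check_top_keyring(const struct toy_key *keyring, int wanted_type,
                             bool matches, time_t now)
{
    unsigned long kflags = keyring->flags;  /* read the flags once */
    bool expired = keyring->expiry && now >= keyring->expiry;

    if (keyring->type == wanted_type && matches) {
        /* the keyring is the target: validate it as a result */
        if ((kflags & FLAG_REVOKED) || expired)
            return -EAGAIN;
        if (kflags & FLAG_NEGATIVE)
            return -ENOKEY;
        return 1;
    }

    /* not the target: it must still be valid before we descend into it */
    if ((kflags & (FLAG_REVOKED | FLAG_NEGATIVE)) || expired)
        return -EAGAIN;
    return 0;
}
```

Taking a single snapshot of the flags means every test in the function sees
the same state, which is also why the patch introduces the kflags variable.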


Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++
 1 files changed, 31 insertions(+), 4 deletions(-)


diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 
struct keyring_list *keylist;
struct timespec now;
-   unsigned long possessed;
+   unsigned long possessed, kflags;
struct key *keyring, *key;
key_ref_t key_ref;
long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
now = current_kernel_time();
err = -EAGAIN;
sp = 0;
+   
+   /* firstly we should check to see if this top-level keyring is what we
+* are looking for */
+   key_ref = ERR_PTR(-EAGAIN);
+   kflags = keyring->flags;
+   if (keyring->type == type && match(keyring, description)) {
+   key = keyring;
+
+   /* check it isn't negative and hasn't expired or been
+* revoked */
+   if (kflags & (1 << KEY_FLAG_REVOKED))
+   goto error_2;
+   if (key->expiry && now.tv_sec >= key->expiry)
+   goto error_2;
+   key_ref = ERR_PTR(-ENOKEY);
+   if (kflags & (1 << KEY_FLAG_NEGATIVE))
+   goto error_2;
+   goto found;
+   }
+
+   /* otherwise, the top keyring must not be revoked, expired, or
+* negatively instantiated if we are to search it */
+   key_ref = ERR_PTR(-EAGAIN);
+   if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+   (keyring->expiry && now.tv_sec >= keyring->expiry))
+   goto error_2;
 
/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
/* iterate through the keys in this keyring first */
for (kix = 0; kix < keylist->nkeys; kix++) {
key = keylist->keys[kix];
+   kflags = key->flags;
 
/* ignore keys not of this type */
if (key->type != type)
continue;
 
/* skip revoked keys and expired keys */
-   if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+   if (kflags & (1 << KEY_FLAG_REVOKED))
continue;
 
if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
context, KEY_SEARCH) < 0)
continue;
 
-   /* we set a different error code if we find a negative key */
-   if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+   /* we set a different error code if we pass a negative key */
+   if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
err = -ENOKEY;
continue;
}


[PATCH 23/37] NFS: Permit local filesystem caching to be enabled for NFS

2008-02-20 Thread David Howells

Permit local filesystem caching to be enabled for NFS in the kernel
configuration.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig |8 
 1 files changed, 8 insertions(+), 0 deletions(-)


diff --git a/fs/Kconfig b/fs/Kconfig
index c42ec50..fa8e978 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1644,6 +1644,14 @@ config NFS_V4
 
  If unsure, say N.
 
+config NFS_FSCACHE
+   bool "Provide NFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager
+
 config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS


[PATCH 27/37] NFS: Define and create inode-level cache objects

2008-02-20 Thread David Howells

Define and create inode-level cache data storage objects (as managed by
nfs_inode structs).

Each inode-level object is created in a superblock-level index object and is
itself a data storage object into which pages from the inode are stored.

The inode object key is the NFS file handle for the inode.

The inode object is given coherency data to carry in the auxiliary data
permitted by the cache.  This is a sequence made up of:

 (1) i_mtime from the NFS inode.

 (2) i_ctime from the NFS inode.

 (3) i_size from the NFS inode.

As the cache is a persistent cache, the auxiliary data is checked when a new
NFS in-memory inode is set up that matches an already existing data storage
object in the cache.  If the coherency data is the same, the on-disk object is
retained and used; if not, it is scrapped and a new one created.
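
The coherency check described above can be sketched in userspace C as
follows. This is an illustration only: struct toy_auxdata, the toy_checkaux
values and toy_check_aux() are hypothetical stand-ins for the structures in
the patch, timestamps are flattened to nanosecond counts so that the struct
has no padding, and treating any length mismatch as obsolete is an assumption
of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the auxiliary data stored with a cache object. */
struct toy_auxdata {
    int64_t mtime_ns;   /* stand-in for i_mtime */
    int64_t ctime_ns;   /* stand-in for i_ctime */
    int64_t size;       /* stand-in for i_size */
};

enum toy_checkaux { TOY_CHECKAUX_OKAY, TOY_CHECKAUX_OBSOLETE };

/*
 * Compare the auxiliary data found in the cache ("stored") with the values
 * currently held by the in-memory inode ("live").  Any difference means the
 * on-disk object is stale and should be discarded.
 */
static enum toy_checkaux toy_check_aux(const struct toy_auxdata *live,
                                       const void *stored, size_t stored_len)
{
    if (stored_len != sizeof(*live))
        return TOY_CHECKAUX_OBSOLETE;
    if (memcmp(stored, live, stored_len) != 0)
        return TOY_CHECKAUX_OBSOLETE;
    return TOY_CHECKAUX_OKAY;
}
```

A bytewise comparison is only safe when both copies are built the same way;
a struct with padding would need to be zeroed before being filled in.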

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c |  112 
 fs/nfs/fscache.h   |1 
 2 files changed, 113 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index b5a52e3..c3c63fa 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -150,3 +150,115 @@ const struct fscache_cookie_def nfs_cache_super_index_def = {
.type   = FSCACHE_COOKIE_TYPE_INDEX,
.get_key= nfs_super_get_key,
 };
+
+/*
+ * Definition of the auxiliary data attached to NFS inode storage objects
+ * within the cache.
+ *
+ * The contents of this struct are recorded in the on-disk local cache in the
+ * auxiliary data attached to the data storage object backing an inode.  This
+ * permits coherency to be managed when a new inode binds to an already extant
+ * cache object.
+ */
+struct nfs_cache_inode_auxdata {
+   struct timespec mtime;
+   struct timespec ctime;
+   loff_t  size;
+};
+
+/*
+ * Generate a key to describe an NFS inode in an NFS server's index
+ */
+static uint16_t nfs_cache_inode_get_key(const void *cookie_netfs_data,
+   void *buffer, uint16_t bufmax)
+{
+   const struct nfs_inode *nfsi = cookie_netfs_data;
+   uint16_t nsize;
+
+   /* use the inode's NFS filehandle as the key */
+   nsize = nfsi->fh.size;
+   memcpy(buffer, nfsi->fh.data, nsize);
+   return nsize;
+}
+
+/*
+ * Get certain file attributes from the netfs data
+ * - This function can be absent for an index
+ * - Not permitted to return an error
+ * - The netfs data from the cookie being used as the source is presented
+ */
+static void nfs_cache_inode_get_attr(const void *cookie_netfs_data, uint64_t *size)
+{
+   const struct nfs_inode *nfsi = cookie_netfs_data;
+
+   *size = nfsi->vfs_inode.i_size;
+}
+
+/*
+ * Get the auxiliary data from netfs data
+ * - This function can be absent if the index carries no state data
+ * - Should store the auxiliary data in the buffer
+ * - Should return the amount of data stored
+ * - Not permitted to return an error
+ * - The netfs data from the cookie being used as the source is presented
+ */
+static uint16_t nfs_cache_inode_get_aux(const void *cookie_netfs_data,
+   void *buffer, uint16_t bufmax)
+{
+   struct nfs_cache_inode_auxdata auxdata;
+   const struct nfs_inode *nfsi = cookie_netfs_data;
+
+   auxdata.size = nfsi->vfs_inode.i_size;
+   auxdata.mtime = nfsi->vfs_inode.i_mtime;
+   auxdata.ctime = nfsi->vfs_inode.i_ctime;
+
+   if (bufmax > sizeof(auxdata))
+   bufmax = sizeof(auxdata);
+
+   memcpy(buffer, &auxdata, bufmax);
+   return bufmax;
+}
+
+/*
+ * Consult the netfs about the state of an object
+ * - This function can be absent if the index carries no state data
+ * - The netfs data from the cookie being used as the target is
+ *   presented, as is the auxiliary data
+ */
+static enum fscache_checkaux nfs_cache_inode_check_aux(void *cookie_netfs_data,
+  const void *data,
+  uint16_t datalen)
+{
+   struct nfs_cache_inode_auxdata auxdata;
+   struct nfs_inode *nfsi = cookie_netfs_data;
+
+   if (datalen > sizeof(auxdata))
+   return FSCACHE_CHECKAUX_OBSOLETE;
+
+   auxdata.size = nfsi->vfs_inode.i_size;
+   auxdata.mtime = nfsi->vfs_inode.i_mtime;
+   auxdata.ctime = nfsi->vfs_inode.i_ctime;
+
+   if (memcmp(data, &auxdata, datalen) != 0)
+   return FSCACHE_CHECKAUX_OBSOLETE;
+
+   return FSCACHE_CHECKAUX_OKAY;
+}
+
+/*
+ * Define the inode object for FS-Cache.  This is used to describe an inode
+ * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
+ * an inode.
+ *
+ * Coherency is managed by comparing the copies of i_size, i_mtime and i_ctime
+ * held in the cache auxiliary data for the data storage object with those in
+ * the inode struct in memory.
+ */
+const struct

[PATCH 11/37] FS-Cache: Release page-private after failed readahead

2008-02-20 Thread David Howells

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 mm/readahead.c |   39 +--
 1 files changed, 37 insertions(+), 2 deletions(-)


diff --git a/mm/readahead.c b/mm/readahead.c
index c9c50ca..75aa6b6 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -44,6 +44,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+struct page *page)
+{
+   if (PagePrivate(page)) {
+   if (TestSetPageLocked(page))
+   BUG();
+   page->mapping = mapping;
+   do_invalidatepage(page, 0);
+   page->mapping = NULL;
+   unlock_page(page);
+   }
+   page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+   struct page *victim;
+
+   while (!list_empty(pages)) {
+   victim = list_to_page(pages);
+   list_del(&victim->lru);
+   read_cache_pages_invalidate_page(mapping, victim);
+   }
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -65,14 +100,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
list_del(&page->lru);
if (add_to_page_cache_lru(page, mapping,
page->index, GFP_KERNEL)) {
-   page_cache_release(page);
+   read_cache_pages_invalidate_page(mapping, page);
continue;
}
page_cache_release(page);
 
ret = filler(data, page);
if (unlikely(ret)) {
-   put_pages_list(pages);
+   read_cache_pages_invalidate_pages(mapping, pages);
break;
}
task_io_account_read(PAGE_CACHE_SIZE);


[PATCH 01/37] KEYS: Increase the payload size when instantiating a key

2008-02-20 Thread David Howells

Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key().  This permits huge CIFS SPNEGO blobs to
be passed around.  The limit is raised to 1MB.  If kmalloc() can't allocate a
buffer of sufficient size, vmalloc() will be tried instead.
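
The allocation policy described above can be sketched in userspace C as
follows. This is a model only: the two allocators are passed in as function
pointers standing in for kmalloc() and vmalloc(), and TOY_PAGE_SIZE,
struct toy_buf, toy_failing_alloc() and toy_alloc_payload() are hypothetical
names introduced for the illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE   4096u                   /* assumed page size */
#define TOY_MAX_PAYLOAD (1024u * 1024u - 1u)    /* limit from this patch */

typedef void *(*toy_alloc_fn)(size_t);

struct toy_buf {
    void *ptr;
    bool from_fallback;     /* tells the caller which free routine to use */
};

/* An allocator that always fails, used to exercise the fallback path. */
static void *toy_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

static int toy_alloc_payload(struct toy_buf *buf, size_t plen,
                             toy_alloc_fn primary, toy_alloc_fn fallback)
{
    buf->ptr = NULL;
    buf->from_fallback = false;

    if (plen > TOY_MAX_PAYLOAD)
        return -EINVAL;

    buf->ptr = primary(plen);
    if (buf->ptr)
        return 0;

    /* a request of a page or less is not retried with the fallback */
    if (plen <= TOY_PAGE_SIZE)
        return -ENOMEM;

    buf->ptr = fallback(plen);
    if (!buf->ptr)
        return -ENOMEM;
    buf->from_fallback = true;
    return 0;
}
```

Recording which allocator produced the buffer mirrors the "vm" flag in the
patch, since kmalloc() memory must go to kfree() and vmalloc() memory to
vfree().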

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyctl.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)


diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
 #include <linux/capability.h>
 #include <linux/string.h>
 #include <linux/err.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
char type[32], *description;
void *payload;
long ret;
+   bool vm;
 
ret = -EINVAL;
-   if (plen > 32767)
+   if (plen > 1024 * 1024 - 1)
goto error;
 
/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
/* pull the payload in if one was supplied */
payload = NULL;
 
+   vm = false;
if (_payload) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
-   if (!payload)
-   goto error2;
+   if (!payload) {
+   if (plen <= PAGE_SIZE)
+   goto error2;
+   vm = true;
+   payload = vmalloc(plen);
+   if (!payload)
+   goto error2;
+   }
 
ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 
key_ref_put(keyring_ref);
  error3:
-   kfree(payload);
+   if (!vm)
+   kfree(payload);
+   else
+   vfree(payload);
  error2:
kfree(description);
  error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
key_ref_t keyring_ref;
void *payload;
long ret;
+   bool vm = false;
 
ret = -EINVAL;
-   if (plen > 32767)
+   if (plen > 1024 * 1024 - 1)
goto error;
 
/* the appropriate instantiation authorisation key must have been
@@ -843,8 +856,14 @@ long keyctl_instantiate_key(key_serial_t id,
if (_payload) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
-   if (!payload)
-   goto error;
+   if (!payload) {
+   if (plen <= PAGE_SIZE)
+   goto error;
+   vm = true;
+   payload = vmalloc(plen);
+   if (!payload)
+   goto error;
+   }
 
ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
@@ -877,7 +896,10 @@ long keyctl_instantiate_key(key_serial_t id,
}
 
 error2:
-   kfree(payload);
+   if (!vm)
+   kfree(payload);
+   else
+   vfree(payload);
 error:
return ret;
 


[PATCH 28/37] NFS: Use local disk inode cache

2008-02-20 Thread David Howells

Bind data storage objects in the local cache to NFS inodes.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c   |  131 
 fs/nfs/fscache.h   |   19 +++
 fs/nfs/inode.c |   39 --
 include/linux/nfs_fs.h |   10 
 4 files changed, 193 insertions(+), 6 deletions(-)


diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index cbd09f0..c0e0320 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -166,3 +166,134 @@ void nfs_fscache_release_super_cookie(struct super_block *sb)
nfss->fscache_key = NULL;
}
 }
+
+/*
+ * Initialise the per-inode cache cookie pointer for an NFS inode.
+ */
+void nfs_fscache_init_inode_cookie(struct inode *inode)
+{
+   NFS_I(inode)->fscache = NULL;
+   if (S_ISREG(inode->i_mode))
+   set_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
+}
+
+/*
+ * Get the per-inode cache cookie for an NFS inode.
+ */
+void nfs_fscache_enable_inode_cookie(struct inode *inode)
+{
+   struct super_block *sb = inode->i_sb;
+   struct nfs_inode *nfsi = NFS_I(inode);
+
+   if (nfsi->fscache || !NFS_FSCACHE(inode))
+   return;
+
+   if ((NFS_SB(sb)->options & NFS_OPTION_FSCACHE)) {
+   nfsi->fscache = fscache_acquire_cookie(
+   NFS_SB(sb)->fscache,
+   &nfs_cache_inode_object_def,
+   nfsi);
+
+   dfprintk(FSCACHE, "NFS: get FH cookie (0x%p/0x%p/0x%p)\n",
+sb, nfsi, nfsi->fscache);
+   }
+}
+
+/*
+ * Release a per-inode cookie.
+ */
+void nfs_fscache_release_inode_cookie(struct inode *inode)
+{
+   struct nfs_inode *nfsi = NFS_I(inode);
+
+   dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
+nfsi, nfsi->fscache);
+
+   fscache_relinquish_cookie(nfsi->fscache, 0);
+   nfsi->fscache = NULL;
+}
+
+/*
+ * Retire a per-inode cookie, destroying the data attached to it.
+ */
+void nfs_fscache_zap_inode_cookie(struct inode *inode)
+{
+   struct nfs_inode *nfsi = NFS_I(inode);
+
+   dfprintk(FSCACHE, "NFS: zapping cookie (0x%p/0x%p)\n",
+nfsi, nfsi->fscache);
+
+   fscache_relinquish_cookie(nfsi->fscache, 1);
+   nfsi->fscache = NULL;
+}
+
+/*
+ * Turn off the cache with regard to a per-inode cookie if opened for writing,
+ * invalidating all the pages in the page cache relating to the associated
+ * inode to clear the per-page caching.
+ */
+void nfs_fscache_disable_inode_cookie(struct inode *inode)
+{
+   clear_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
+
+   if (NFS_I(inode)->fscache) {
+   dfprintk(FSCACHE,
+"NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
+
+   /* Need to invalidate any mapped pages that were read in before
+* turning off the cache.
+*/
+   if (inode->i_mapping && inode->i_mapping->nrpages)
+   invalidate_inode_pages2(inode->i_mapping);
+
+   nfs_fscache_zap_inode_cookie(inode);
+   }
+}
+
+/*
+ * Decide if we should enable or disable local caching for this inode.
+ * - For now, with NFS, only regular files that are open read-only will be able
+ *   to use the cache.
+ */
+void nfs_fscache_set_inode_cookie(struct inode *inode, struct file *filp)
+{
+   if (NFS_FSCACHE(inode)) {
+   if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
+   nfs_fscache_disable_inode_cookie(inode);
+   else
+   nfs_fscache_enable_inode_cookie(inode);
+   }
+}
+
+/*
+ * Replace a per-inode cookie due to revalidation detecting a file having
+ * changed on the server.
+ */
+void nfs_fscache_renew_inode_cookie(struct inode *inode)
+{
+   struct nfs_inode *nfsi = NFS_I(inode);
+   struct nfs_server *nfss = NFS_SERVER(inode);
+   struct fscache_cookie *old = nfsi->fscache;
+
+   if (nfsi->fscache) {
+   /* retire the current fscache cache and get a new one */
+   fscache_relinquish_cookie(nfsi->fscache, 1);
+
+   nfsi->fscache = fscache_acquire_cookie(
+   nfss->nfs_client->fscache,
+   &nfs_cache_inode_object_def,
+   nfsi);
+
+   dfprintk(FSCACHE,
+"NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
+nfss, nfsi, old, nfsi->fscache);
+   }
+}
+
+/*
+ * Update the filesize associated with a per-inode cookie.
+ */
+void nfs_fscache_attr_changed(struct inode *inode)
+{
+   fscache_attr_changed(NFS_I(inode)->fscache);
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 7dcdf32..d730ec8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -77,6 +77,15 @@ extern void nfs_fscache_get_super_cookie(struct super_block 
*,
 struct nfs_parsed_mount_data *);
 extern void nfs_fscache_release_super_cookie

[PATCH 04/37] KEYS: Add keyctl function to get a security label

2008-02-20 Thread David Howells

Add a keyctl() function to get the security label of a key.

The following is added to Documentation/keys.txt:

 (*) Get the LSM security context attached to a key.

long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
size_t buflen)

 This function returns a string that represents the LSM security context
 attached to a key in the buffer provided.

 Unless there's an error, it always returns the amount of data it could
 produce, even if that's too big for the buffer, but it won't copy more
 than requested to userspace. If the buffer pointer is NULL then no copy
 will take place.

 A NUL character is included at the end of the string if the buffer is
 sufficiently big.  This is included in the returned count.  If no LSM is
 in force then an empty string will be returned.

 A process must have view permission on the key for this function to be
 successful.
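
The "returns the amount of data it could produce" behaviour lends itself to a
query-then-allocate pattern, sketched below in userspace C. This is an
illustration under stated assumptions: label_getter is a hypothetical
function-pointer type with the same argument shape as
keyctl(KEYCTL_GET_SECURITY, key, buffer, buflen), and toy_getter() is a fake
that follows the documented semantics so the sketch can run without a
kernel. A real caller would pass a wrapper around the keyctl() call instead.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical getter with the shape of keyctl(KEYCTL_GET_SECURITY, ...). */
typedef long (*label_getter)(int key, char *buffer, size_t buflen);

/* Fake getter: reports the full length, copies no more than requested. */
static long toy_getter(int key, char *buffer, size_t buflen)
{
    static const char label[] = "toy-label";    /* sizeof includes the NUL */
    size_t n = sizeof(label);

    (void)key;
    if (buffer)
        memcpy(buffer, label, buflen < n ? buflen : n);
    return (long)n;
}

/*
 * Ask for the size with a NULL buffer, allocate, then fetch.  Retry if the
 * label grew between the two calls.  Returns a malloc'd, NUL-terminated
 * string that the caller frees, or NULL on error.
 */
static char *read_label(label_getter get, int key)
{
    long len = get(key, NULL, 0);

    while (len >= 0) {
        /* one spare zeroed byte keeps the result terminated */
        char *buf = calloc((size_t)len + 1, 1);
        long n;

        if (!buf)
            return NULL;
        n = get(key, buf, (size_t)len);
        if (n >= 0 && n <= len)
            return buf;
        free(buf);
        len = n;    /* negative ends the loop, larger triggers a retry */
    }
    return NULL;
}
```

For example, read_label(toy_getter, 1) yields a copy of the fake label; with
a real getter the key argument would be the serial number of a key the
caller has view permission on.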

Signed-off-by: David Howells [EMAIL PROTECTED]
Acked-by:  Stephen Smalley [EMAIL PROTECTED]
---

 Documentation/keys.txt   |   21 +++
 include/linux/keyctl.h   |1 +
 include/linux/security.h |   20 +-
 security/dummy.c |8 ++
 security/keys/compat.c   |3 ++
 security/keys/keyctl.c   |   66 ++
 security/security.c  |5 +++
 security/selinux/hooks.c |   21 +--
 8 files changed, 141 insertions(+), 4 deletions(-)


diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index b82d38d..be424b0 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -711,6 +711,27 @@ The keyctl syscall functions are:
  The assumed authoritative key is inherited across fork and exec.
 
 
+ (*) Get the LSM security context attached to a key.
+
+   long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
+   size_t buflen)
+
+ This function returns a string that represents the LSM security context
+ attached to a key in the buffer provided.
+
+ Unless there's an error, it always returns the amount of data it could
+ produce, even if that's too big for the buffer, but it won't copy more
+ than requested to userspace. If the buffer pointer is NULL then no copy
+ will take place.
+
+ A NUL character is included at the end of the string if the buffer is
+ sufficiently big.  This is included in the returned count.  If no LSM is
+ in force then an empty string will be returned.
+
+ A process must have view permission on the key for this function to be
+ successful.
+
+
 ===
 KERNEL SERVICES
 ===
diff --git a/include/linux/keyctl.h b/include/linux/keyctl.h
index 3365945..656ee6b 100644
--- a/include/linux/keyctl.h
+++ b/include/linux/keyctl.h
@@ -49,5 +49,6 @@
 #define KEYCTL_SET_REQKEY_KEYRING  14  /* set default request-key keyring */
 #define KEYCTL_SET_TIMEOUT         15  /* set key timeout */
 #define KEYCTL_ASSUME_AUTHORITY    16  /* assume request_key() authorisation */
+#define KEYCTL_GET_SECURITY        17  /* get key security label */
 
 #endif /*  _LINUX_KEYCTL_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index fe52cde..a33fd03 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -970,6 +970,17 @@ struct request_sock;
  * @perm describes the combination of permissions required of this key.
  * Return 1 if permission granted, 0 if permission denied and -ve it the
  *  normal permissions model should be effected.
+ * @key_getsecurity:
+ * Get a textual representation of the security context attached to a key
+ * for the purposes of honouring KEYCTL_GETSECURITY.  This function
+ * allocates the storage for the NUL-terminated string and the caller
+ * should free it.
+ * @key points to the key to be queried.
+ * @_buffer points to a pointer that should be set to point to the
+ *  resulting string (if no label or an error occurs).
+ * Return the length of the string (including terminating NUL) or -ve if
+ *  an error.
+ * May also return 0 (and a NULL buffer pointer) if there is no label.
  *
  * Security hooks affecting all System V IPC operations.
  *
@@ -1459,7 +1470,7 @@ struct security_operations {
int (*key_permission)(key_ref_t key_ref,
  struct task_struct *context,
  key_perm_t perm);
-
+   int (*key_getsecurity)(struct key *key, char **_buffer);
 #endif /* CONFIG_KEYS */
 
 };
@@ -2600,6 +2611,7 @@ int security_key_alloc(struct key *key, struct 
task_struct *tsk, unsigned long f
 void security_key_free(struct key *key);
 int security_key_permission(key_ref_t key_ref,
struct task_struct *context, key_perm_t perm);
+int security_key_getsecurity(struct key *key, char **_buffer);
 
 #else
 
@@ -2621,6 +2633,12 @@ static inline int

[PATCH 26/37] NFS: Define and create superblock-level objects

2008-02-20 Thread David Howells

Define and create superblock-level cache index objects (as managed by
nfs_server structs).

Each superblock object is created in a server level index object and is itself
an index into which inode-level objects are inserted.

Ideally there would be one superblock-level object per server, and the former
would be folded into the latter; however, since the nosharecache option
exists this isn't possible.

The superblock object key is a sequence consisting of:

 (1) Certain superblock s_flags.

 (2) Various connection parameters that serve to distinguish superblocks for
 sget().

 (3) The volume FSID.

 (4) The security flavour.

 (5) The uniquifier length.

 (6) The uniquifier text.  This is normally an empty string, unless the fsc=xyz
 mount option was used to explicitly specify a uniquifier.

The key blob is of variable length, depending on the length of (6).
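
The layout of such a variable-length key can be sketched in userspace C as
follows. This is an illustration only: struct toy_super_key carries just two
representative fixed fields rather than the full list above, and
toy_super_get_key() is a hypothetical name modelled on the get_key operation
in this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size part of a superblock key. */
struct toy_super_key {
    uint32_t s_flags;       /* stand-in for the masked superblock flags */
    uint32_t fsid;          /* stand-in for the volume FSID */
    uint16_t uniq_len;      /* length of the uniquifier that follows */
};

/*
 * Serialise the fixed part followed by the uniquifier text into buffer.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static uint16_t toy_super_get_key(const struct toy_super_key *fixed,
                                  const char *uniq,
                                  void *buffer, uint16_t bufmax)
{
    size_t len = sizeof(*fixed) + fixed->uniq_len;

    if (len > bufmax)
        return 0;

    memcpy(buffer, fixed, sizeof(*fixed));
    memcpy((char *)buffer + sizeof(*fixed), uniq, fixed->uniq_len);
    return (uint16_t)len;
}
```

Two superblocks whose fixed fields are identical still produce different
blobs when their uniquifiers differ, which is what keeps them from sharing
an on-disk cache object.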

The superblock object is given no coherency data to carry in the auxiliary data
permitted by the cache.  It is assumed that the superblock is always coherent.


This patch also adds uniquification handling such that two otherwise identical
superblocks, at least one of which is marked nosharecache, won't end up
trying to share the on-disk cache.  It will be possible to manually provide a
uniquifier through a mount option with a later patch to avoid the error
otherwise produced.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c|   34 +
 fs/nfs/fscache.c  |  116 +
 fs/nfs/fscache.h  |   49 +++
 fs/nfs/internal.h |3 +
 fs/nfs/super.c|8 ++-
 include/linux/nfs_fs_sb.h |5 ++
 6 files changed, 213 insertions(+), 2 deletions(-)


diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 25ac4a1..b5a52e3 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -116,3 +116,37 @@ const struct fscache_cookie_def nfs_cache_server_index_def 
= {
.type   = FSCACHE_COOKIE_TYPE_INDEX,
.get_key= nfs_server_get_key,
 };
+
+/*
+ * Generate a key to describe a superblock key in the main NFS index
+ */
+static uint16_t nfs_super_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+   const struct nfs_fscache_key *key;
+   const struct nfs_server *nfss = cookie_netfs_data;
+   uint16_t len;
+
+   key = nfss->fscache_key;
+   len = sizeof(key->key) + key->key.uniq_len;
+   if (len > bufmax) {
+   len = 0;
+   } else {
+   memcpy(buffer, &key->key, sizeof(key->key));
+   memcpy(buffer + sizeof(key->key),
+  key->key.uniquifier, key->key.uniq_len);
+   }
+
+   return len;
+}
+
+/*
+ * Define the superblock object for FS-Cache.  This is used to describe a
+ * superblock object to fscache_acquire_cookie().  It is keyed by all the NFS
+ * parameters that might cause a separate superblock.
+ */
+const struct fscache_cookie_def nfs_cache_super_index_def = {
+   .name   = "NFS.super",
+   .type   = FSCACHE_COOKIE_TYPE_INDEX,
+   .get_key= nfs_super_get_key,
+};
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index dcc1800..cbd09f0 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -23,6 +23,9 @@
 
 #define NFSDBG_FACILITY        NFSDBG_FSCACHE
 
+static struct rb_root nfs_fscache_keys = RB_ROOT;
+static DEFINE_SPINLOCK(nfs_fscache_keys_lock);
+
 /*
  * Get the per-client index cookie for an NFS client if the appropriate mount
  * flag was set
@@ -50,3 +53,116 @@ void nfs_fscache_release_client_cookie(struct nfs_client *clp)
fscache_relinquish_cookie(clp->fscache, 0);
clp->fscache = NULL;
 }
+
+/*
+ * Get the cache cookie for an NFS superblock.  We have to handle
+ * uniquification here because the cache doesn't do it for us.
+ */
+void nfs_fscache_get_super_cookie(struct super_block *sb,
+ struct nfs_parsed_mount_data *data)
+{
+   struct nfs_fscache_key *key, *xkey;
+   struct nfs_server *nfss = NFS_SB(sb);
+   struct rb_node **p, *parent;
+   const char *uniq = data->fscache_uniq ?: "";
+   int diff, ulen;
+
+   ulen = strlen(uniq);
+   key = kzalloc(sizeof(*key) + ulen, GFP_KERNEL);
+   if (!key)
+   return;
+
+   key->nfs_client = nfss->nfs_client;
+   key->key.super.s_flags = sb->s_flags & NFS_MS_MASK;
+   key->key.nfs_server.flags = nfss->flags;
+   key->key.nfs_server.rsize = nfss->rsize;
+   key->key.nfs_server.wsize = nfss->wsize;
+   key->key.nfs_server.acregmin = nfss->acregmin;
+   key->key.nfs_server.acregmax = nfss->acregmax;
+   key->key.nfs_server.acdirmin = nfss->acdirmin;
+   key->key.nfs_server.acdirmax = nfss->acdirmax;
+   key->key.nfs_server.fsid = nfss->fsid;
+   key->key.rpc_auth.au_flavor = nfss->client->cl_auth->au_flavor;
+
+   key->key.uniq_len = ulen

[PATCH 22/37] NFS: Add FS-Cache option bit and debug bit

2008-02-20 Thread David Howells

Add FS-Cache option bit to nfs_server struct.  This is set to indicate local
on-disk caching is enabled for a particular superblock.

Also add debug bit for local caching operations.
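
As an illustration only, the option bit can be modelled in userspace as below. This is a minimal sketch: struct demo_nfs_server and the demo_* helpers are hypothetical stand-ins, and only the two constant values are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the patch; everything named demo_* is hypothetical. */
#define NFS_OPTION_FSCACHE 0x0001u  /* local caching enabled */
#define NFSDBG_FSCACHE     0x0800u  /* debug bit for caching operations */

struct demo_nfs_server {
    unsigned int options;   /* extra options enabled by mount */
};

/* Record that local caching was requested for this (model) superblock. */
static void demo_enable_fscache(struct demo_nfs_server *server)
{
    server->options |= NFS_OPTION_FSCACHE;
}

/* Report whether the caching option bit is set. */
static bool demo_fscache_enabled(const struct demo_nfs_server *server)
{
    return (server->options & NFS_OPTION_FSCACHE) != 0;
}
```

Keeping the option in its own word, separate from the existing flags, is what lets later patches test it without touching the mount flags.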

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/nfs_fs.h|1 +
 include/linux/nfs_fs_sb.h |2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)


diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a69ba80..14894c9 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -578,6 +578,7 @@ extern void * nfs_root_data(void);
 #define NFSDBG_CALLBACK        0x0100
 #define NFSDBG_CLIENT  0x0200
 #define NFSDBG_MOUNT   0x0400
+#define NFSDBG_FSCACHE 0x0800
 #define NFSDBG_ALL 0x
 
 #ifdef __KERNEL__
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 3423c67..e7c4cdd 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -99,6 +99,8 @@ struct nfs_server {
unsigned int            acdirmin;
unsigned int            acdirmax;
unsigned int            namelen;
+   unsigned int            options;        /* extra options enabled by mount */
+#define NFS_OPTION_FSCACHE 0x0001  /* - local caching enabled */
 
struct nfs_fsid         fsid;
__u64                   maxfilesize;    /* maximum file size */


[PATCH 16/37] CacheFiles: Be consistent about the use of mapping vs file->f_mapping in Ext3

2008-02-20 Thread David Howells

Change all the usages of file->f_mapping in ext3_*write_end() functions to use
the mapping argument directly.  This has two consequences:

 (*) Consistency.  Without this patch sometimes one is used and sometimes the
 other is.

 (*) A NULL file pointer can be passed.  This feature is then made use of by
 the generic hook in the next patch, which is used by CacheFiles to write
 pages to a file without setting up a file struct.
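
The second point can be sketched in userspace as follows. This is only a model under stated assumptions: the demo_* types are hypothetical miniatures of the kernel structures, and the helper just shows that deriving the inode from the mapping argument never dereferences the file pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature versions of the kernel structures. */
struct demo_inode { long i_size; };
struct demo_address_space { struct demo_inode *host; };
struct demo_file { struct demo_address_space *f_mapping; };

/*
 * Find the inode for a write_end-style operation.  The inode comes from
 * the mapping argument, so 'file' is never dereferenced and may be NULL.
 */
static struct demo_inode *demo_write_end_inode(struct demo_file *file,
                                               struct demo_address_space *mapping)
{
    (void)file;     /* deliberately unused */
    return mapping->host;
}
```

A caller that has no file struct, such as a cache backend, can then pass NULL for the file argument.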

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/ext3/inode.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)


diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index eb95670..c976123 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1215,7 +1215,7 @@ static int ext3_generic_write_end(struct file *file,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
 {
-   struct inode *inode = file->f_mapping->host;
+   struct inode *inode = mapping->host;
 
copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
 
@@ -1240,7 +1240,7 @@ static int ext3_ordered_write_end(struct file *file,
struct page *page, void *fsdata)
 {
handle_t *handle = ext3_journal_current_handle();
-   struct inode *inode = file->f_mapping->host;
+   struct inode *inode = mapping->host;
unsigned from, to;
int ret = 0, ret2;
 
@@ -1281,7 +1281,7 @@ static int ext3_writeback_write_end(struct file *file,
struct page *page, void *fsdata)
 {
handle_t *handle = ext3_journal_current_handle();
-   struct inode *inode = file->f_mapping->host;
+   struct inode *inode = mapping->host;
int ret = 0, ret2;
loff_t new_i_size;
 


[PATCH 12/37] FS-Cache: Recruit a couple of page flags for cache management

2008-02-20 Thread David Howells

Recruit a couple of page flags to aid in cache management.  The following extra
flags are defined:

 (1) PG_fscache (PG_private_2)

 The marked page is backed by a local cache and is pinning resources in the
 cache driver.

 (2) PG_fscache_write (PG_owner_priv_2)

 The marked page is being written to the local cache.  The page may not be
 modified whilst this is in progress.

If PG_fscache is set, then things that checked for PG_private will now also
check for that.  This includes things like truncation and page invalidation.
The function page_has_private() has been added to check for both PG_private
and PG_private_2 at the same time.
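
A combined check of this kind might look like the following userspace model. It is a sketch, not the patch's implementation (which is not shown in this excerpt): struct demo_page and the demo_* helpers are hypothetical, and only the bit numbers 11 and 13 are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers as defined in the patch; names prefixed DEMO_ are hypothetical. */
enum { DEMO_PG_private = 11, DEMO_PG_private_2 = 13 };
#define DEMO_PG_fscache DEMO_PG_private_2   /* descriptive alias */

struct demo_page { unsigned long flags; };

static void demo_set_flag(struct demo_page *p, int bit)
{
    p->flags |= 1UL << bit;
}

static bool demo_test_flag(const struct demo_page *p, int bit)
{
    return ((p->flags >> bit) & 1UL) != 0;
}

/* True if either private flag is set, tested with a single mask. */
static bool demo_page_has_private(const struct demo_page *p)
{
    const unsigned long mask =
        (1UL << DEMO_PG_private) | (1UL << DEMO_PG_private_2);

    return (p->flags & mask) != 0;
}
```

With this model, a page marked only with the fscache alias still reports that it has private state, which is the behaviour truncation and invalidation rely on.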

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/splice.c|2 +-
 include/linux/page-flags.h |   39 +--
 include/linux/pagemap.h|   11 +++
 mm/filemap.c   |   18 ++
 mm/migrate.c   |2 +-
 mm/page_alloc.c|3 +++
 mm/readahead.c |9 +
 mm/swap.c  |4 ++--
 mm/swap_state.c|4 ++--
 mm/truncate.c  |   10 +-
 mm/vmscan.c|2 +-
 11 files changed, 86 insertions(+), 18 deletions(-)


diff --git a/fs/splice.c b/fs/splice.c
index 9b559ee..f2a7a06 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
 */
wait_on_page_writeback(page);
 
-   if (PagePrivate(page))
+   if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);
 
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index bbad43f..cc16c23 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -77,25 +77,32 @@
 #define PG_active   6
 #define PG_slab             7  /* slab debug (Suparna wants this) */
 
-#define PG_owner_priv_1     8  /* Owner use. If pagecache, fs may use*/
+#define PG_owner_priv_1     8  /* Owner use. fs may use in pagecache */
 #define PG_arch_1           9
 #define PG_reserved        10
 #define PG_private         11  /* If pagecache, has fs-private data */
 
 #define PG_writeback       12  /* Page is under writeback */
+#define PG_private_2       13  /* If pagecache, has fs aux data */
 #define PG_compound        14  /* Part of a compound page */
 #define PG_swapcache       15  /* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk    16  /* Has blocks allocated on-disk */
 #define PG_reclaim         17  /* To be reclaimed asap */
+#define PG_owner_priv_2    18  /* Owner use. fs may use in pagecache */
 #define PG_buddy   19  /* Page is free, on buddy lists */
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead   PG_reclaim /* Reminder to do async read-ahead */
 
-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
+#define PG_fscache_write   PG_owner_priv_2 /* Writing to local cache */
+
+/* PG_private_2 causes releasepage() and co to be invoked */
+#define PG_fscache PG_private_2/* Backed by local cache */
+
 
 #if (BITS_PER_LONG > 32)
 /*
@@ -235,6 +242,23 @@ static inline void SetPageUptodate(struct page *page)
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,  \
&(page)->flags)
 
+#define PagePrivate2(page) test_bit(PG_private_2, &(page)->flags)
+#define SetPagePrivate2(page)  set_bit(PG_private_2, &(page)->flags)
+#define ClearPagePrivate2(page)clear_bit(PG_private_2, &(page)->flags)
+#define TestSetPagePrivate2(page) test_and_set_bit(PG_private_2, &(page)->flags)
+#define TestClearPagePrivate2(page) test_and_clear_bit(PG_private_2, \
+ &(page)->flags)
+
+#define PageOwnerPriv2(page)   test_bit(PG_owner_priv_2, \
+&(page)->flags)
+#define SetPageOwnerPriv2(page)set_bit(PG_owner_priv_2, &(page)->flags)
+#define ClearPageOwnerPriv2(page)  clear_bit(PG_owner_priv_2, \
+ &(page)->flags)
+#define TestSetPageOwnerPriv2(page)test_and_set_bit(PG_owner_priv_2, \
+&(page)->flags)
+#define TestClearPageOwnerPriv2(page)  test_and_clear_bit(PG_owner_priv_2, \
+  &(page)->flags)
+
 #define PageBuddy(page

[PATCH 29/37] NFS: Invalidate FsCache page flags when cache removed

2008-02-20 Thread David Howells

Invalidate the FsCache page flags on the pages belonging to an inode when the
cache backing that NFS inode is removed.

This allows a live cache to be withdrawn.
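
The batched walk used to clear the markers can be modelled in userspace as below. This is a sketch under stated assumptions: the demo_* names are hypothetical, a sorted array stands in for the inode's page cache, and a fixed batch size stands in for the pagevec.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH 4    /* hypothetical stand-in for the pagevec size */

struct demo_page { unsigned long index; int fscache; };

/* Collect up to 'max' pages with index >= first; pages[] is sorted by index. */
static size_t demo_lookup(struct demo_page *pages, size_t n, unsigned long first,
                          struct demo_page **out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < max; i++)
        if (pages[i].index >= first)
            out[found++] = &pages[i];
    return found;
}

/* Clear the cache marker on every page, one batch at a time. */
static size_t demo_now_uncached(struct demo_page *pages, size_t n)
{
    struct demo_page *batch[DEMO_BATCH];
    unsigned long first = 0;
    size_t total = 0;

    for (;;) {
        size_t nr = demo_lookup(pages, n, first, batch, DEMO_BATCH);

        if (nr == 0)
            break;
        for (size_t i = 0; i < nr; i++)
            batch[i]->fscache = 0;
        /* resume just past the last page seen in this batch */
        first = batch[nr - 1]->index + 1;
        total += nr;
    }
    return total;
}
```

Restarting each lookup from the index after the last page seen keeps the walk correct when page indices are sparse.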

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c |   40 
 1 files changed, 40 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index c3c63fa..eec8e7e 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -246,6 +246,45 @@ static enum fscache_checkaux nfs_cache_inode_check_aux(void *cookie_netfs_data,
 }
 
 /*
+ * Indication from FS-Cache that the cookie is no longer cached
+ * - This function is called when the backing store currently caching a cookie
+ *   is removed
+ * - The netfs should use this to clean up any markers indicating cached pages
+ * - This is mandatory for any object that may have data
+ */
+static void nfs_cache_inode_now_uncached(void *cookie_netfs_data)
+{
+   struct nfs_inode *nfsi = cookie_netfs_data;
+   struct pagevec pvec;
+   pgoff_t first;
+   int loop, nr_pages;
+
+   pagevec_init(&pvec, 0);
+   first = 0;
+
+   dprintk("NFS: nfs_inode_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+   for (;;) {
+   /* grab a bunch of pages to unmark */
+   nr_pages = pagevec_lookup(&pvec,
+ nfsi->vfs_inode.i_mapping,
+ first,
+ PAGEVEC_SIZE - pagevec_count(&pvec));
+   if (!nr_pages)
+   break;
+
+   for (loop = 0; loop < nr_pages; loop++)
+   ClearPageFsCache(pvec.pages[loop]);
+
+   first = pvec.pages[nr_pages - 1]->index + 1;
+
+   pvec.nr = nr_pages;
+   pagevec_release(&pvec);
+   cond_resched();
+   }
+}
+
+/*
  * Define the inode object for FS-Cache.  This is used to describe an inode
  * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
  * an inode.
@@ -261,4 +300,5 @@ const struct fscache_cookie_def nfs_cache_inode_object_def = {
.get_attr   = nfs_cache_inode_get_attr,
.get_aux= nfs_cache_inode_get_aux,
.check_aux  = nfs_cache_inode_check_aux,
+   .now_uncached   = nfs_cache_inode_now_uncached,
 };


[PATCH 33/37] NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching

2008-02-20 Thread David Howells

nfs_readpage_async() needs to be non-static so that it can be used as a
fallback for the local on-disk caching should an EIO crop up when reading the
cache.
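
The fallback can be pictured with the following userspace model. It is a sketch, not the NFS code: struct demo_page, demo_read_from_server() and the server_ok parameter are hypothetical, and the network read is reduced to a counter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the page state that matters here. */
struct demo_page { int uptodate; int locked; int async_reads; };

/* Stand-in for the network read; returns 0 if the read was queued. */
static int demo_read_from_server(struct demo_page *page, int server_ok)
{
    if (!server_ok)
        return -EIO;
    page->async_reads++;    /* page stays locked until that read completes */
    return 0;
}

/* Completion handler for a cache read: fall back to the server on error. */
static void demo_cache_read_complete(struct demo_page *page, int error,
                                     int server_ok)
{
    if (!error) {
        page->uptodate = 1;
        page->locked = 0;
        return;
    }
    /* the cache read failed: try the server, unlock only if that fails too */
    if (demo_read_from_server(page, server_ok) != 0)
        page->locked = 0;
}
```

In the model the page is unlocked by the handler only when no further read is in flight for it.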

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/read.c  |4 ++--
 include/linux/nfs_fs.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)


diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 3d7d963..725a5a2 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -114,8 +114,8 @@ static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
}
 }
 
-static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
-   struct page *page)
+int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
+  struct page *page)
 {
LIST_HEAD(one_request);
struct nfs_page *new;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index d9adb53..d1d545e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -505,6 +505,8 @@ extern int  nfs_readpages(struct file *, struct address_space *,
struct list_head *, unsigned);
 extern int  nfs_readpage_result(struct rpc_task *, struct nfs_read_data *);
 extern void nfs_readdata_release(void *data);
+extern int  nfs_readpage_async(struct nfs_open_context *, struct inode *,
+  struct page *);
 
 /*
  * Allocate nfs_read_data structures


[PATCH 36/37] NFS: Display local caching state

2008-02-20 Thread David Howells

Display the local caching state in /proc/fs/nfsfs/volumes.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/client.c  |7 ---
 fs/nfs/fscache.h |   15 +++
 2 files changed, 19 insertions(+), 3 deletions(-)


diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 51e9346..d67d52f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1451,7 +1451,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 
/* display header on line 1 */
if (v == &nfs_volume_list) {
-   seq_puts(m, "NV SERVER   PORT DEV FSID\n");
+   seq_puts(m, "NV SERVER   PORT DEV FSID  FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1465,12 +1465,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 (unsigned long long) server->fsid.major,
 (unsigned long long) server->fsid.minor);
 
-   seq_printf(m, "v%u %s %s %-7s %-17s\n",
+   seq_printf(m, "v%u %s %s %-7s %-17s %s\n",
   clp->rpc_ops->version,
   rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
   rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_PORT),
   dev,
-  fsid);
+  fsid,
+  nfs_server_fscache_state(server));
 
return 0;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 6264cd8..5f7806f 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -146,6 +146,16 @@ static inline void nfs_readpage_to_fscache(struct inode *inode,
__nfs_readpage_to_fscache(inode, page, sync);
 }
 
+/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+   if (server->fscache && (server->options & NFS_OPTION_FSCACHE))
+   return "yes";
+   return "no ";
+}
+
 
 #else /* CONFIG_NFS_FSCACHE */
 static inline int nfs_fscache_register(void) { return 0; }
@@ -195,5 +205,10 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 static inline void nfs_readpage_to_fscache(struct inode *inode,
   struct page *page, int sync) {}
 
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+   return "no ";
+}
+
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */


[PATCH 15/37] CacheFiles: Add missing copy_page export for ia64

2008-02-20 Thread David Howells

This one-line patch fixes the missing export of copy_page introduced
by the cachefile patches.  This patch is not yet upstream, but is required
for cachefile on ia64.  It will be pushed upstream when cachefile goes
upstream.

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/ia64_ksyms.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index 8e7193d..3e544f4 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -46,6 +46,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);


[PATCH 34/37] NFS: Read pages from FS-Cache into an NFS inode

2008-02-20 Thread David Howells

Read pages from an FS-Cache data storage object representing an inode into an
NFS inode.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c |  112 ++
 fs/nfs/fscache.h |   47 +++
 fs/nfs/read.c|   18 +
 3 files changed, 176 insertions(+), 1 deletions(-)


diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index d475ff5..438cc9b 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -344,5 +344,115 @@ void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
 
BUG_ON(!PageLocked(page));
fscache_uncache_page(nfsi->fscache, page);
-   nfs_add_stats(page->mapping->host, NFSIOS_FSCACHE_UNCACHE, 1);
+   nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1);
+}
+
+/*
+ * Handle completion of a page being read from the cache.
+ * - Called in process (keventd) context.
+ */
+static void nfs_readpage_from_fscache_complete(struct page *page,
+  void *context,
+  int error)
+{
+   dfprintk(FSCACHE,
+"NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
+page, context, error);
+
+   /* if the read completes with an error, we just unlock the page and let
+* the VM reissue the readpage */
+   if (!error) {
+   SetPageUptodate(page);
+   unlock_page(page);
+   } else {
+   error = nfs_readpage_async(context, page->mapping->host, page);
+   if (error)
+   unlock_page(page);
+   }
+}
+
+/*
+ * Retrieve a page from fscache
+ */
+int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+   struct inode *inode, struct page *page)
+{
+   int ret;
+
+   dfprintk(FSCACHE,
+"NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
+NFS_I(inode)->fscache, page, page->index, page->flags, inode);
+
+   ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+page,
+nfs_readpage_from_fscache_complete,
+ctx,
+GFP_KERNEL);
+
+   switch (ret) {
+   case 0: /* read BIO submitted (page in fscache) */
+   dfprintk(FSCACHE,
+"NFS:readpage_from_fscache: BIO submitted\n");
+   nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, 1);
+   return ret;
+
+   case -ENOBUFS: /* inode not in cache */
+   case -ENODATA: /* page not in cache */
+   nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1);
+   dfprintk(FSCACHE,
+"NFS:readpage_from_fscache %d\n", ret);
+   return 1;
+
+   default:
+   dfprintk(FSCACHE, "NFS:readpage_from_fscache %d\n", ret);
+   nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1);
+   }
+   return ret;
+}
+
+/*
+ * Retrieve a set of pages from fscache
+ */
+int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+struct inode *inode,
+struct address_space *mapping,
+struct list_head *pages,
+unsigned *nr_pages)
+{
+   int ret, npages = *nr_pages;
+
+   dfprintk(FSCACHE, "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
+NFS_I(inode)->fscache, npages, inode);
+
+   ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
+ mapping, pages, nr_pages,
+ nfs_readpage_from_fscache_complete,
+ ctx,
+ mapping_gfp_mask(mapping));
+   if (*nr_pages < npages)
+   nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, npages);
+   if (*nr_pages > 0)
+   nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, *nr_pages);
+
+   switch (ret) {
+   case 0: /* read submitted to the cache for all pages */
+   BUG_ON(!list_empty(pages));
+   BUG_ON(*nr_pages != 0);
+   dfprintk(FSCACHE,
+"NFS: nfs_getpages_from_fscache: submitted\n");
+
+   return ret;
+
+   case -ENOBUFS: /* some pages aren't cached and can't be */
+   case -ENODATA: /* some pages aren't cached */
+   dfprintk(FSCACHE,
+"NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
+   return 1;
+
+   default:
+   dfprintk(FSCACHE,
+"NFS: nfs_getpages_from_fscache: ret  %d\n", ret);
+   }
+
+   return ret;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 1cb7d96..4c1e1a8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -89,6 +89,12

[PATCH 18/37] CacheFiles: Permit the page lock state to be monitored

2008-02-20 Thread David Howells

Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.
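
The monitoring idea can be sketched in userspace as follows. This is a simplified model, not the kernel wait queue: the demo_* types are hypothetical, there is no locking (the kernel function takes the queue's spinlock), and detaching all waiters on wake-up is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter: a callback plus a link to the next waiter. */
struct demo_waiter {
    void (*func)(struct demo_waiter *self);
    struct demo_waiter *next;
    int fired;
};

struct demo_page {
    int locked;
    struct demo_waiter *waiters;    /* singly linked list of monitors */
};

/* Install a monitor on the page's wait queue. */
static void demo_add_page_wait_queue(struct demo_page *page,
                                     struct demo_waiter *w)
{
    w->next = page->waiters;
    page->waiters = w;
}

/* Unlock the page and invoke every installed monitor once. */
static void demo_unlock_page(struct demo_page *page)
{
    struct demo_waiter *w = page->waiters;

    page->locked = 0;
    page->waiters = NULL;
    while (w) {
        struct demo_waiter *next = w->next;

        w->func(w);
        w = next;
    }
}

/* Example callback: count how often the monitor was triggered. */
static void demo_mark_fired(struct demo_waiter *self)
{
    self->fired++;
}
```

A cache backend would use the callback to notice that the backing page has been read, then copy the data to the waiting netfs page.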

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |5 +
 mm/filemap.c|   18 ++
 2 files changed, 23 insertions(+), 0 deletions(-)


diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c8bd762..76b5307 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -242,6 +242,11 @@ static inline void wait_on_page_owner_priv_2(struct page *page)
 extern void end_page_owner_priv_2(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index a583f44..561e6c7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -548,6 +548,24 @@ void wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+   wait_queue_head_t *q = page_waitqueue(page);
+   unsigned long flags;
+
+   spin_lock_irqsave(&q->lock, flags);
+   __add_wait_queue(q, waiter);
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *


[PATCH 31/37] NFS: FS-Cache page management

2008-02-20 Thread David Howells

FS-Cache page management for NFS.  This includes hooking the releasing and
invalidation of pages marked with PG_fscache (aka PG_private_2) and waiting for
completion of the write-to-cache flag (PG_fscache_write aka PG_owner_priv_2).
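
The release decision can be summarised with the userspace model below. It is a sketch under stated assumptions: struct demo_page and demo_release_page() are hypothetical, the may_wait parameter stands in for the gfp test, and waiting is modelled by simply clearing the write flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two cache-related page flags. */
struct demo_page {
    bool fscache;        /* page is backed by the local cache */
    bool fscache_write;  /* page is being written to the local cache */
};

/* Return true if the page may be released, false if it is still busy. */
static bool demo_release_page(struct demo_page *page, bool may_wait)
{
    if (page->fscache_write) {
        if (!may_wait)
            return false;             /* busy and the caller cannot sleep */
        page->fscache_write = false;  /* models waiting for the write */
    }

    if (page->fscache)
        page->fscache = false;        /* models telling the cache to let go */

    return true;
}
```

The point of the ordering is that a page is never dropped from the cache's view while a write to the cache is still using it.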

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/file.c|   17 +
 fs/nfs/fscache.c |   49 +
 fs/nfs/fscache.h |   22 ++
 3 files changed, 84 insertions(+), 4 deletions(-)


diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 26a073b..60db3ea 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -35,6 +35,7 @@
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY        NFSDBG_FILE
 
@@ -358,7 +359,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
  * Partially or wholly invalidate a page
  * - Release the private state associated with a page if undergoing complete
  *   page invalidation
- * - Called if either PG_private or PG_private_2 is set on the page
+ * - Called if either PG_private or PG_fscache is set on the page
  * - Caller holds page lock
  */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
@@ -367,30 +368,35 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page->mapping->host, page);
+
+   nfs_fscache_invalidate_page(page, page->mapping->host);
 }
 
 /*
  * Attempt to release the private state associated with a page
- * - Called if either PG_private or PG_private_2 is set on the page
+ * - Called if either PG_private or PG_fscache is set on the page
  * - Caller holds page lock
  * - Return true (may release page) or false (may not)
  */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
/* If PagePrivate() is set, then the page is not freeable */
-   return 0;
+   if (PagePrivate(page))
+   return 0;
+   return nfs_fscache_release_page(page, gfp);
 }
 
 /*
  * Attempt to clear the private state associated with a page when an error
  * occurs that requires the cached contents of an inode to be written back or
  * destroyed
- * - Called if either PG_private or PG_private_2 is set on the page
+ * - Called if either PG_private or fscache is set on the page
  * - Caller holds page lock
  * - Return 0 if successful, -error otherwise
  */
 static int nfs_launder_page(struct page *page)
 {
+   wait_on_page_fscache_write(page);
return nfs_wb_page(page->mapping->host, page);
 }
 
@@ -422,6 +428,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
int ret = -EINVAL;
struct address_space *mapping;
 
+   /* make sure the cache has finished storing the page */
+   wait_on_page_fscache_write(page);
+
lock_page(page);
mapping = page->mapping;
if (mapping != vma->vm_file->f_path.dentry->d_inode->i_mapping)
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index c0e0320..d475ff5 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -19,6 +19,7 @@
 #include <linux/seq_file.h>
 
 #include "internal.h"
+#include "iostat.h"
 #include "fscache.h"
 
 #define NFSDBG_FACILITY        NFSDBG_FSCACHE
@@ -297,3 +298,51 @@ void nfs_fscache_attr_changed(struct inode *inode)
 {
fscache_attr_changed(NFS_I(inode)->fscache);
 }
+
+/*
+ * Release the caching state associated with a page, if the page isn't busy
+ * interacting with the cache.
+ * - Returns true (can release page) or false (page busy).
+ */
+int nfs_fscache_release_page(struct page *page, gfp_t gfp)
+{
+   if (PageFsCacheWrite(page)) {
+   if (!(gfp & __GFP_WAIT))
+   return 0;
+   wait_on_page_fscache_write(page);
+   }
+
+   if (PageFsCache(page)) {
+   struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+   BUG_ON(!nfsi->fscache);
+
+   dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+nfsi->fscache, page, nfsi);
+
+   fscache_uncache_page(nfsi->fscache, page);
+   nfs_add_stats(page->mapping->host, NFSIOS_FSCACHE_UNCACHE, 1);
+   }
+
+   return 1;
+}
+
+/*
+ * Release the caching state associated with a page if undergoing complete page
+ * invalidation.
+ */
+void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
+{
+   struct nfs_inode *nfsi = NFS_I(inode);
+
+   BUG_ON(!nfsi->fscache);
+
+   dfprintk(FSCACHE, "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+nfsi->fscache, page, nfsi);
+
+   wait_on_page_fscache_write(page);
+
+   BUG_ON(!PageLocked(page));
+   fscache_uncache_page(nfsi->fscache, page);
+   nfs_add_stats(page->mapping->host, NFSIOS_FSCACHE_UNCACHE, 1);
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index d730ec8..1cb7d96

[PATCH 35/37] NFS: Store pages from an NFS inode into a local cache

2008-02-20 Thread David Howells

Store pages from an NFS inode into the cache data storage object associated
with that inode.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c |   26 ++
 fs/nfs/fscache.h |   16 
 fs/nfs/read.c|5 +
 3 files changed, 47 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 438cc9b..50ae70f 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -456,3 +456,29 @@ int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 
return ret;
 }
+
+/*
+ * Store a newly fetched page in fscache
+ * - PG_fscache must be set on the page
+ */
+void __nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+   int ret;
+
+   dfprintk(FSCACHE,
+"NFS: readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+NFS_I(inode)->fscache, page, page->index, page->flags, sync);
+
+   ret = fscache_write_page(NFS_I(inode)->fscache, page, GFP_KERNEL);
+   dfprintk(FSCACHE,
+"NFS: readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+page, page->index, page->flags, ret);
+
+   if (ret != 0) {
+   fscache_uncache_page(NFS_I(inode)->fscache, page);
+   nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_FAIL, 1);
+   nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1);
+   } else {
+   nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_OK, 1);
+   }
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 4c1e1a8..6264cd8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -94,6 +94,7 @@ extern int __nfs_readpage_from_fscache(struct nfs_open_context *,
 extern int __nfs_readpages_from_fscache(struct nfs_open_context *,
struct inode *, struct address_space *,
struct list_head *, unsigned *);
+extern void __nfs_readpage_to_fscache(struct inode *, struct page *, int);
 
 /*
  * release the caching state associated with a page if undergoing complete page
@@ -133,6 +134,19 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
return -ENOBUFS;
 }
 
+/*
+ * Store a page newly fetched from the server in an inode data storage object
+ * in the cache.
+ */
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+  struct page *page,
+  int sync)
+{
+   if (PageFsCache(page))
+   __nfs_readpage_to_fscache(inode, page, sync);
+}
+
+
 #else /* CONFIG_NFS_FSCACHE */
 static inline int nfs_fscache_register(void) { return 0; }
 static inline void nfs_fscache_unregister(void) {}
@@ -178,6 +192,8 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 {
return -ENOBUFS;
 }
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+  struct page *page, int sync) {}
 
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index db27b26..e09bdf9 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -143,6 +143,11 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
+   struct inode *d_inode = req->wb_context->path.dentry->d_inode;
+
+   if (PageUptodate(req->wb_page))
+   nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
unlock_page(req->wb_page);
 
dprintk(NFS: read done (%s/%Ld [EMAIL PROTECTED])\n,


[PATCH 17/37] CacheFiles: Add a hook to write a single page of data to an inode

2008-02-20 Thread David Howells

Add an address space operation to write one single page of data to an inode at
a page-aligned location (thus permitting the implementation to be highly
optimised).  The data source is a single page.

This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.

Supply a generic implementation for this that uses the write_begin() and
write_end() address_space operations to bind a copy directly into the page
cache.

Hook the Ext2 and Ext3 operations to the generic implementation.
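
The shape of such a generic implementation can be sketched in userspace as below. This is a model only: the demo_* names, the tiny page size and the in-memory "mapping" are hypothetical, and the begin/end helpers merely stand in for the write_begin() and write_end() operations. The sketch follows the stated rule that the file is not extended when the target page crosses EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16   /* tiny page size to keep the model readable */
#define DEMO_NPAGES    4

/* Hypothetical in-memory stand-in for an inode's address space. */
struct demo_mapping {
    unsigned char pages[DEMO_NPAGES][DEMO_PAGE_SIZE];
    size_t i_size;          /* file size in bytes */
};

/* Stand-in for write_begin(): hand back the target page, if any. */
static unsigned char *demo_write_begin(struct demo_mapping *m, size_t index)
{
    return index < DEMO_NPAGES ? m->pages[index] : NULL;
}

/* Stand-in for write_end(): report how many bytes were committed. */
static size_t demo_write_end(size_t copied)
{
    return copied;
}

/* Copy one source page over the page at 'index'; never extends the file. */
static size_t demo_write_one_page(struct demo_mapping *m, size_t index,
                                  const unsigned char *source)
{
    size_t start = index * DEMO_PAGE_SIZE;
    size_t len = DEMO_PAGE_SIZE;
    unsigned char *target;

    if (index >= DEMO_NPAGES || start >= m->i_size)
        return 0;                   /* wholly beyond EOF: nothing to write */
    if (start + len > m->i_size)
        len = m->i_size - start;    /* page crosses EOF: write first part */

    target = demo_write_begin(m, index);
    if (!target)
        return 0;
    memcpy(target, source, len);
    return demo_write_end(len);
}
```

Because the write is always page-aligned and at most one page long, the model needs no loop and no partial-page bookkeeping beyond the EOF clamp.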

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/ext2/inode.c|2 ++
 fs/ext3/inode.c|3 +++
 include/linux/fs.h |7 ++
 mm/filemap.c   |   61 
 4 files changed, 73 insertions(+), 0 deletions(-)


diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index c620068..f483014 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -792,6 +792,7 @@ const struct address_space_operations ext2_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 const struct address_space_operations ext2_aops_xip = {
@@ -810,6 +811,7 @@ const struct address_space_operations ext2_nobh_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 /*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index c976123..0209f3b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1776,6 +1776,7 @@ static const struct address_space_operations ext3_ordered_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_writeback_aops = {
@@ -1790,6 +1791,7 @@ static const struct address_space_operations ext3_writeback_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_journalled_aops = {
@@ -1803,6 +1805,7 @@ static const struct address_space_operations ext3_journalled_aops = {
.bmap   = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
.releasepage= ext3_releasepage,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 void ext3_set_aops(struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d218ef5..dd6c3d1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -481,6 +481,11 @@ struct address_space_operations {
int (*migratepage) (struct address_space *,
struct page *, struct page *);
int (*launder_page) (struct page *);
+   /* write the contents of the source page over the page at the specified
+* index in the target address space (the source page does not need to
+* be related to the target address space) */
+   int (*write_one_page)(struct address_space *, pgoff_t, struct page *);
+
 };
 
 /*
@@ -1811,6 +1816,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_page(struct address_space *,
+   pgoff_t, struct page *);
 extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 extern int generic_segment_checks(const struct iovec *iov,
diff --git a/mm/filemap.c b/mm/filemap.c
index df1e149..a583f44 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2359,6 +2359,67 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 }
 EXPORT_SYMBOL(generic_file_buffered_write);
 
+/**
+ * generic_file_buffered_write_one_page - Write a single page of data to an
+ * inode
+ * @mapping - The address space of the target inode
+ * @index - The target page in the target inode to fill
+ * @source - The data to write into the target page
+ *
+ * Write the data from the source page to the page in the nominated address
+ * space at the @index specified.  Note that the file will not be extended if
+ * the page crosses the EOF marker, in which case only the first part of the
+ * page will be written.
+ *
+ * The @source page does not need to have any association

[PATCH 37/37] NFS: Add mount options to enable local caching on NFS

2008-02-20 Thread David Howells

Add NFS mount options to allow the local caching support to be enabled.

The attached patch makes it possible for the NFS filesystem to be told to make
use of the network filesystem local caching service (FS-Cache).

To be able to use this, a recent nfs-utils package is required.

There are three variant NFS mount options that can be added to a mount command
to control caching for a mount.  Only the last one specified takes effect:

 (*) Adding fsc will request caching.

 (*) Adding fsc=string will request caching and also specify a uniquifier.

 (*) Adding nofsc will disable caching.

For example:

mount warthog:/ /a -o fsc


The cache of a particular superblock (NFS FSID) will be shared between all
mounts of that volume, provided they have the same connection parameters and
are not marked 'nosharecache'.

Where it is otherwise impossible to distinguish superblocks because all the
parameters are identical, but the 'nosharecache' option is supplied, a
uniquifying string must be supplied, else only the first mount will be
permitted to use the cache.

If there's a key collision, then the second mount will disable caching and
write a warning to the kernel log.
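
The "last one specified takes effect" rule for the three options can be
modelled in userspace roughly as follows.  This is only a sketch: struct
fsc_opts, fsc_apply() and fsc_parse() are hypothetical names made up for the
illustration, and the real parsing is done in the kernel by
nfs_parse_mount_options().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of the caching state of one mount. */
struct fsc_opts {
	bool enabled;	/* caching requested */
	char *uniq;	/* uniquifier from fsc=<string>, or NULL */
};

/* Copy n bytes of s into a freshly allocated NUL-terminated string. */
static char *dup_range(const char *s, size_t n)
{
	char *p = malloc(n + 1);

	if (p) {
		memcpy(p, s, n);
		p[n] = '\0';
	}
	return p;
}

/* Apply one option token of length n; unrelated tokens are ignored. */
static void fsc_apply(struct fsc_opts *o, const char *tok, size_t n)
{
	if (n == 3 && memcmp(tok, "fsc", 3) == 0) {
		o->enabled = true;
		free(o->uniq);
		o->uniq = NULL;
	} else if (n == 5 && memcmp(tok, "nofsc", 5) == 0) {
		o->enabled = false;
		free(o->uniq);
		o->uniq = NULL;
	} else if (n >= 4 && memcmp(tok, "fsc=", 4) == 0) {
		char *copy = dup_range(tok + 4, n - 4);

		if (!copy)
			return;	/* keep the previous state on allocation failure */
		free(o->uniq);
		o->uniq = copy;
		o->enabled = true;
	}
}

/* Walk a comma-separated option string; later tokens override earlier ones. */
static void fsc_parse(struct fsc_opts *o, const char *options)
{
	const char *p = options;

	assert(o && options);
	o->enabled = false;
	o->uniq = NULL;
	while (*p) {
		size_t n = strcspn(p, ",");

		fsc_apply(o, p, n);
		p += n;
		if (*p == ',')
			p++;
	}
}
```

With this model, "fsc=abc,nofsc" ends up with caching disabled and no
uniquifier, whereas "nofsc,fsc=abc" ends up with caching enabled and the
uniquifier "abc".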


Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/client.c   |2 ++
 fs/nfs/internal.h |1 +
 fs/nfs/super.c|   25 +
 3 files changed, 28 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index d67d52f..8357f68 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -669,6 +669,7 @@ static int nfs_init_server(struct nfs_server *server,
 
 	/* Initialise the client representation from the mount data */
 	server->flags = data->flags & NFS_MOUNT_FLAGMASK;
+	server->options = data->options;
 
 	if (data->rsize)
 		server->rsize = nfs_block_size(data->rsize, NULL);
@@ -1056,6 +1057,7 @@ static int nfs4_init_server(struct nfs_server *server,
 	/* Initialise the client representation from the mount data */
 	server->flags = data->flags & NFS_MOUNT_FLAGMASK;
 	server->caps |= NFS_CAP_ATOMIC_OPEN;
+	server->options = data->options;
 
 	if (data->rsize)
 		server->rsize = nfs_block_size(data->rsize, NULL);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index e49cb6e..f427b35 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -38,6 +38,7 @@ struct nfs_parsed_mount_data {
 	int			acregmin, acregmax,
 				acdirmin, acdirmax;
 	int			namlen;
+	unsigned int		options;
 	unsigned int		bsize;
 	unsigned int		auth_flavor_len;
 	rpc_authflavor_t	auth_flavors[1];
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 79c4abe..4c513c6 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -76,6 +76,7 @@ enum {
Opt_acl, Opt_noacl,
Opt_rdirplus, Opt_nordirplus,
Opt_sharecache, Opt_nosharecache,
+   Opt_fscache, Opt_nofscache,
 
/* Mount options that take integer arguments */
Opt_port,
@@ -92,6 +93,7 @@ enum {
/* Mount options that take string arguments */
Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
Opt_addr, Opt_mountaddr, Opt_clientaddr,
+   Opt_fscache_uniq,
 
/* Mount options that are ignored */
Opt_userspace, Opt_deprecated,
@@ -125,6 +127,9 @@ static match_table_t nfs_mount_option_tokens = {
 	{ Opt_nordirplus, "nordirplus" },
 	{ Opt_sharecache, "sharecache" },
 	{ Opt_nosharecache, "nosharecache" },
+	{ Opt_fscache, "fsc" },
+	{ Opt_fscache_uniq, "fsc=%s" },
+	{ Opt_nofscache, "nofsc" },
 
 	{ Opt_port, "port=%u" },
 	{ Opt_rsize, "rsize=%u" },
@@ -486,6 +491,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
 	seq_printf(m, ",timeo=%lu", 10U * nfss->client->cl_timeout->to_initval / HZ);
 	seq_printf(m, ",retrans=%u", nfss->client->cl_timeout->to_retries);
 	seq_printf(m, ",sec=%s", nfs_pseudoflavour_to_name(nfss->client->cl_auth->au_flavor));
+	if (nfss->options & NFS_OPTION_FSCACHE)
+		seq_printf(m, ",fsc");
 }
 
 /*
@@ -780,6 +787,24 @@ static int nfs_parse_mount_options(char *raw,
 		case Opt_nosharecache:
 			mnt->flags |= NFS_MOUNT_UNSHARED;
 			break;
+		case Opt_fscache:
+			mnt->options |= NFS_OPTION_FSCACHE;
+			kfree(mnt->fscache_uniq);
+			mnt->fscache_uniq = NULL;
+			break;
+		case Opt_nofscache:
+			mnt->options &= ~NFS_OPTION_FSCACHE;
+			kfree(mnt->fscache_uniq);
+			mnt->fscache_uniq = NULL;
+			break;
+		case Opt_fscache_uniq:
+			string = match_strdup(args);
+			if (!string)
+				goto

[PATCH 30/37] NFS: Add some new I/O event counters for FS-Cache events

2008-02-20 Thread David Howells

Add some new NFS I/O event counters for FS-Cache events.  They have to be
added as byte counters because I may need to be able to increase the numbers
by more than 1 at a time.
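
The difference from an increment-only event counter can be sketched in plain
C as below.  The enum, struct and function names are hypothetical stand-ins
for illustration; the kernel keeps these statistics per CPU and updates them
through nfs_add_stats().

```c
#include <assert.h>

/* Hypothetical mirror of the five FS-Cache counters added by this patch. */
enum fsc_counter {
	FSC_READ_OK,
	FSC_READ_FAIL,
	FSC_WRITE_OK,
	FSC_WRITE_FAIL,
	FSC_UNCACHE,
	FSC_COUNTER_MAX
};

struct fsc_stats {
	unsigned long long counts[FSC_COUNTER_MAX];
};

/*
 * Add an arbitrary amount to a counter.  An increment-only event counter
 * could not account for a batch of pages in a single call.
 */
static void fsc_add_stat(struct fsc_stats *s, enum fsc_counter which,
			 unsigned long long n)
{
	assert(s && which < FSC_COUNTER_MAX);
	s->counts[which] += n;
}
```

A multi-page read can then be accounted for with one call, such as
fsc_add_stat(&stats, FSC_READ_OK, npages).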

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/iostat.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/iostat.h b/fs/nfs/iostat.h
index 6350ecb..0e3b170 100644
--- a/fs/nfs/iostat.h
+++ b/fs/nfs/iostat.h
@@ -60,6 +60,13 @@ enum nfs_stat_bytecounters {
NFSIOS_SERVERWRITTENBYTES,
NFSIOS_READPAGES,
NFSIOS_WRITEPAGES,
+#ifdef CONFIG_NFS_FSCACHE
+   NFSIOS_FSCACHE_READ_OK,
+   NFSIOS_FSCACHE_READ_FAIL,
+   NFSIOS_FSCACHE_WRITE_OK,
+   NFSIOS_FSCACHE_WRITE_FAIL,
+   NFSIOS_FSCACHE_UNCACHE,
+#endif
__NFSIOS_BYTESMAX,
 };
 


Re: [PATCH 00/37] Permit filesystem local caching

2008-02-20 Thread David Howells

Serge E. Hallyn [EMAIL PROTECTED] wrote:

> Seems *really* weird that every time you send this, patch 6 doesn't seem
> to reach me in any of my mailboxes...  (did get it from the url
> you listed)

It's the largest of the patches, so that's not entirely surprising.  Hence why
I included the URL to the tarball also.

> I'm sorry if I miss where you explicitly state this, but is it safe to
> assume, as perusing the patches suggests, that
>
> 	1. tsk->sec never changes other than in task_alloc_security()?

Correct.

> 	2. tsk->act_as is only ever dereferenced from (a) current->

That ought to be correct.

> 	   except (b) in do_coredump?

Actually, do_coredump() only deals with current->act_as.

> (thereby carefully avoiding locking issues)

That's the idea.

> I'd still like to see some performance numbers.  Not to object to
> these patches, just to make sure there's no need to try and optimize
> more of the dereferences away when they're not needed.

I hope that the performance impact is minimal.  The kernel should spend very
little time looking at the security data.  I'll try and get some though.

> Oh, manually copied from patch 6, I see you have in the task_security
> struct definition:
>
> 	kernel_cap_t cap_bset;   /* ? */
>
> That comment can be filled in with 'capability bounding set' (for this
> task and all its future descendents).

Thanks.

David

[PATCH 34/37] NFS: Read pages from FS-Cache into an NFS inode

2008-02-08 Thread David Howells

Read pages from an FS-Cache data storage object representing an inode into an
NFS inode.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c |  112 ++
 fs/nfs/fscache.h |   47 +++
 fs/nfs/read.c|   18 +
 3 files changed, 176 insertions(+), 1 deletions(-)


diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index d475ff5..438cc9b 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -344,5 +344,115 @@ void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
 
 	BUG_ON(!PageLocked(page));
 	fscache_uncache_page(nfsi->fscache, page);
-	nfs_add_stats(page->mapping->host, NFSIOS_FSCACHE_UNCACHE, 1);
+	nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1);
+}
+
+/*
+ * Handle completion of a page being read from the cache.
+ * - Called in process (keventd) context.
+ */
+static void nfs_readpage_from_fscache_complete(struct page *page,
+  void *context,
+  int error)
+{
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
+		 page, context, error);
+
+	/* if the read completes with an error, we just unlock the page and let
+	 * the VM reissue the readpage */
+	if (!error) {
+		SetPageUptodate(page);
+		unlock_page(page);
+	} else {
+		error = nfs_readpage_async(context, page->mapping->host, page);
+		if (error)
+			unlock_page(page);
+	}
+}
+
+/*
+ * Retrieve a page from fscache
+ */
+int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+   struct inode *inode, struct page *page)
+{
+   int ret;
+
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
+		 NFS_I(inode)->fscache, page, page->index, page->flags, inode);
+
+	ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+					 page,
+					 nfs_readpage_from_fscache_complete,
+					 ctx,
+					 GFP_KERNEL);
+
+	switch (ret) {
+	case 0: /* read BIO submitted (page in fscache) */
+		dfprintk(FSCACHE,
+			 "NFS:readpage_from_fscache: BIO submitted\n");
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, 1);
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1);
+		dfprintk(FSCACHE,
+			 "NFS:readpage_from_fscache %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE, "NFS:readpage_from_fscache %d\n", ret);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1);
+	}
+	return ret;
+}
+
+/*
+ * Retrieve a set of pages from fscache
+ */
+int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+struct inode *inode,
+struct address_space *mapping,
+struct list_head *pages,
+unsigned *nr_pages)
+{
+   int ret, npages = *nr_pages;
+
+	dfprintk(FSCACHE, "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
+		 NFS_I(inode)->fscache, npages, inode);
+
+	ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
+					  mapping, pages, nr_pages,
+					  nfs_readpage_from_fscache_complete,
+					  ctx,
+					  mapping_gfp_mask(mapping));
+	if (*nr_pages < npages)
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, npages);
+	if (*nr_pages > 0)
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, *nr_pages);
+
+	switch (ret) {
+	case 0: /* read submitted to the cache for all pages */
+		BUG_ON(!list_empty(pages));
+		BUG_ON(*nr_pages != 0);
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: submitted\n");
+
+		return ret;
+
+	case -ENOBUFS: /* some pages aren't cached and can't be */
+	case -ENODATA: /* some pages aren't cached */
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: ret  %d\n", ret);
+	}
+
+	return ret;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 1cb7d96..4c1e1a8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -89,6 +89,12

[PATCH 35/37] NFS: Store pages from an NFS inode into a local cache

2008-02-08 Thread David Howells

Store pages from an NFS inode into the cache data storage object associated
with that inode.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c |   26 ++
 fs/nfs/fscache.h |   16 
 fs/nfs/read.c|5 +
 3 files changed, 47 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 438cc9b..50ae70f 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -456,3 +456,29 @@ int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 
return ret;
 }
+
+/*
+ * Store a newly fetched page in fscache
+ * - PG_fscache must be set on the page
+ */
+void __nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+	int ret;
+
+	dfprintk(FSCACHE,
+		 "NFS: readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+		 NFS_I(inode)->fscache, page, page->index, page->flags, sync);
+
+	ret = fscache_write_page(NFS_I(inode)->fscache, page, GFP_KERNEL);
+	dfprintk(FSCACHE,
+		 "NFS: readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+		 page, page->index, page->flags, ret);
+
+	if (ret != 0) {
+		fscache_uncache_page(NFS_I(inode)->fscache, page);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_FAIL, 1);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1);
+	} else {
+		nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_OK, 1);
+	}
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 4c1e1a8..6264cd8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -94,6 +94,7 @@ extern int __nfs_readpage_from_fscache(struct nfs_open_context *,
 extern int __nfs_readpages_from_fscache(struct nfs_open_context *,
 					struct inode *, struct address_space *,
 					struct list_head *, unsigned *);
+extern void __nfs_readpage_to_fscache(struct inode *, struct page *, int);
 
 /*
  * release the caching state associated with a page if undergoing complete page
@@ -133,6 +134,19 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 	return -ENOBUFS;
 }
 
+/*
+ * Store a page newly fetched from the server in an inode data storage object
+ * in the cache.
+ */
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+  struct page *page,
+  int sync)
+{
+   if (PageFsCache(page))
+   __nfs_readpage_to_fscache(inode, page, sync);
+}
+
+
 #else /* CONFIG_NFS_FSCACHE */
 static inline int nfs_fscache_register(void) { return 0; }
 static inline void nfs_fscache_unregister(void) {}
@@ -178,6 +192,8 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 {
 	return -ENOBUFS;
 }
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+					   struct page *page, int sync) {}
 
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index db27b26..e09bdf9 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -143,6 +143,11 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
+	struct inode *d_inode = req->wb_context->path.dentry->d_inode;
+
+	if (PageUptodate(req->wb_page))
+		nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
 	unlock_page(req->wb_page);
 
dprintk(NFS: read done (%s/%Ld [EMAIL PROTECTED])\n,


[PATCH 23/37] NFS: Permit local filesystem caching to be enabled for NFS

2008-02-08 Thread David Howells

Permit local filesystem caching to be enabled for NFS in the kernel
configuration.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig |8 
 1 files changed, 8 insertions(+), 0 deletions(-)


diff --git a/fs/Kconfig b/fs/Kconfig
index c42ec50..fa8e978 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1644,6 +1644,14 @@ config NFS_V4
 
  If unsure, say N.
 
+config NFS_FSCACHE
+	bool "Provide NFS client caching support (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+	help
+	  Say Y here if you want NFS data to be cached locally on disc through
+	  the general filesystem cache manager
+
 config NFS_DIRECTIO
 	bool "Allow direct I/O on NFS files"
 	depends on NFS_FS


[PATCH 29/37] NFS: Invalidate FsCache page flags when cache removed

2008-02-08 Thread David Howells

Invalidate the FsCache page flags on the pages belonging to an inode when the
cache backing that NFS inode is removed.

This allows a live cache to be withdrawn.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c |   40 
 1 files changed, 40 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index c3c63fa..eec8e7e 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -246,6 +246,45 @@ static enum fscache_checkaux nfs_cache_inode_check_aux(void *cookie_netfs_data,
 }
 
 /*
+ * Indication from FS-Cache that the cookie is no longer cached
+ * - This function is called when the backing store currently caching a cookie
+ *   is removed
+ * - The netfs should use this to clean up any markers indicating cached pages
+ * - This is mandatory for any object that may have data
+ */
+static void nfs_cache_inode_now_uncached(void *cookie_netfs_data)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	dprintk("NFS: nfs_inode_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+	for (;;) {
+		/* grab a bunch of pages to unmark */
+		nr_pages = pagevec_lookup(&pvec,
+					  nfsi->vfs_inode.i_mapping,
+					  first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPageFsCache(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+}
+
+/*
  * Define the inode object for FS-Cache.  This is used to describe an inode
  * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
  * an inode.
@@ -261,4 +300,5 @@ const struct fscache_cookie_def nfs_cache_inode_object_def = {
 	.get_attr	= nfs_cache_inode_get_attr,
 	.get_aux	= nfs_cache_inode_get_aux,
 	.check_aux	= nfs_cache_inode_check_aux,
+	.now_uncached	= nfs_cache_inode_now_uncached,
 };


[PATCH 26/37] NFS: Define and create superblock-level objects

2008-02-08 Thread David Howells

Define and create superblock-level cache index objects (as managed by
nfs_server structs).

Each superblock object is created in a server level index object and is itself
an index into which inode-level objects are inserted.

Ideally there would be one superblock-level object per server, and the former
would be folded into the latter; however, since the nosharecache option
exists this isn't possible.

The superblock object key is a sequence consisting of:

 (1) Certain superblock s_flags.

 (2) Various connection parameters that serve to distinguish superblocks for
 sget().

 (3) The volume FSID.

 (4) The security flavour.

 (5) The uniquifier length.

 (6) The uniquifier text.  This is normally an empty string, unless the fsc=xyz
 mount option was used to explicitly specify a uniquifier.

The key blob is of variable length, depending on the length of (6).
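
As a rough userspace illustration of that layout, a fixed-size header
followed by the uniquifier text can be serialised as below.  The struct
fields are hypothetical stand-ins and do not reproduce the real
nfs_fscache_key layout; only the "fixed part plus variable tail, return 0 if
the buffer is too small" shape follows nfs_super_get_key() in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical fixed-size part of the key.  It should be zero-filled before
 * use so that padding bytes compare equal between two otherwise equal keys.
 */
struct sb_key_fixed {
	uint32_t s_flags;	/* (1) selected superblock flags */
	uint32_t server_flags;	/* (2) connection parameters (abridged) */
	uint64_t fsid_major;	/* (3) volume FSID */
	uint64_t fsid_minor;
	uint32_t auth_flavor;	/* (4) security flavour */
	uint16_t uniq_len;	/* (5) uniquifier length */
};

/*
 * Write the fixed part followed by the uniquifier text (6) into buffer.
 * Returns the number of bytes written, or 0 if the blob does not fit.
 */
static uint16_t sb_key_serialise(const struct sb_key_fixed *fixed,
				 const char *uniq,
				 void *buffer, uint16_t bufmax)
{
	size_t len;

	assert(fixed && uniq && buffer);
	len = sizeof(*fixed) + fixed->uniq_len;
	if (len > bufmax)
		return 0;

	memcpy(buffer, fixed, sizeof(*fixed));
	memcpy((unsigned char *)buffer + sizeof(*fixed), uniq, fixed->uniq_len);
	return (uint16_t)len;
}
```

An empty uniquifier gives a blob of exactly sizeof(struct sb_key_fixed)
bytes, and each character of an fsc=xyz string adds one byte.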

The superblock object is given no coherency data to carry in the auxiliary data
permitted by the cache.  It is assumed that the superblock is always coherent.


This patch also adds uniquification handling such that two otherwise identical
superblocks, at least one of which is marked nosharecache, won't end up
trying to share the on-disk cache.  It will be possible to manually provide a
uniquifier through a mount option with a later patch to avoid the error
otherwise produced.
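
The collision check can be sketched as follows.  This is a simplified
userspace model: it uses a linked list rather than the rb-tree the patch
declares (nfs_fscache_keys), it collapses all the fixed connection
parameters into a single hypothetical params field, and the names sb_key,
sb_key_cmp() and sb_key_register() are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registered key: fixed parameters plus uniquifier text. */
struct sb_key {
	struct sb_key *next;
	unsigned int params;	/* stand-in for all fixed parameters */
	size_t uniq_len;
	char uniq[];		/* uniquifier text, not NUL-terminated */
};

/* Order keys by parameters, then uniquifier length, then uniquifier text. */
static int sb_key_cmp(const struct sb_key *a, const struct sb_key *b)
{
	if (a->params != b->params)
		return a->params < b->params ? -1 : 1;
	if (a->uniq_len != b->uniq_len)
		return a->uniq_len < b->uniq_len ? -1 : 1;
	return memcmp(a->uniq, b->uniq, a->uniq_len);
}

/*
 * Try to register a key.  Returns false on a collision (or on allocation
 * failure), in which case the caller would leave caching disabled.
 */
static bool sb_key_register(struct sb_key **head, unsigned int params,
			    const char *uniq)
{
	size_t ulen = strlen(uniq);
	struct sb_key *key = calloc(1, sizeof(*key) + ulen);
	struct sb_key *p;

	assert(head);
	if (!key)
		return false;
	key->params = params;
	key->uniq_len = ulen;
	memcpy(key->uniq, uniq, ulen);

	for (p = *head; p; p = p->next) {
		if (sb_key_cmp(p, key) == 0) {
			free(key);	/* duplicate: first mount keeps the cache */
			return false;
		}
	}

	key->next = *head;
	*head = key;
	return true;
}
```

Registering the same parameters twice with an empty uniquifier fails the
second time, while supplying a distinct uniquifier lets both mounts use the
cache.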

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c|   34 +
 fs/nfs/fscache.c  |  116 +
 fs/nfs/fscache.h  |   49 +++
 fs/nfs/internal.h |3 +
 fs/nfs/super.c|8 ++-
 include/linux/nfs_fs_sb.h |5 ++
 6 files changed, 213 insertions(+), 2 deletions(-)


diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 25ac4a1..b5a52e3 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -116,3 +116,37 @@ const struct fscache_cookie_def nfs_cache_server_index_def = {
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
 	.get_key	= nfs_server_get_key,
 };
+
+/*
+ * Generate a key to describe a superblock key in the main NFS index
+ */
+static uint16_t nfs_super_get_key(const void *cookie_netfs_data,
+				  void *buffer, uint16_t bufmax)
+{
+	const struct nfs_fscache_key *key;
+	const struct nfs_server *nfss = cookie_netfs_data;
+	uint16_t len;
+
+	key = nfss->fscache_key;
+	len = sizeof(key->key) + key->key.uniq_len;
+	if (len > bufmax) {
+		len = 0;
+	} else {
+		memcpy(buffer, &key->key, sizeof(key->key));
+		memcpy(buffer + sizeof(key->key),
+		       key->key.uniquifier, key->key.uniq_len);
+	}
+
+	return len;
+}
+
+/*
+ * Define the superblock object for FS-Cache.  This is used to describe a
+ * superblock object to fscache_acquire_cookie().  It is keyed by all the NFS
+ * parameters that might cause a separate superblock.
+ */
+const struct fscache_cookie_def nfs_cache_super_index_def = {
+	.name		= "NFS.super",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= nfs_super_get_key,
+};
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index dcc1800..cbd09f0 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -23,6 +23,9 @@
 
 #define NFSDBG_FACILITY		NFSDBG_FSCACHE
 
+static struct rb_root nfs_fscache_keys = RB_ROOT;
+static DEFINE_SPINLOCK(nfs_fscache_keys_lock);
+
 /*
  * Get the per-client index cookie for an NFS client if the appropriate mount
  * flag was set
@@ -50,3 +53,116 @@ void nfs_fscache_release_client_cookie(struct nfs_client *clp)
 	fscache_relinquish_cookie(clp->fscache, 0);
 	clp->fscache = NULL;
 }
+
+/*
+ * Get the cache cookie for an NFS superblock.  We have to handle
+ * uniquification here because the cache doesn't do it for us.
+ */
+void nfs_fscache_get_super_cookie(struct super_block *sb,
+				  struct nfs_parsed_mount_data *data)
+{
+	struct nfs_fscache_key *key, *xkey;
+	struct nfs_server *nfss = NFS_SB(sb);
+	struct rb_node **p, *parent;
+	const char *uniq = data->fscache_uniq ?: "";
+	int diff, ulen;
+
+	ulen = strlen(uniq);
+	key = kzalloc(sizeof(*key) + ulen, GFP_KERNEL);
+	if (!key)
+		return;
+
+	key->nfs_client = nfss->nfs_client;
+	key->key.super.s_flags = sb->s_flags & NFS_MS_MASK;
+	key->key.nfs_server.flags = nfss->flags;
+	key->key.nfs_server.rsize = nfss->rsize;
+	key->key.nfs_server.wsize = nfss->wsize;
+	key->key.nfs_server.acregmin = nfss->acregmin;
+	key->key.nfs_server.acregmax = nfss->acregmax;
+	key->key.nfs_server.acdirmin = nfss->acdirmin;
+	key->key.nfs_server.acdirmax = nfss->acdirmax;
+	key->key.nfs_server.fsid = nfss->fsid;
+	key->key.rpc_auth.au_flavor = nfss->client->cl_auth->au_flavor;
+
+	key->key.uniq_len = ulen
