Hi,

I am Rohan Jain, a student from the Indian Institute of Technology,
Kharagpur. I'll be doing a Google Summer of Code project with django
this year under the title "Security Enhancements". As the title
suggests, it is about security enhancements, such as improvements in
CSRF protection and tokenization.

I have made some small updates to the proposal based on the feedback it
got. It is under version control here: http://gist.github.com/2203174
There isn't a direct way to diff gists, so here are the changes I made,
for anybody who has already read the proposal:

 - The origin check will be an additional step in validating a
   request, not a standalone check. The conventional checks will still
   exist.

 - Added some issues Luke pointed out about signing and using sessions.

 - Added info about my github fork and branches.

What I will be doing over the following weeks:

 - I haven't made any major contribution to django yet, apart from a
   tiny ticket some time ago. So, I'll be working on a ticket over the
   next few weeks. It is related to the filesystem backend of
   contrib.sessions and was raised some time ago:
   https://code.djangoproject.com/ticket/18194

 - Clean up and organize the proposal a bit more (probably start
   tracking it on the wiki, like the CSRF protection page:
   https://code.djangoproject.com/wiki/CsrfProtection)

(I have also appended the current proposal below in this post)

--
Rohan


Proposal
--------

#Abstract

Django is a reasonably secure framework. It provides an API and
development patterns which transparently take care of the common web
security issues. But there are still security features which need
attention. I propose to work on improving CSRF checking without
compromising security, and on integrating the existing work on a
centralized token system. If time permits, I will also attempt to
integrate django-secure.

#Description
##CSRF Improvements

Cross-Origin Resource Sharing (CORS):  
W3C has a working draft on [CORS][w3c-cors-draft], which makes it
possible for client-side code to make cross-origin requests. This
immediately suggests the ability to develop APIs which can be exposed
directly to the web browser, letting us get rid of the proxies and
other hacks currently used to achieve this.
Currently all the major browsers support this: Chrome (all versions),
Firefox (> 3.0), IE (> 7.0), Safari (> 3.2), Opera (> 12.0). Firefox
and Chrome send the Origin header for both AJAX and standard form POST
requests. I introduce it here because later parts of this post refer
to it.

###Origin checking

With CORS around, the need for a CSRF token can be dropped, at least
in some browsers. [Ticket #16859][orig-check-ticket] is an attempt at
that. But it was rejected because it neglected the case where
`CSRF_COOKIE_DOMAIN` is set (refer to the closing comment on the
ticket for details). So to handle this, we need to simulate the CSRF
cookie domain check the way web browsers do it. Maybe:

```python
# Only meaningful when CSRF_COOKIE_DOMAIN is set (it defaults to None)
request.META.get('HTTP_ORIGIN', '').endswith(settings.CSRF_COOKIE_DOMAIN)
```

If the server receives an Origin header in the request, it will be
used for an initial check, and then all the conventional checks will
be done. General security will improve automatically as the market
share of newer browsers which support the Origin header grows.
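The one-line `endswith` check above is looser than what browsers do:
with a `CSRF_COOKIE_DOMAIN` of `example.com` it would also accept
`http://evilexample.com`, and an origin carrying a port, such as
`http://example.com:8000`, would not match at all. A stricter sketch,
assuming we want to mirror the cookie domain-matching rule from RFC
6265 (section 5.1.3), parses the origin first. The helper name
`origin_matches_cookie_domain` is hypothetical, not an existing django
function, and the sketch uses Python 3's `urllib.parse`.

```python
from urllib.parse import urlsplit


def origin_matches_cookie_domain(origin, cookie_domain):
    """Hypothetical helper: does the Origin's host domain-match cookie_domain?

    Mirrors the RFC 6265 domain-match rule: the host must equal the
    cookie domain or be a subdomain of it. A leading dot on the cookie
    domain is ignored, as browsers do. IP-address hosts are not
    special-cased in this sketch.
    """
    if not origin or not cookie_domain or origin == 'null':
        return False
    # .hostname drops the scheme and port and lowercases the host
    host = urlsplit(origin).hostname
    if host is None:
        return False
    domain = cookie_domain.lower().lstrip('.')
    return host == domain or host.endswith('.' + domain)
```

With this, `https://www.example.com` and `http://example.com:8000` both
match a cookie domain of `.example.com`, while `http://evilexample.com`
does not.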

As the closing comment points out, we can't do this for secure
requests. They essentially need to be checked against the referer or
origin, at least for now. We cannot be sure that some untrusted or
insecure subdomain has not already set the cookie or cookie domain.
To deal with this, we have to treat https separately, as is being
done now. So it will be something like:

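```python
from django.conf import settings
from django.utils.http import same_origin


def process_view(self, request, callback, callback_args, callback_kwargs):

    # Same initial setup as the current CsrfViewMiddleware

    if request.method not in ('GET', 'HEAD', 'OPTIONS', 'TRACE'):

        host = request.get_host()
        origin = request.META.get('HTTP_ORIGIN', '')
        cookie_domain = settings.CSRF_COOKIE_DOMAIN

        if request.is_secure():
            good_referer = 'https://%s/' % host
            referer = origin or request.META.get('HTTP_REFERER')
            # Do the same-origin checks here, as is done now
            if referer is None or not same_origin(referer, good_referer):
                return self._reject(request, REASON_BAD_REFERER %
                                    (referer, good_referer))

        # We are insecure, so care less.
        # A better way for this check can be used if needed.
        # (REASON_BAD_ORIGIN is a hypothetical new rejection message.)
        elif origin and cookie_domain and not origin.endswith(cookie_domain):
            return self._reject(request, REASON_BAD_ORIGIN)

        # Otherwise continue with the conventional token checks here
```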

If the above were implemented, the setting `CSRF_COOKIE_DOMAIN`
should be deprecated in favour of something like `CSRF_ALLOWED_DOMAIN`,
which makes more sense.

###Multiple Allowed Domains (was Better CORS Support)
Since we are already introducing origin checking, we can go one step
further and try to provide better CORS support for browsers that
support it. A tuple/list setting specifying the allowed domains will
be provided. Using it, the various access-control response headers
will be set when the request origin is one of the allowed domains.
For the CSRF check, just see whether the HTTP origin is an allowed
domain.

```python
from django.conf import settings


def set_cors_headers(response, origin):
    response['Access-Control-Allow-Origin'] = origin

def process_response(self, request, response):

    origin = request.META.get('HTTP_ORIGIN', '')

    # CSRF_ALLOWED_DOMAINS is the proposed new tuple/list setting
    if origin in settings.CSRF_ALLOWED_DOMAINS:
        set_cors_headers(response, origin)

    return response

def process_view(self, request, callback, callback_args, callback_kwargs):

    # Use `origin in settings.CSRF_ALLOWED_DOMAINS` here instead of
    # `origin.endswith(cookie_domain)`
    ...
```

Probably something similar to the above will be needed to add CORS
support.
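One detail the middleware sketch above glosses over is that the Origin
header carries a full origin (scheme, host and, if non-default, port),
such as `https://api.example.com`, so a setting of bare domain names
would never match it directly. The sketch below assumes the proposed
setting lists full origins and compares them after normalising case
and default ports. Because the `Access-Control-Allow-Origin` value then
depends on the request, it also adds `Vary: Origin` so that shared
caches do not serve one origin's response to another. The names
`normalize_origin` and `cors_headers_for` are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {'http': 80, 'https': 443}


def normalize_origin(origin):
    """Return (scheme, host, port) for an origin string, or None if invalid."""
    parts = urlsplit(origin)  # lowercases the scheme; .hostname lowercases the host
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return None
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


def cors_headers_for(origin, allowed_origins):
    """Return the CORS response headers to add for this request's Origin."""
    # The response varies with Origin whether or not the origin is allowed
    headers = {'Vary': 'Origin'}
    normalized = normalize_origin(origin) if origin else None
    allowed = {normalize_origin(o) for o in allowed_origins}
    if normalized is not None and normalized in allowed:
        # Echo the request's own origin back rather than using '*'
        headers['Access-Control-Allow-Origin'] = origin
    return headers
```

Comparing normalised tuples means `https://API.example.com:443` in the
setting still matches a browser sending `https://api.example.com`,
while the `http://` origin of the same host is treated as different.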

###Less restrictive secure requests

The current CSRF system is pretty secure as it is. But CSRF protection
is too restrictive over https: it rejects every request whose referer
does not match the host, without honouring any tokens. It kind of has
to, thanks to the way browsers allow cookie access. A cookie accessible
across subdomains means that any subdomain, secure or insecure, can
set the CSRF token, which could be really serious for the site's
security. To get around this, one currently has to exempt views from
CSRF completely, and may or may not handle CSRF attacks oneself. This
can be dangerous. Also, if a person has a set of sites which talk to
each other through clients and decides to run them over https, the
sites would need some modifications.

Django should behave under https as it does under http, without
compromising any security. So, we need to make sure that the CSRF
token is always set by a trusted site. Signing the data with the same
key across the sites, probably `settings.SECRET_KEY`, using
`django.core.signing` looks apt for this. We can have `get_token` and
`set_token` methods which abstract the signing process.
This can be done in a few ways:

 - Store the CSRF data in the session in case `contrib.sessions` is
   installed. Then the data will either be signed with the secret key
   automatically or not be stored in the client as cookies at all.

 - In case sessions are absent from the installed apps, fall back to
   custom signing.

 - Encryption?

```python
from django.conf import settings
from django.core.signing import BadSignature, TimestampSigner
from django.utils.crypto import constant_time_compare

# The salt namespaces these signatures; the key defaults to SECRET_KEY
signer = TimestampSigner(salt="csrf-token")
CSRF_COOKIE_MAX_AGE = 60 * 60 * 24 * 7 * 52


def get_unsigned_token(request):
    signed = request.COOKIES.get(settings.CSRF_COOKIE_NAME)
    if signed is None:
        return None
    try:
        return signer.unsign(signed, max_age=CSRF_COOKIE_MAX_AGE)
    except BadSignature:
        # Tampered with, expired, or set by an untrusted site
        return None

def set_signed_token(response, token):
    response.set_cookie(settings.CSRF_COOKIE_NAME,
                        signer.sign(token),
                        max_age=CSRF_COOKIE_MAX_AGE,
                        domain=settings.CSRF_COOKIE_DOMAIN,
                        path=settings.CSRF_COOKIE_PATH,
                        secure=settings.CSRF_COOKIE_SECURE)


def get_token(request):
    if 'django.contrib.sessions' in settings.INSTALLED_APPS:
        return request.session.get('csrf_token')
    else:
        return get_unsigned_token(request)

def set_token(request, response, token):
    if 'django.contrib.sessions' in settings.INSTALLED_APPS:
        request.session['csrf_token'] = token
    else:
        set_signed_token(response, token)

# Comparing to the token submitted with the request (request_csrf_token
# comes from the POST data or the X-CSRFToken header, as now)
stored_token = get_token(request)
token_ok = (stored_token is not None and
            constant_time_compare(request_csrf_token, stored_token))
```

Now, doing this is not as simple as the above code block makes it
look. There is a lot which can and probably will go wrong with this
approach:

 - Even when the token is signed, other domains can completely replace
   the CSRF token cookie; a forged cookie won't get them through the
   CSRF check, though. But they don't need to forge anything: they
   just need to replay an existing good token/cookie pair, which they
   can get directly from the server any time they want.

 - This sort of couples CSRF with sessions, a contrib app. Currently
   nothing except some of the other contrib apps is tied to sessions.
   It will break if sessions were removed in the future or their API
   changed. Also, this means that if one website uses session-based
   CSRF, all of the others must too. Because of this coupling, it
   would actually be something of a step backwards.

 - If this were successfully implemented, would it expose any other
   critical security flaws? Would it cause compatibility issues?

 - Encryption comes with its own issues. It will need careful
   consideration.
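The replay concern in the first point can be made concrete. The sketch
below uses the standard library's `hmac` module as a stand-in for
`django.core.signing` (an assumption, so that it runs without a
configured django project); `SECRET_KEY`, `sign`, `unsign`,
`issue_token` and `csrf_check` here are illustrative names, not
django's API. It shows that a valid signature proves the server issued
a token, but not that it was issued to this particular browser.

```python
import hashlib
import hmac
import os

SECRET_KEY = b'shared-secret-across-sites'  # illustrative only


def sign(token):
    mac = hmac.new(SECRET_KEY, token.encode(), hashlib.sha256).hexdigest()
    return token + ':' + mac


def unsign(value):
    """Return the token if the signature is valid, else None."""
    token, _, mac = value.rpartition(':')
    expected = hmac.new(SECRET_KEY, token.encode(), hashlib.sha256).hexdigest()
    return token if hmac.compare_digest(mac, expected) else None


def issue_token():
    """What the trusted server hands out to any client that asks."""
    token = os.urandom(16).hex()
    return token, sign(token)


def csrf_check(cookie_value, submitted_token):
    stored = unsign(cookie_value)
    return stored is not None and hmac.compare_digest(stored, submitted_token)


# An insecure subdomain simply fetches a fresh, validly signed pair...
attacker_token, attacker_cookie = issue_token()
# ...plants attacker_cookie in the victim's browser (cookies are shared
# across subdomains) and submits attacker_token in a forged form.
forged_request_passes = csrf_check(attacker_cookie, attacker_token)
# A cookie made up without the key is still rejected.
tampered_passes = csrf_check('deadbeef:' + '0' * 64, 'deadbeef')
```

Here `forged_request_passes` ends up `True` and `tampered_passes`
`False`: signing stops forgery, but not replay of a genuine pair.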

As Paul McMillan said, "This is a hard problem", so I'll leave figuring
this out to future me. I will look into
[The Tangled Web][tangled-web] and
[Google's Browser Security Handbook][gobrowsersec] for ideas, again as
suggested by Paul on IRC.

##Centralized tokenization
There are multiple places in django which use some kind of token:

 - contrib.auth (random password, password reset)
 - formtools
 - session (backends)
 - cache
 - csrf
 - etags

Token generation is pretty common across the framework, yet each
application has its own token system, which has to be maintained
separately. Instead, there should be a centralized token system which
provides an abstract API for everyone to use. In fact, I have seen
some apps use `User.objects.make_random_password` from contrib.auth
for random string generation, relying on it being maintained in the
future. To me this looks kind of weird.
At the last DjangoCon, a lot of work regarding this was done in
[Yarko's fork][yarko-fork].
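To illustrate what such an abstract API might look like, here is a
minimal sketch of a tiny token module offering two primitives most of
the listed apps need: a random string and a keyed, salted digest. The
module layout and the names `random_token` and `keyed_token` are
hypothetical and not taken from Yarko's fork; they only show how
callers could stop reaching into contrib.auth for randomness.

```python
import hashlib
import hmac
import random
import string

# SystemRandom draws from os.urandom, suitable for security tokens
_sysrandom = random.SystemRandom()

DEFAULT_CHARS = string.ascii_letters + string.digits


def random_token(length=32, allowed_chars=DEFAULT_CHARS):
    """Return a cryptographically random string, e.g. for CSRF or sessions."""
    return ''.join(_sysrandom.choice(allowed_chars) for _ in range(length))


def keyed_token(salt, value, secret):
    """Return a hex HMAC-SHA256 of value, namespaced by salt.

    Different salts (e.g. 'password-reset' vs 'formtools') yield
    unrelated tokens for the same value and secret.
    """
    key = hashlib.sha256((salt + secret).encode()).digest()
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
```

An app needing a random password would then call `random_token(10)`
instead of depending on `User.objects.make_random_password`.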

I had a discussion with Yarko Tymciurak regarding this. The work is
nearly ready for a merge, with only some tasks left. I can work on
these to ensure that the significant work already done gets into
django and is updated for 1.5.

 - Porting more stuff to the new system (README.sec in
   [yarko's fork][yarko-fork])
 - Testing - See if the current coverage of the tests is enough, write
   them if not.
 - Compatibility issues
 - API Documentation

I will study the changes done at djangocon and then attempt the tasks
mentioned above.


##Integrating django-secure
A really useful app for catching security configuration mistakes is
[carljm's django-secure][django-secure]. It is especially useful for
finding issues that might have been introduced by quick changes to
settings during development. This project is popular and useful enough
that it could be shipped with django. I haven't been able to give this
enough time yet. I can think of two ways of integrating it:

 - Dropping it in as a contrib app  
   This seems pretty straightforward and would require a minimal
   amount of changes.

 - Distributing it around the framework:  
   Like CSRF, this can also be distributed framework-wide, and hence it
   won't be optional. Apps can still define custom checks in the same
   way as they do when `django-secure` is installed as a pluggable
   application.

The app might also need some changes whilst being integrated:

 - More security checks, if required
 - Adjust according to the changes introduced above.
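As a rough picture of how checks could stay pluggable once built into
the framework, the sketch below keeps a registry of check functions
that each inspect the settings and return warning messages. The names
`register_check` and `run_checks` and the two example checks are
hypothetical illustrations, not django-secure's actual API; settings
are modelled as a plain object so that the sketch runs on its own.

```python
from types import SimpleNamespace

_checks = []


def register_check(func):
    """Decorator: add a settings check to the registry (apps can add their own)."""
    _checks.append(func)
    return func


@register_check
def check_debug(settings):
    if getattr(settings, 'DEBUG', False):
        return ['DEBUG is True; never run a production site with DEBUG on.']
    return []


@register_check
def check_csrf_cookie_secure(settings):
    if not getattr(settings, 'CSRF_COOKIE_SECURE', False):
        return ['CSRF_COOKIE_SECURE is False; the CSRF cookie may be sent over http.']
    return []


def run_checks(settings):
    """Run every registered check and collect all warnings."""
    warnings = []
    for check in _checks:
        warnings.extend(check(settings))
    return warnings


dev = SimpleNamespace(DEBUG=True, CSRF_COOKIE_SECURE=False)
prod = SimpleNamespace(DEBUG=False, CSRF_COOKIE_SECURE=True)
```

`run_checks(dev)` reports both problems, while `run_checks(prod)`
returns an empty list; a third-party app would add its own check simply
by decorating a function with `register_check`.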

#Plan
I think that the CSRF enhancements and centralized tokenization tasks
will be enough to span the SoC period. If, after thorough
implementation and testing of these, I still have time, django-secure
integration can be looked into.


Roughly, this proposal spans at most 5 tasks. Each task will
generally have the following steps:

 a. Initial research and design decisions.  
 b. Implementation with minor parallel tests.  
 c. Thorough and regression testing to achieve security quality.  
 d. Configuration/settings changes and handling compatibility issues.  
 e. Documentation.  

Tasks (with the most effort-intensive steps in parentheses):

 1. Origin Checking (b, c)
 2. Multiple Allowed Domains (b, c)
 3. Less restrictive CSRF checking over HTTPS / CORS for HTTPS (a, b)
 4. Unified Tokenization (a, c, e)
 5. Integration of django-secure (d, e)

I'll be using [my fork of django][gh-fork] on github. I'll probably
use the following branch names:

 - csrf-enhancements (origin checking, multiple allowed domains etc.)
 - centralized-tokenization (djangocon2011-sec)

##Timeline
Week 1: Task 1.a, 1.b.  
Week 2: Task 1.c, 1.d  
Week 3: Task 2.a, 2.b. Start task 3.a  
Week 4: Task 2.c, 2.d  
Week 5: Task 1.e, 2.e (Doing these together might be beneficial)  
Week 6-7: Complete 3.a. Task 3.b  
Week 7-9: Task 3.c, 3.d  
Week 10: Task 3.e  
Week 11-12: Task 4.a-e (as much as possible)  
Week 13: Complete Task 4 and, time permitting, as much of Task 5 as possible  

*I am sorry for writing these as if they were written by a bot; the
deadline was so close that I had to adopt this method.*

[yarko-fork]: https://github.com/yarko/django
[w3c-cors-draft]: http://www.w3.org/TR/access-control/
[orig-check-ticket]: https://code.djangoproject.com/ticket/16859
[tangled-web]: http://www.amazon.com/The-Tangled-Web-Securing-Applications/dp/1593273886/
[gobrowsersec]: http://code.google.com/p/browsersec/wiki/Main
[django-secure]: https://github.com/carljm/django-secure
[gh-fork]: https://github.com/crodjer/django

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.