I would like to participate in Google Summer of Code this year. I've 
written proposal for "revamping validation functionality" idea. It's 
available as a gist: 
https://gist.github.com/chrismedrela/82cbda8d2a78a280a129 . Below, I'm 
pasting the proposal. Feel free to criticize it.

*Table of content*

1. *Abstract* 1.1 Drawbacks of the existing framework 1.2 Goals 1.3 Benefits
2. *The new framework *2.1 Overview 2.2 Advantages
3. *New features *3.1 django-secure 3.2 django-update
4. *Schedule and milestones*
5. *Who am I?*

*1. Abstract*

*1.1 Drawbacks of the existing framework*

Django currently has a validation framework, but there are a lot of 
problems with it. First of all, it is monolithic and developers cannot 
write custom validation (see [#16905]) or modify the existing functionality 
(see [#12674]); validation lives in a few functions like 
django.core.management.validation.get_validation_errors [1] or 
django.contrib.admin.validation.validate [2]. The validation functionality 
is not separated from other stuff like printing found errors during 
validating models in get_validation_errors or registering models in admin 
app [3] (see [#8579]); it is sometimes done during first call to an 
important method, i. e. CurrentSiteManager [4] is validated in its 
get_queryset [5] method.

[#12674] https://code.djangoproject.com/ticket/12674
[#16905] https://code.djangoproject.com/ticket/16905
[1] 
https://github.com/django/django/blob/master/django/core/management/validation.py#L22
[2] 
https://github.com/django/django/blob/master/django/contrib/admin/validation.py#L14
[3] 
https://github.com/django/django/blob/master/django/contrib/admin/sites.py#L52
[#8579] https://code.djangoproject.com/ticket/8579
[4] 
https://github.com/django/django/blob/master/django/contrib/sites/managers.py#L5
[5] 
https://github.com/django/django/blob/master/django/contrib/sites/managers.py#L38

There are few tests of the validation framework and it is not easily 
testable because validation functions return concatenated error messages 
instead of list of errors (see 
django.tests.invalid_models.invalid_models.models [6]). It also lacks some 
features like warnings (see [#19126]). Due to this disadvantages lots of 
apps do not have any validation, i. e. they do not check inter-app 
dependencies.

[6] 
https://github.com/django/django/blob/master/tests/invalid_models/invalid_models/models.py#L366
[#19126] https://code.djangoproject.com/ticket/19126

*1.2 Goals*

This proposal is about revamping current validation framework. First of 
all, we need to write more tests and rewrite existing ones. Then we need an 
consistent API of validating different kinds of objects like models, 
fields, managers or whole apps so it will be easy to add new kind of 
object. Validation functionality should be separated from other stuff and 
it should minimize dependencies. We should allow developers to add 
validation to their apps and any other kind of objects, so custom 
validation is a must. We will not break backward compatibility.

This proposal is not only about refactoring but also introducing new 
features. Introducing warnings is first feature that allows us to add 
security warnings based on django-secure [7] app. The second main feature 
is making switching to newer version of Django easier by developing 
django-update app which should inspect settings as well as models and 
predict and warn about some problems.

[7] https://github.com/carljm/django-secure

*1.3 Benefits*

There are a lot of benefits. Cleaning code, removing unwanted dependencies 
and adding more tests are the most obvious ones. We will also benefit from 
long term solution which will be easy to maintain since it is extendable. 
Switching Django version will be piece of cake. We will improve security of 
Django projects thanks to security warnings. This also implies that Django 
will be considered as a safe and secure framework. Better opinion is always 
desired.

*2. The new framework*

*2.1 Overview*

The API is based on Honza Kral idea from his patch [8]. An developer can 
add new validation functionality by writing a callable piece of code. It 
will be automatically called during validating whole project (triggered by 
python 
manage.py validate) and it must fulfill the following contract: it has no 
arguments and returns a list of warnings and errors or yields each of them.

[8] https://github.com/HonzaKral/django/compare/master...ticket-12674

The validated object may be a model, a manager, a field or an app. In case 
of a field or a manager, the callable piece of code is a method. In case of 
a model, it is a classmethod. In case of an app, we will put functions in 
validation.py file.

Let's see an example::

class FieldFile(File):
>
    (lots of stuff)
>

>     def validate_upload_to_attribute(self):
>
        if not self.upload_to:
>
            yield Error(obj=self, msg="required 'upload_to' attribute",
>
                explanation="Django need to know the directory where 
> uploaded files should be stored.",
>
                solution="Add \"upload_to = 'path/to/directory/on/server'\" 
> to model %(model)s in file %(model_file)s.")
>

Notice that validation stuff is inherited by child classes. In an uncommon 
case when some validation stuff should be turned off, you can overwrite the 
method with "an empty method" to prevent it from executing. "Private" 
methods starting with _validate are not executed.

validate_upload_to_attribute method and all other validation methods are 
called from validate method which fulfill the same contract as 
validate_*methods and calls all 
validate_* methods by default. By default, validate method is inherited so 
you do not have to write it. In case of an app, you will have to write it 
if you are switching from earlier Django version. If you create a new app 
in Django 1.6, there will be validation.py file created by default and it 
will contain validate as well as validate_models functions.

*Validation chain.*

When a developer types python manage.py validate (or any other command 
which triggers validation) then all apps are loaded and then for each app 
its validate method from validation.py file is called (for backward 
compatibility, if the file is missing, then it is fine -- a default 
validation module, the same as default stub of validation.py, will be 
used). It calls all validate_* functions. One of them named 
validate_modelscalls 
validate classmethod for each model of this app. Then models validate its 
fields and managers. That is the "validation chain".

*Errors and warnings.*

The new framework introduces two new classes called Warning and Error (
ValidationError would collide with django.forms.ValidationError). They are 
really similar, they differ only in their meaning. Their fields are: obj, 
app, msg, explanation and solution. app is the app where the error or 
warning was created; the attribute is not set in the example because it is 
set by default validate function of an app. obj is the invalid object (it 
may be the same as app). Errors connected with particular model (like in 
the example -- the invalid object is a field, but the field will be 
attached to a model) will have additional model attribute thanks to default 
validate classmethod of models.

I think that Django error messages are often confusing and they often 
contain neither solution proposals nor explanation of a problem. I think 
that we should force Django contributors to think about it and separating 
error message (msg), explanation and possible solution list is a way to do 
it. We need to make using Django as simple as possible and suggesting 
solutions and describing a problem in details is a way to do it. It will be 
really important when contributors and developers start to write more 
complex validation like checking inter-app dependencies.

Error messages (as well as solution and explanation fields) can be 
formatted. %(model)s will be replaced with model attribute of the error. 
%(model_file)s will be path to the file where the model exists.

*2.2 Advantages*

First of all, the solution is as simple as possible. There is only one 
concept which developers have to learn -- the contract of validate[_*]methods. 
You do not have to know about 
validate method or validation chain to write your first validation piece of 
code which makes it easier for newbies to play with Django (excluding 
validating an app if you are switching from earlier Django version, but in 
that case you are probably an experienced user).

As you can see, the solution is consistent between different kinds of 
objects and it does not assume that only a fixed set of object types can be 
validated. I believe that good long term solution should be extendable and 
the new framework allows us to easily add validation of new type of object 
-- just modify the validation chain.

One may argue that a developer have to remember when he has to use methods, 
when classmethods and when functions. He may propose using validator class 
(like in Honza Kral's patch) -- a validated object would have to point to 
its validator instance. That would be a progress if it would not cause a 
lot of other problems. For example, if you have field A inheriting from 
field B then you have to remember to inherit that A validator should 
inherit from B validator. It also implies that you have to write a new 
class even though you want to validate only one small thing.

My proposal solves also a lot of existing problems. This solution plays 
well with almost all validation stuff in Django, i. e. existing validation 
of ModelAdmin [9] can be done in admin app (all apps are loaded before 
validation so all models will be registered before validation starts). 
What's more, it is also good solution for a lot of other use cases. The new 
framework allows us to write new kind of apps -- apps containing only 
validation stuff. This is how switching to newer Django version will be 
made easier -- there will be new django-update app which will inspect 
settings file and predict some problems. In the same way new django-secureapp 
will increase security of Django projects.

[9] 
https://github.com/django/django/blob/master/django/contrib/admin/validation.py#L14

*3. New features*

That was first part -- mainly refactoring. The second part is about 
introducing new features like increasing security thanks to django-secureapp 
and making switching Django versions easier thanks to new 
django-update app.

*3.1 django-secure*

This feature will be based on existing django-secure [10] project. The 
project does security checks (like checking if CsrfViewMiddleware is in 
MIDDLEWARE_CLASSES setting). Existing django-secure uses, among other 
things, a middleware to check safety of every request (like checking if 
x-frame is set to DENY). Our validation framework cannot check things like 
that, so the new django-secure app will be only part of the existing one -- 
I will base on only check/csrf.py [11], check/djangosecure.py [12], 
check/sessions.py [13] and tests.py [14] files.

[10] https://github.com/carljm/django-secure
[11] 
https://github.com/carljm/django-secure/blob/master/djangosecure/check/csrf.py
[12] 
https://github.com/carljm/django-secure/blob/master/djangosecure/check/djangosecure.py
[13] 
https://github.com/carljm/django-secure/blob/master/djangosecure/check/sessions.py
[14] 
https://github.com/carljm/django-secure/blob/master/djangosecure/tests.py

The new django-secure will use only warnings (no errors). There will be no 
checksecure command (see [#17101]) -- the security checks will be part of 
validate command.

[#17101] https://code.djangoproject.com/ticket/17101

*3.2 django-update*

django-update will be an entirely new app. It also will use only warnings. 
One of them may be warning that sites framework doesn't exist any longer if 
somebody will try to use it. Another one may check all models if their 
fields are called hour, minute or second which is not allowed now (because 
it clashes with new ORM query lookups [15]). If an model has BooleanField 
without 
default value, we can warn that the default value was changed [16]. I will 
prepare list of all predictable problems before starting coding.

[15] 
https://docs.djangoproject.com/en/dev/releases/1.6/#new-lookups-may-clash-with-model-fields
[16] 
https://docs.djangoproject.com/en/dev/releases/1.6/#booleanfield-no-longer-defaults-to-false

django-update should be heavily tested because it will be used only once, 
just after releasing a new Django version.

*4. Schedule and milestones*

Before starting coding I would like to do some preparation:

   - Writing and discussing full API.
   - List of new tests for validation of fields (mostly checking clashes 
   with ORM querylookups).
   - List of problems which may be automatically predicted during upgrading 
   Django; the list will be based on the changelog [17].
   - List of security issues that can be predicted; the list will be based 
   on the existing django-secure project.

[17] https://docs.djangoproject.com/en/dev/releases/1.6/

As you probably noticed, this proposal contains two parts: "required" one 
which is about creating new validation framework and is mostly refactoring; 
and the second "optional" one which is set of new features: django-secureand 
django-update. That affected the schedule.

*4.1 Milestone 1 (first half) -- new validation framework*

For the first two or three weeks, I will have exams at university so I 
cannot work 8 hours a day. After 6 July (or after 29 June if I pass all 
exams quickly) I will have neither job nor trip.

   - (1 week) *Writing tests for field validation and rewriting the 
   existing ones.* The list of tests will be created before first week.
   - (1 week) *Rewriting field validation* (now it lives mostly in 
   django.core.management.validation [18]). Introducing Warning and 
Errorclasses.
   - (1 week) *Tests and implementation of validation of models and 
   managers + tests.* Rewrite validation of User model of auth app.
   - (1 week) *Tests and implementation of validation of apps.*
   - (1 week) *Rewriting validation of AdminModel in admin app as well as 
   its **tests.* The validation lives in django.contrib.admin.validation[19] 
now.
   - (1 week) *Writing documentation.*
   
[18] 
https://github.com/django/django/blob/master/django/core/management/validation.py
[19] 
https://github.com/django/django/blob/master/django/contrib/admin/validation.py

*4.2 Second half -- new features*


   - *Milestone 2.* (2 weeks) *django-secure*. One week for writing tests 
   and documentation and one for implementation.
   - *Milestone 3.* (3 weeks) *django-update*. Updating list of problems 
   which may be predicted after upgrading Django, tests and documentation -- 
   one half week. Implementation -- one and half week.
   - (1 week) *Buffer week.*
   
*5. Who am I?*

My name is Christopher Mędrela and I am student of University of Science 
and Technology in Krakow (Poland). I program in Python for at least 4 
years. I program also in C and Java. I have started contributing Django 16 
months ago. I have submitted a lot of patches [20] (this is not list of all 
patches). This year I have started working at big [#17093] ticket 
("Refactor django.template to quarantine global state"). It is not 
finished, but it shows that I am able to deal with big tasks.

Some time ago I was working for long time (more than one year) on my own 
project named Scriptcraft [21] that was a programming game, but I suspended 
this project to focus on Django contribution. It also shows that I am not 
lazy and I can push myself even though there is no external motivator. :)

My e-mail is chrismedrela+gsoc at gmail.com. You can find me also at 
#django-dev and #gsoc IRC channels. My nick is chrismed.

[20] 
https://code.djangoproject.com/query?owner=~krzysiumed&status=assigned&status=closed&status=new&or&status=assigned&status=closed&status=new&owner=~chrismedrela&col=id&col=summary&col=status&col=owner&col=type&col=component&order=priority
[#17093] https://code.djangoproject.com/ticket/17093
[21] https://github.com/chrismedrela/scriptcraft

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to