Hi, everyone!

First, an introduction: my name is Uros Trebec and I was lucky enough
to be selected to implement my idea of "history tracking" in Django. I
guess at least some of you think this is a very nice feature to have
in a web framework, so I would like to thank all of you who voted for
my Summer of Code proposal! Thank you!

OK, to get right to the point: this is a Request For Comment. I would
like to know what you think about my idea for the implementation and
how I can make it better. Here is what I have in mind so far...

(Just for reference: http://zabica.org/~uros/soc/ . Here you can find
my initial project proposal and some diagrams.)


1. PROPOSAL:
The main idea is to create a way to keep a content history for every
change to a Model. Current change tracking is very limited, to say the
least, so I will extend or replace it so that one can actually see how
something was changed.

1.1 SCOPE:
Changes will have to be made in different parts of Django. Most of
the work should be done inside django.db, except for diffing and
merging.


USAGE

2. MODELS:
The easiest way to imagine how this will work is through an actual
use case. So, let's see how Bob would use this feature.

2.1. Basic models:
To enable history tracking, Bob has to add an inner History class to
each model he would like to track:

        class Post(models.Model):
                author = models.CharField(maxlength=100)
                title = models.CharField(maxlength=100)
                content = models.TextField()
                date = models.DateField()

                class History:
                        pass

This works much like the Admin inner class. The difference is that if
the History class is present, the database will have two tables for
this model:

(the main table - not changed):

        CREATE TABLE app_post (
                "id" serial NOT NULL PRIMARY KEY,
                "author" varchar(100) NOT NULL,
                "title" varchar(100) NOT NULL,
                "content" text NOT NULL,
                "date" date NOT NULL
        );


(and the history table):

        CREATE TABLE app_post_history (
                "id" serial NOT NULL PRIMARY KEY,
                "change_date" timestamp with time zone NOT NULL, -- required for datetime revert
                "parent_id" integer NOT NULL REFERENCES app_post (id),
                "author" varchar(100) NOT NULL,         -- data from app_post
                "title" varchar(100) NOT NULL,          -- data from app_post
                "content" text NOT NULL,                -- data from app_post
                "date" date NOT NULL                    -- data from app_post
        );

I think this would be enough to save a "basic full" version of a
changed record. "parent_id" is a foreign key to app_post.id, so Bob can
find the saved revisions of a record from app_post, and when he selects
a record from app_post_history he knows which record it belongs to.
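To make the two-table layout concrete, here is a minimal sketch using
Python's built-in sqlite3 module. It is not how Django would do it:
SQLite stands in for PostgreSQL (so column types are simplified), the
copy-before-update step is done by hand, and `save_with_history` is a
hypothetical helper name. It also assumes the history row stores the
version being replaced, so the newest version lives only in app_post.

```python
import sqlite3
from datetime import datetime

# In-memory SQLite stand-in for the two PostgreSQL tables above;
# column types are simplified to TEXT/INTEGER.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_post (
        id INTEGER PRIMARY KEY,
        author TEXT NOT NULL,
        title TEXT NOT NULL,
        content TEXT NOT NULL,
        date TEXT NOT NULL
    );
    CREATE TABLE app_post_history (
        id INTEGER PRIMARY KEY,
        change_date TEXT NOT NULL,
        parent_id INTEGER NOT NULL REFERENCES app_post (id),
        author TEXT NOT NULL,
        title TEXT NOT NULL,
        content TEXT NOT NULL,
        date TEXT NOT NULL
    );
""")

FIELDS = ("author", "title", "content", "date")


def save_with_history(conn, post_id, **changes):
    """Hypothetical helper: snapshot the current app_post row into
    app_post_history, then apply `changes` to app_post."""
    if not changes:
        return  # nothing changed, so nothing to record
    unknown = set(changes) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    cols = ", ".join(FIELDS)
    row = conn.execute(
        f"SELECT {cols} FROM app_post WHERE id = ?", (post_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no post with id {post_id}")
    # Copy the old version, tagged with its parent and the change time.
    conn.execute(
        f"INSERT INTO app_post_history (change_date, parent_id, {cols}) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now().isoformat(), post_id, *row),
    )
    assignments = ", ".join(f"{name} = ?" for name in changes)
    conn.execute(
        f"UPDATE app_post SET {assignments} WHERE id = ?",
        (*changes.values(), post_id),
    )


# Bob creates a post and edits it twice.
conn.execute(
    "INSERT INTO app_post (author, title, content, date) VALUES (?, ?, ?, ?)",
    ("bob", "Hello", "First draft", "2006-05-01"),
)
save_with_history(conn, 1, content="Second draft")
save_with_history(conn, 1, title="Hello, world")

# parent_id links every saved revision back to its app_post row.
revisions = conn.execute(
    "SELECT title, content FROM app_post_history "
    "WHERE parent_id = ? ORDER BY id",
    (1,),
).fetchall()
print(revisions)
```

Because every history row carries "parent_id", finding all revisions
of a post is a single query on app_post_history.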


2.2. Selective models:
But what if Bob doesn't want to keep every field in the history (why
anyone would want an incomplete history of a record is beyond me, but
never mind)? Maybe the 'author' and 'date' of a post can't change, so
he would like to leave them out. At the same time, he would like to
know who made each change, but he does not need that information when
using Post.

Again, this works like the Admin inner class when defining which
fields to use:

        class Post(models.Model):
                author = models.CharField(maxlength=100)
                title = models.CharField(maxlength=100)
                content = models.TextField()
                date = models.DateField()

                class History:
                        track = ('title', 'content')
                        additional = {
                                "changed_by": models.CharField(maxlength=100),
                        }


In this case "app_post_history" would look like this:

        CREATE TABLE app_post_history (
                "id" serial NOT NULL PRIMARY KEY,
                "change_date" timestamp with time zone NOT NULL, -- required for datetime revert
                "parent_id" integer NOT NULL REFERENCES app_post (id),
                "title" varchar(100) NOT NULL,          -- data from app_post
                "content" text NOT NULL,                -- data from app_post
                "changed_by" varchar(100) NOT NULL      -- new field
        );
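One way the History options could translate into history-table columns
is sketched below. This is a guess at the mechanics, not Django code:
`history_columns` and `create_table_sql` are hypothetical names, and
field types are given as plain SQL strings instead of Django field
objects like the one in `additional`.

```python
def history_columns(model_fields, parent_table, track=None, additional=None):
    """Hypothetical sketch: compute the columns of <parent_table>_history.

    model_fields: ordered (name, sql_type) pairs of the main model.
    track:        names of fields to copy; None copies all of them (as in 2.1).
    additional:   {name: sql_type} for columns that exist only in history.
    """
    # Columns every history table gets, whatever the options say.
    columns = [
        ("id", "serial NOT NULL PRIMARY KEY"),
        ("change_date", "timestamp with time zone NOT NULL"),
        ("parent_id", f"integer NOT NULL REFERENCES {parent_table} (id)"),
    ]
    types = dict(model_fields)
    names = [name for name, _ in model_fields] if track is None else track
    for name in names:
        if name not in types:
            raise ValueError(f"History.track names an unknown field: {name!r}")
        columns.append((name, types[name]))
    for name, sql_type in (additional or {}).items():
        if any(name == existing for existing, _ in columns):
            raise ValueError(f"additional field {name!r} clashes with a column")
        columns.append((name, sql_type))
    return columns


def create_table_sql(table, columns):
    """Render the column list as a CREATE TABLE statement."""
    body = ",\n".join(f'    "{name}" {sql_type}' for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


post_fields = [
    ("author", "varchar(100) NOT NULL"),
    ("title", "varchar(100) NOT NULL"),
    ("content", "text NOT NULL"),
    ("date", "date NOT NULL"),
]
cols = history_columns(
    post_fields,
    "app_post",
    track=("title", "content"),
    additional={"changed_by": "varchar(100) NOT NULL"},
)
print(create_table_sql("app_post_history", cols))
```

Leaving out `track` copies every field, which gives the full table
from section 2.1.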



3. VIEWS
3.1. Listing the change-sets:
Ok, so after a few edits Bob would like to see when and what was
added/changed in one specific record.

A view should probably work something like this:

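        from django.shortcuts import get_object_or_404, render_to_response
        from django.history import list_changes  # proposed API, not in Django yet
        from app.models import Post

        def show_changes(request, post_id):
                post = get_object_or_404(Post, pk=post_id)
                post_changes_list = list_changes(post_id)
                return render_to_response('app/changes.html', {
                        'post': post,
                        'post_changes_list': post_changes_list,
                })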

And a template:

        <h1>{{ post.title }}</h1>
        <ul>
        {% for change in post_changes_list %}
        <li><div>
                <b>{{ change.id }}</b>
                <h3>{{ change.title }}</h3>
                <p>{{ change.content }}</p>
                <b>{{ change.change_date }}</b>
                </div>
        </li>
        {% endfor %}
        </ul>

This lists all changes to one record from the "app_post" table. It is
just an example of what a developer would write.
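The proposed `list_changes` does not exist yet, so here is a
plain-Python sketch of what it might return, assuming "newest first"
ordering (the template above does not fix an order). `HistoryRecord`
is a hypothetical stand-in for a row of app_post_history, and the rows
are passed in explicitly, because a bare post_id does not say which
model's history table to search.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoryRecord:
    # One row of app_post_history (only the columns the template uses).
    id: int
    parent_id: int
    change_date: datetime
    title: str
    content: str


def list_changes(history, post_id):
    """Hypothetical stand-in for the proposed django.history.list_changes():
    every saved revision of one post, newest first."""
    return sorted(
        (rec for rec in history if rec.parent_id == post_id),
        key=lambda rec: rec.change_date,
        reverse=True,
    )


history = [
    HistoryRecord(1, 7, datetime(2006, 5, 1, 10, 0), "Hello", "First draft"),
    HistoryRecord(2, 8, datetime(2006, 5, 1, 11, 0), "Other", "Unrelated"),
    HistoryRecord(3, 7, datetime(2006, 5, 2, 9, 30), "Hello", "Second draft"),
]
for change in list_changes(history, 7):
    print(change.id, change.change_date, change.title)
```

In Django this would presumably become one query on the history table,
filtered on "parent_id" and ordered by "change_date".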

4. MERGING/REVERTING/ROLLBACK
I'm not sure if there is a best way to do this, but I imagine it
could work something like this:

4.1 Full revert

        object = get_object_or_404(Post, pk=id)
        object2 = object.version(-1)


4.2 Merge only selected "changes"

        object.content = object.content.version(-1)

4.3 Version selection

        object.version(-i)              # go back "i" versions
        object.version(d)               # find the version whose
                                        # "change_date" is d (a datetime)
        object.version(object.content)  # find the last version in which
                                        # the "content" field was changed

The above calls would all return an object of the same type as
"object" if "object.version()" is used. If "object.field.version()" is
used, it would return that field's value from the selected version.

The problem with this is that you don't get direct access to fields
specific to the history table, like "change_date" or the additional
fields.
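The three kinds of version selection could behave roughly as sketched
below. This is plain Python under several assumptions: records are
dicts rather than model instances, snapshots are kept oldest first, a
date selects the newest snapshot saved at or before it (an exact
"change_date" match is a special case), and the field is named by a
string because a bare value such as object.content does not say which
field it came from. The function name `version` only mirrors the
proposal, and the data is made up.

```python
from datetime import datetime


def version(current, revisions, selector):
    """Hypothetical sketch of the proposed object.version() lookups.

    current:   dict of the record's current field values (the app_post row).
    revisions: saved snapshots, oldest first, each a dict that also
               carries the history-only "change_date" column.
    selector:  negative int -i -> go back i versions
               datetime d      -> newest snapshot saved at or before d
               field name      -> newest snapshot whose value for that
                                  field differs from the current one
    """
    if isinstance(selector, datetime):
        earlier = [r for r in revisions if r["change_date"] <= selector]
        if not earlier:
            raise LookupError("no version saved at or before that date")
        return earlier[-1]
    if isinstance(selector, int):
        if not -len(revisions) <= selector < 0:
            raise IndexError("version index out of range")
        return revisions[selector]
    if isinstance(selector, str):
        for rev in reversed(revisions):
            if rev[selector] != current[selector]:
                return rev
        raise LookupError(f"{selector!r} has never changed")
    raise TypeError("selector must be a negative int, a datetime or a field name")


revisions = [
    {"change_date": datetime(2006, 5, 1), "title": "Hello", "content": "v1"},
    {"change_date": datetime(2006, 5, 2), "title": "Hello", "content": "v2"},
    {"change_date": datetime(2006, 5, 3), "title": "Hello", "content": "v2"},
]
current = {"title": "Hello, world", "content": "v2"}

previous = version(current, revisions, -1)            # 4.1: full revert target
as_of = version(current, revisions, datetime(2006, 5, 2, 12, 0))
before_edit = version(current, revisions, "content")  # before last content change

# 4.2: merge back only one field from an older version.
current["content"] = before_edit["content"]
```

Returning the history row itself, with its "change_date", rather than
a plain Post is one possible way around the direct-access problem
described above.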



IMPLEMENTATION

5. CHANGES
A question that pops up here is: how will changes be stored?

The answer is not so straightforward, because there are a lot of
different field types. For most of them nothing but a full copy is
possible. But some of them (TextField, CharField, ...) would work
better if we stored just the difference between the current version
(the edited one, the one to be saved in the main "app_post" table)
and the one that was retrieved from the database before it was edited.

For the latter, it would be wise to use Python's "difflib" [0] to
calculate the difference and to merge it back when comparison is
needed.

I'm not yet sure how this part should work:
- When saving, should it retrieve the original version _again_ and
then compute the 'diff' between the current and the original one? Or
should the original already be available somehow?
- (more questions to come)
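As a starting point for the difflib approach, here is a minimal sketch
that stores only what is needed to rebuild the previous text from the
current one, using difflib.SequenceMatcher opcodes over lines. It
assumes the original text is available at save time, and the names
`make_delta` and `apply_delta` are hypothetical. difflib.ndiff() plus
difflib.restore() would also work, but an ndiff delta contains the
lines of both texts, so it is no smaller than a full copy.

```python
import difflib


def make_delta(new, old):
    """Store only what is needed to rebuild `old` from `new`.

    'equal' runs become (start, end) references into `new`'s lines;
    lines that exist only in `old` are stored literally.
    """
    new_lines = new.splitlines(keepends=True)
    old_lines = old.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, new_lines, old_lines)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j1 < j2:  # 'replace' or 'insert': old has lines new lacks
            delta.append(("text", old_lines[j1:j2]))
        # 'delete': lines only in `new`, nothing to store
    return delta


def apply_delta(new, delta):
    """Rebuild the older text from the current text and a stored delta."""
    new_lines = new.splitlines(keepends=True)
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(new_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "".join(out)


old = "Django\nhistory\ntracking\n"
new = "Django\nmodel history\ntracking\nwith diffs\n"
delta = make_delta(new, old)
restored = apply_delta(new, delta)
```

Since each delta is relative to the text that replaced it, rebuilding
an older revision would mean applying the stored deltas one by one,
from the newest back to the one wanted.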

The preliminary diagram of my original idea is here:
http://zabica.org/~uros/soc/Soc_django1.png
What do you think?

PS: The current version of this RFC can be found at [1]. I also have a
category on my blog [2], where I'll post about my progress and such.

[0] http://www.python.org/doc/current/lib/module-difflib.html
[1] http://zabica.org/~uros/soc/rfc.txt
[2] http://zabica.org/uros/category/soc/

