Hi all,

After some discussions with Malcolm on this list, and some research based on the pointers he gave me, I have come up with a rough plan of what I want to do for Django this summer. Since we are running out of time, I have written a *rough draft* of the proposal without a full discussion with the Django community about which features can be implemented. So this is in no way a *complete proposal*, and I don't want to submit it until some discussion has happened here. Also, the required proposal format asks for links to the devel list discussions that led to the proposal, and I have none apart from Malcolm's mails. So I kindly request you all to review my proposal thoroughly and suggest what I should add to or remove from it. Please also tell me whether my propositions and assumptions are correct, and if not, how I can correct them, so that I can submit my proposal to Google.
*Note:* Django doesn't serialize inherited Model fields in the child Model. I asked on IRC why this decision was taken but got no response, and I could not find anything about it in the devel list archives either. I would like to add this to my proposal, but first I want to understand the reasoning behind the decision. Would it be a workable and necessary addition to my proposal? The same goes for Ticket #10201: can someone please tell me why microsecond data was dropped? I am also leaving out adding an extras option to the serializers, since a patch for it has already been submitted (Ticket #5711) and looks like a working solution. If you want something extra done there before it can be committed to Django trunk, please tell me; I will work on that a bit and add it to the proposal.

Here is my long, long, long proposal:

Title: Restructuring of the existing Serialization format and improvement of the APIs

~~~~~~~~~
Abstract
~~~~~~~~~

Greetings! I wish to provide Django with better support for Serialization by building upon the existing Serialization framework. This project includes extending the format of the serialized output that the existing Serializer produces, by allowing in-depth traversal of relation fields in a given Model. The project also includes extending the existing API to specify the depth of the relations to be serialized and the names of the related models to be serialized. The API also provides backwards compatibility, so that serialized output from older versions keeps working with the to-be-introduced changes.

All the changes will be made keeping two important things in mind:

1. All the changes should be backwards compatible. They may break compatibility only when a very important requirement, one that improves serialization many-fold, cannot be implemented without backwards-incompatible changes, and the Django community gives the green light for doing so.
2. The serialized data should be useful not just within Django apps but also for exporting the data for external use and processing.
~~~~~~~
Why?
~~~~~~~

- The existing format of the serialized output does not specify the name of the Primary Key (PK henceforth) field, which is a problem for fields that are implicitly set as PKs (Ticket #10295).
- The existing format only specifies the PK of a related object, but doesn't traverse the relation in depth to include the related object's fields (Ticket #4656).
- There are no APIs for the above requirement.
- The fields of inherited models are not serialized.

Situations/problems arising from attempting to fix the above problems:

- When we allow Serialization to follow relations, the related Model's data gets repeated inside every model instance that relates to it, which is unnatural and makes the data extremely redundant. Consider the following example:

    from django.db import models

    class Poll2(models.Model):
        question = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published')

        def __unicode__(self):
            return self.question

    class Choice2(models.Model):
        poll = models.ForeignKey(Poll2)
        choice = models.CharField(max_length=200)
        votes = models.IntegerField()

        def __unicode__(self):
            return self.choice

  If we allow following of relations, serializing the Choice2 Model might produce something like this:

    [
      {
        "pk": 1,
        "model": "testapp.choice2",
        "fields": {
          "votes": 1,
          "poll": [
            {
              "pk": 1,
              "model": "testapp.poll2",
              "fields": {
                "question": "What's Up?",
                "pub_date": "2009-03-01 06:00:00"
              }
            }
          ],
          "choice": "Django"
        }
      },
      {
        "pk": 2,
        "model": "testapp.choice2",
        "fields": {
          "votes": 2,
          "poll": [
            {
              "pk": 1,
              "model": "testapp.poll2",
              "fields": {
                "question": "What's Up?",
                "pub_date": "2009-03-01 06:00:00"
              }
            }
          ],
          "choice": "Python"
        }
      },
      {
        "pk": 3,
        "model": "testapp.choice2",
        "fields": {
          "votes": 4,
          "poll": [
            {
              "pk": 1,
              "model": "testapp.poll2",
              "fields": {
                "question": "What's Up?",
                "pub_date": "2009-03-01 06:00:00"
              }
            }
          ],
          "choice": "Others are useless"
        }
      }
    ]

  which clearly shows the redundant Poll2 data. Here we are serializing Choice2, of course, but serializing Poll2 instead doesn't give the natural serialized output either.
  In fact, serializing Poll2 doesn't give anything pertaining to the Choice2 instances. A more natural serialization would come from serializing a Poll2 instance that includes within itself all the Choice2 instances related to it. This follows directly from how normalized database schemas are designed: the foreign key lives on Choice2, so a Poll2 row carries no reference to its choices.
- The way loaddata and dumpdata work will change. The new versions of loaddata and dumpdata may not be compatible with fixtures generated by older versions.

Most of the above problems have been addressed in the tickets mentioned, but the patches need to be dealt with more thoroughly after discussing them with the Django community in general. Design decisions need to be taken for fixing most of these tickets, which I will do during the community bonding phase.

~~~~~~~
How?
~~~~~~~

The project begins with implementing a version-id field in the serialized output. This field is provided for backwards compatibility. Then it proceeds by converting the existing PK field, which currently appears as

    {
      "pk": 1,
      "model": "testapp.choice2",
      ...

so that it also serializes the name of the PK field. I propose it to be presented as:

    {
      "pk": {"id": 1},
      "model": "testapp.choice2",
      ...

This change is proposed keeping in mind that David Cramer's patches for Ticket #373 will get into Django trunk sooner or later, as they address a long-standing requirement. This representation allows multiple PK fields to exist in a model and to be serialized correctly. The corresponding changes in the deserializers to process this data will also be made at this stage. The implementation touches the following parts of Django, along with related methods and files:

- django.core.serializers.python.Serializer.end_object()
- django.core.serializers.xml_serializer.Serializer.start_serialization() [it already implements a version attribute]

The project then proceeds by splitting the serializer into 2 versions, to handle the older and the new version of the serialized output.
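A minimal sketch of the named-PK format and version-id described above, using plain dicts in place of Django's serializer output. The `"version"` key, the "1.0"/"2.0" labels and the helpers `upgrade_record` and `read_pk` are hypothetical placeholders for illustration, not existing Django APIs; where exactly the version-id lives (per record or per document) is one of the design decisions still to be made.

```python
# Hypothetical sketch of the proposed named-PK format plus a version-id.
# The "version" key and the "1.0"/"2.0" labels are placeholders, not Django APIs.
OLD_VERSION = "1.0"
NEW_VERSION = "2.0"


def upgrade_record(record, pk_name="id"):
    """Return a copy of an old-style record with a named PK and a version-id."""
    upgraded = dict(record)  # shallow copy; the input record is left untouched
    upgraded["pk"] = {pk_name: record["pk"]}
    upgraded["version"] = NEW_VERSION
    return upgraded


def read_pk(record):
    """Return the PK as a {field_name: value} dict, whichever format is given.

    Records without a version-id are treated as old style; since that format
    does not carry the PK field name, "id" is assumed as a fallback.
    """
    if record.get("version", OLD_VERSION) == OLD_VERSION:
        return {"id": record["pk"]}
    return dict(record["pk"])


old = {"pk": 1, "model": "testapp.choice2",
       "fields": {"votes": 1, "poll": 1, "choice": "Django"}}
new = upgrade_record(old)
print(new["pk"], new["version"])     # {'id': 1} 2.0
print(read_pk(old) == read_pk(new))  # True

# A composite key (the case Ticket #373 is about) fits the same shape;
# "testapp.vote" and its field names are made up for illustration.
composite = {"pk": {"poll": 1, "voter": 42}, "model": "testapp.vote",
             "version": NEW_VERSION, "fields": {}}
print(read_pk(composite))            # {'poll': 1, 'voter': 42}
```

Because `read_pk` falls back to the old format whenever no version-id is present, a deserializer built this way could keep loading existing fixtures unchanged.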
The decision as to which version of the serializer to use will be made through a new "old_version=True" option to the serialize method. The deserialize method, however, can decide this by itself by looking at the new version-id. An --old_version option will also be provided for the django-admin.py loaddata and dumpdata commands.

The second and biggest phase starts with implementing in-depth serialization of relations. The APIs will be implemented hand-in-hand with the features. An API to specify which relations to serialize will be provided through a "relations=(rel1, rel2, ...)" parameter to serialize. A "relation_depth=(N1, N2, ...)" parameter will also be provided, to serialize each related model recursively up to its corresponding depth. Skipping "relations=" means serializing all the related models of a given model, and skipping "relation_depth=" means serializing to full depth. Skipping both serializes just the PKs of the related models (old style). Further selection of the fields to be serialized in the individual related models is provided by a DjangoFullSerializers-like syntax, using dictionaries. An exclude-fields option will also be provided, similar to DjangoFullSerializers. Link to DjangoFullSerializers: http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers

This phase continues by providing the optional API parameter "reverse_relation=[rel1, rel2]" on the related model (Poll2 in the example) rather than on the model that relates to it (Choice2). This does a reverse relation lookup, and for each related model instance it serializes all the instances that relate back to it, which solves the data redundancy problem described above.
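The relations= / relation_depth= rules above can be sketched on a toy object graph. This is a hypothetical illustration only: `serialize_obj`, the dict layout with a `"related"` key and the `testapp.ballot` model are made up, and the rule that a nested object follows all of its own relations with one level less of depth is an assumption, since the proposal leaves nested selection to the DjangoFullSerializers-like syntax.

```python
import math

# Hypothetical toy stand-in for model instances: each object is a dict with
# "model", "pk", plain "fields" and forward "related" objects keyed by field name.


def serialize_obj(obj, relations=None, relation_depth=None):
    """Sketch of the proposed relations= / relation_depth= semantics.

    - both omitted: related objects are emitted as bare PKs (old style)
    - relations omitted: every forward relation is followed
    - relation_depth omitted: the named relations are followed to full depth
    Cycles are not detected, so full depth assumes an acyclic object graph.
    """
    if relations is None and relation_depth is None:
        depths = {}
    else:
        names = list(obj["related"]) if relations is None else list(relations)
        if relation_depth is None:
            depths = {name: math.inf for name in names}
        else:
            depths = dict(zip(names, relation_depth))

    fields = dict(obj["fields"])
    for name, target in obj["related"].items():
        depth = depths.get(name, 0)
        if depth >= 1:
            # Assumption: the nested object follows all of its own relations,
            # with one level less of remaining depth.
            inner = list(target["related"])
            fields[name] = serialize_obj(target, inner, [depth - 1] * len(inner))
        else:
            fields[name] = target["pk"]
    return {"pk": obj["pk"], "model": obj["model"], "fields": fields}


poll = {"model": "testapp.poll2", "pk": 1, "related": {},
        "fields": {"question": "What's Up?", "pub_date": "2009-03-01 06:00:00"}}
choice = {"model": "testapp.choice2", "pk": 1, "related": {"poll": poll},
          "fields": {"choice": "Django", "votes": 1}}
# Hypothetical third model, only to show more than one level of nesting.
ballot = {"model": "testapp.ballot", "pk": 7, "related": {"choice": choice},
          "fields": {}}

print(serialize_obj(choice)["fields"]["poll"])  # 1
shallow = serialize_obj(ballot, relations=("choice",), relation_depth=(1,))
print(shallow["fields"]["choice"]["fields"]["poll"])  # 1
deep = serialize_obj(ballot, relations=("choice",))
print(deep["fields"]["choice"]["fields"]["poll"]["fields"]["question"])  # What's Up?
```

The sketch emits a followed relation as one nested record; whether a ForeignKey should instead appear as a one-element list, and how cycles are cut off at full depth, would both need to be settled during the design phase.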
For the reverse_relation option, the output looks something like the following if serialized as:

    serializers.serialize('json', Poll2.objects.all(), reverse_relation=('choice2',))

    [
      {
        "pk": 1,
        "model": "testapp.poll2",
        "fields": {
          "question": "What's Up?",
          "pub_date": "2009-03-01 06:00:00"
        },
        "testapp.choice2": [
          {
            "pk": 1,
            "model": "testapp.choice2",
            "fields": {
              "votes": 1,
              "choice": "Django"
            }
          },
          {
            "pk": 2,
            "model": "testapp.choice2",
            "fields": {
              "votes": 2,
              "choice": "Python"
            }
          },
          {
            "pk": 3,
            "model": "testapp.choice2",
            "fields": {
              "votes": 4,
              "choice": "Others are useless"
            }
          }
        ]
      }
    ]

This becomes extremely useful when exporting data for external processing. As for the deserializers, in this case they check whether the serialized data contains any other app_label.modelname keys outside the fields dictionary; if so, these are treated as reverse_relation data, and both the Poll2 and the Choice2 model objects are constructed. Calling save() should recursively save all the instances. This implementation may not be as easy as it looks; a lot of design decisions need to be taken before implementing these changes.

The above implementation requires changes to the serializers.base.Serializer.serialize method to handle the newly added parameters. Reverse lookups will be added here. In-depth serialization of relations will also be taken care of, possibly by adding new methods to the base class that return the required data. These methods would recursively return the data of multi-level relations, possibly by "yield"ing it. A DeserializedObject.reversed_objects attribute is added to hold a list of reverse relation instances. The <format>.Deserializer classes will construct the Model objects taking into account only the current model's fields, not the related models' fields; from such related model data they just use the PK field. The loaddata and dumpdata commands will optionally allow fixtures to use reverse_relations, via a --natural option. This helps dump the data with the least redundancy for exporting.
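A minimal sketch of the reverse_relation grouping and of the deserializer's inverse step, working on plain dicts in place of model instances. `nest_reverse` and `split_reverse` are hypothetical helper names; unlike the real proposal, the caller passes the FK field name (`"poll"`) explicitly instead of the deserializer discovering it from model metadata, and nothing is saved to a database.

```python
# Hypothetical sketch of reverse_relation nesting and of the deserializer's
# inverse step, on plain dicts. The caller supplies the FK name ("poll")
# explicitly; a real implementation would take it from model metadata.
BASE_KEYS = ("pk", "model", "fields")


def nest_reverse(parents, children, fk_name, child_label):
    """Group flat child records under the parent their FK points to."""
    by_parent = {}
    for child in children:
        # The FK is dropped from the nested child: the enclosing parent implies it.
        fields = {k: v for k, v in child["fields"].items() if k != fk_name}
        entry = {"pk": child["pk"], "model": child["model"], "fields": fields}
        by_parent.setdefault(child["fields"][fk_name], []).append(entry)
    return [{**parent, child_label: by_parent.get(parent["pk"], [])}
            for parent in parents]


def split_reverse(records, fk_name):
    """Recover flat parent and child records from nested output.

    Any "app_label.modelname" key outside "fields" is treated as
    reverse-relation data, and the FK is restored from the parent's PK.
    """
    parents, children = [], []
    for record in records:
        parents.append({k: record[k] for k in BASE_KEYS})
        for key, value in record.items():
            if key not in BASE_KEYS and "." in key:
                for child in value:
                    fields = {**child["fields"], fk_name: record["pk"]}
                    children.append({"pk": child["pk"], "model": child["model"],
                                     "fields": fields})
    return parents, children


polls = [{"pk": 1, "model": "testapp.poll2",
          "fields": {"question": "What's Up?", "pub_date": "2009-03-01 06:00:00"}}]
choices = [
    {"pk": pk, "model": "testapp.choice2",
     "fields": {"votes": votes, "poll": 1, "choice": text}}
    for pk, votes, text in [(1, 1, "Django"), (2, 2, "Python"),
                            (3, 4, "Others are useless")]
]

nested = nest_reverse(polls, choices, "poll", "testapp.choice2")
print(len(nested[0]["testapp.choice2"]))  # 3
print(split_reverse(nested, "poll") == (polls, choices))  # True
```

The round trip shows the intended property: the Poll2 data is written once, yet the flat Choice2 records, including their FK, can be rebuilt exactly from the nested form.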
~~~~~~~~~~~~~~~~~
Benefits to Django
~~~~~~~~~~~~~~~~~

By the end of this project, Django will have better support for Serialization. It will support the much-requested feature of in-depth serialization, thereby fixing ticket #4656. It also fixes #10295. Fixtures and serialized data become more convenient to use, both within Django and externally, through reduced data redundancy. And finally, there will be better API support for all the newly introduced features. The serialized data is made more generic, keeping in mind possible future additions like multiple PK support, as well as backwards compatibility.

~~~~~~~~~~~~
Deliverables
~~~~~~~~~~~~

1. Internal implementation and code for in-depth serialization, reverse relation serialization and additional fields.
2. Additional APIs to support in-depth serialization and to specify the relation depth for serialization, plus support for the PK field name and a version id in the serialized output.
3. APIs for reverse relation serialization.
4. Additional options for the loaddata and dumpdata commands.
5. Test cases for all the newly introduced features.

Non-code deliverables include testing performed in 3 different phases to verify correctness and backwards compatibility, and detailed user and developer documentation for the new Serializer implementations.

~~~~~~~~
When?
~~~~~~~~

The project is planned to be completed in 9 phases. Every phase includes documenting the progress made during that phase. The timeline for each phase is given below:

1. Design Decisions and Initial Preparation (Community Bonding Period: already started – May 22nd)
   Working closely with the Django community to learn more about Django in depth: learning Django's code structure, reading documentation on Django internals, reading and understanding the ORM and Serializers code in depth, and reading about other systems' serializers. Communicating and discussing with the community about the outstanding issues, to resolve the accepted tickets.
   The design decisions I propose are discussed and finalized.
2. Finalizing Design and Coding Phase I (May 22nd – May 31st)
   Discussions with the Django community in general and my mentor to finalize the design decisions for the major portion of the project. Documenting the design decisions. Implementing the version-id and PK changes in the serializers, and implementing deserializers to parse them. Serializers and deserializers will be split to handle both versions (old and new).
3. Testing Phase I (June 1st – June 5th)
   Writing new test cases and adjusting the existing ones to make sure the Phase I changes don't break Django in any way.
4. Coding Phase II (June 6th – June 21st)
   In-depth serialization of relations will be implemented in this phase, and the corresponding APIs will be added as described in the How? section. Changes and additions will be made to both serializers and deserializers for this, and corresponding changes will be made for fixtures.
5. Testing Phase II (June 22nd – June 29th)
   New test cases will be added to ensure that Django is still fully backwards compatible and that the new features pass the tests too.
6. Coding Phase III (June 30th – July 18th)
   Reverse relation serialization will be added and the relevant APIs implemented. Additions to DeserializedObject and save() will be made to hold and save reversed_objects. These will be implemented for fixtures too. Mid-term evaluations happen during this phase.
7. Testing Phase III (July 19th – July 26th)
   New test cases will be added for testing reverse relation serialization and backwards compatibility.
8. Requesting Community-wide Reviews, Testing and Evaluation (July 27th – August 2nd)
   Final phase of testing of the overall project: obtaining, consolidating and evaluating the results. Requesting the community to help me with final testing.
9. Scrubbing Code, Wrap-Up, Documentation (August 3rd – August 10th)
   Fixing major and minor bugs, if any, and merging the project with the Django SVN trunk.
   Writing user and developer documentation, and finalizing the project.

~~~~~~~~~~~~~~
Where?
~~~~~~~~~~~~~~

I am already comfortable with the django-devel mailing list and the IRC channel #django-...@freenode.net. I will be able to contact my mentor in both of these ways and will also be available through Google Talk (Jabber). I am also comfortable with svn, git and mercurial, since I was the SVN administrator for 2 academic projects and the git administrator for 1 project.

~~~~~~~~~~
Why Me?
~~~~~~~~~~

I am a 4th-year undergraduate student majoring in Information Science and Engineering at BMSCE, Bangalore, India (IST). I have been using and advocating Free and Open Source Software for the past 5 years and have been one of the main coordinators of BMSLUG. I have given various talks and conducted workshops on FOSS tools:

- Most importantly, I recently conducted a Python and *Django* workshop for beginners at NIT Calicut, a premier institution in the region.
- How to contribute to FOSS?
- A hands-on hackathon using GNUSim8085 as an example. http://groups.google.com/group/bms-lug/browse_thread/thread/0c9ca2367966727a

- I have been actively participating in various FOSS communities by reporting bugs to projects like Ubuntu, GNOME, RTEMS and KDE.
- I was a major contributor to and writer of KDE's first-ever Handbook. http://img518.imageshack.us/img518/9796/hb1o.png http://img518.imageshack.us/img518/4296/hb2.png

I have been contributing patches and code to various FOSS communities, the major ones being:

- GNUSim8085 (http://is.gd/p5wZ , http://is.gd/p5xK)
- KDE Step (http://is.gd/oci7)
- RTEMS
- Melange (the GSoC web app; http://code.google.com/p/soc/source/browse/trunk/AUTHORS)

My Django work: I was interested in contributing to Django even before GSoC came along. I discussed Ticket #373 with David Cramer on #django-dev and read the Django ORM code required for it, but could not write any code myself because of university coursework.
I have also had some discussions on the django-devel list about fixing ticket #8161 (http://is.gd/obr2), but unfortunately it got fixed before I could work on it. So I am applying for GSoC, as I feel it lowers the barrier to getting started. http://groups.google.com/group/django-developers/browse_thread/thread/5461dae3cf8d5d6a

I have a fair understanding of Python concepts and one and a half years of Python experience. I also have a fair understanding of the Django ORM code because of my previous work. I am getting used to the serialization code while writing this proposal and have no problems with it. I have also been using Django for 1 year for some of my web apps. Since I have been working with FOSS communities, I have a good understanding of FOSS development methodologies: communicating with people, using the Django ticket tracker, coding and testing.

Lastly, I want to express my deep commitment to this project and to Django. I'm fully available this summer without any other commitments, will tune my day/night rhythm to my mentor's requirements, and assure dedicated work of 35-40 hours/week. I also assure you that I will continue contributing to Django well after GSoC.

If any part of this proposal is not clear, please contact me.

~~~~~~~~~~~~~~~~~~~~~~~~
Important Links and URLs
~~~~~~~~~~~~~~~~~~~~~~~~

My Blog: http://madhusudancs.info
My CV: http://www.madhusudancs.info/sites/default/files/madhusudancsCV.pdf

--
Thanks and regards,
Madhusudan.C.S

Blogs at: www.madhusudancs.info
Official Email ID: madhusu...@madhusudancs.info