My apologies for the length!
Concisely, I intend to give Django users some granular control over the data to be serialized, without sacrificing backwards compatibility for old code or for users who need the straightforward, current functionality.

Currently, a serialized model contains only the data in its own row of the database table. With regard to inheritance, this is only an issue when the model in question derives from one or more concrete models. In that case, however, any data on the far end of the one-to-one relation to the parent model is not serialized at all. The overarching issue is rooted in the way Django's serializers treat relationships. This is clearly a complex issue, because both shallow and deep serialization are useful in a wide range of situations.

For the purpose of argument and demonstration, please examine the following use case. A business wishes to begin selling its products online, but it wants to keep using the inventory management software it is familiar with (henceforth, ezInventory). The models.py looks like:

    from django.db import models

    class Product(models.Model):
        name = models.CharField(max_length=200)
        description = models.TextField()
        # We're assuming all prices are in USD...
        price = models.DecimalField(max_digits=6, decimal_places=2)

    class Order(models.Model):
        products = models.ManyToManyField(Product)
        order_placed = models.DateTimeField()

        def total_price(self):
            # aggregate() returns a dict, so pull out the summed value
            return self.products.aggregate(total=models.Sum('price'))['total']

Django's serialization facilities make importing and exporting products and orders between the website and ezInventory a breeze.

Six months later, however, the business has a feature request: it would like to offer some of its products by subscription. This is an easy addition:

    # imports...
    # original models...
    class Subscription(Product):
        recurrence = models.PositiveIntegerField(choices=((12, 'annual'), (1, 'monthly')))

Immediately, a problem is evident. The business relies on easy communication of data between ezInventory and the website, but Django serializes a Subscription as follows:

    [{
        "pk": 13,
        "model": "app.subscription",
        "fields": {
            "recurrence": 12
        }
    }]

This is wonderful, as long as we know to pair this information with the product with id 13. However, ezInventory, though it supports subscriptions in general, merely overwrites the product with id 13 (assuming the products table was imported first) and complains about the missing fields.

Skipping merrily back to the real world, here is the issue, plain and simple: sometimes you need to include more information than is contained in the one model. Sometimes that information comes from inherited models, sometimes via one-to-many or many-to-many relationships, and sometimes it is just a small, relevant piece of information that may not be stored in the model at all. There are numerous open tickets about this issue [1][2][3][4], so clearly it needs some thought.

I like the idea [5] of providing a Serializer class, defined similarly to the ModelAdmin class, to allow custom ways of serializing data, something like:

    # in serializers.py (or something)
    class ProductListing(serializers.Serializer):
        fields = {
            'Product': ['price', 'name', 'description'],
            'Subscription': ['price', 'name', 'description', 'recurrence'],
        }

    serializers.register(ProductListing)

    $ python manage.py shell
    >>> from project.app import models
    >>> from django.core import serializers
    >>> print(serializers.serialize('json', list(models.Product.objects.all()) +
    ...                             list(models.Subscription.objects.all()),
    ...                             serializer='ProductListing', indent=4))
    [{
        "pk": 1,
        "model": "app.product",
        "fields": {
            "price": 9.98,
            "name": "Product #1",
            "description": "Description for Product #1"
        }
    },
    # other products...
    {
        "pk": 13,
        "model": "app.subscription",
        "fields": {
            "price": 14.98,
            "name": "Subscription #1, Product #13",
            "description": "My demonstrations aren't particularly creative.",
            "recurrence": 12
        }
    }]

Not only is this backwards-compatible (no new serializers == no new behavior), but it also continues to allow deserialization: the deserialization process would see that app.subscription is a child of app.product and fill in the data appropriately. This circumvents one of the largest drawbacks, IMO, of DjangoFullSerializers [6]; being able to deserialize this data is often just as important as serializing it (lookin' at you, fixtures).

The observant reader will note that, as demonstrated so far, this only solves the issue for models using inheritance, not for those using one-to-many or many-to-many fields. So, let's serialize some orders:

    # in serializers.py (or something)
    class OrderSerializer(serializers.Serializer):
        fields = {
            # include values from the products, and the return value of total_price
            'order': [{'products': ['price', 'name']}, 'total_price'],
        }

    serializers.register(OrderSerializer)

    $ python manage.py shell
    >>> from project.app import models
    >>> from django.core import serializers
    >>> # serialize() expects an iterable, so use filter() rather than get()
    >>> print(serializers.serialize('json', models.Order.objects.filter(pk=1),
    ...                             serializer='OrderSerializer', indent=4))
    [{
        "pk": 1,
        "model": "app.order",
        "fields": {
            "products": [{
                "pk": 1,
                "model": "app.product",
                "fields": {
                    "price": 9.98,
                    "name": "Product #1"
                }
            }, {
                "pk": 13,
                "model": "app.subscription",
                "fields": {
                    "price": 14.98,
                    "name": "Subscription #1, Product #13"
                }
            }],
            "total_price": 24.96
        }
    }]

As long as enough data is present, the deserializer has no less chance of correctly connecting models together than it does in its current form. Arguments could be provided either to search for matching rows and correct the primary keys in the serialized data, or to update the fields in the row with the given primary key.
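To make the nested fields specification a little more concrete, here is a minimal pure-Python sketch of how a serializer might walk a spec like OrderSerializer's. It is not Django code: the function name serialize_object, the model_label attribute, and the use of plain SimpleNamespace objects in place of model instances are all hypothetical stand-ins, and a real implementation would go through the model metadata and call .all() on related managers. A string entry names a field or a pseudo-field (callables such as total_price are called), and a dict entry maps a relation name to the spec applied to each related object.

```python
from decimal import Decimal
from types import SimpleNamespace


def serialize_object(obj, spec):
    """Serialize obj into a dumpdata-style dict according to a fields spec.

    Hypothetical helper for illustration only. Each entry in spec is either
      - a string naming an attribute; callables (pseudo-fields such as
        total_price) are called and their return value is used, or
      - a dict mapping a relation name to the nested spec applied to every
        related object (with real models this would iterate over
        getattr(obj, relation).all()).
    """
    fields = {}
    for entry in spec:
        if isinstance(entry, str):
            value = getattr(obj, entry)
            fields[entry] = value() if callable(value) else value
        else:
            for relation, nested_spec in entry.items():
                fields[relation] = [
                    serialize_object(related, nested_spec)
                    for related in getattr(obj, relation)
                ]
    # model_label is a stand-in for Django's "app_label.model_name" string.
    return {"pk": obj.pk, "model": obj.model_label, "fields": fields}


# Plain objects standing in for the Product, Subscription and Order rows above.
product = SimpleNamespace(pk=1, model_label="app.product",
                          name="Product #1", price=Decimal("9.98"))
subscription = SimpleNamespace(pk=13, model_label="app.subscription",
                               name="Subscription #1, Product #13",
                               price=Decimal("14.98"), recurrence=12)
order = SimpleNamespace(pk=1, model_label="app.order",
                        products=[product, subscription])
# total_price is a pseudo-field: a callable rather than a stored column.
order.total_price = lambda: sum(p.price for p in order.products)

order_spec = [{"products": ["price", "name"]}, "total_price"]
serialized = serialize_object(order, order_spec)
print(serialized["fields"]["total_price"])  # 24.96
```

Using Decimal rather than float keeps the total exact (9.98 + 14.98 = 24.96), and because the recursion depends only on the shape of the spec, deeper nesting would need no extra code.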
I think that 'excludes' behavior could be provided by prefixing a field name with a minus sign, i.e.:

    fields = {'order': ['-order_placed']}

If the only fields listed are prefixed with a minus sign, we can assume that all remaining fields are to be included. Mixing prefixed and unprefixed fields implies an implicit negation of any field not mentioned:

    fields = {
        'order': ['-order_placed', 'total_price']
    }

serializes only the total_price pseudo-field, ignoring order_placed explicitly and products implicitly. This is just personal preference; the decision needs to be made, and an 'exclude' member providing explicit field exclusions is just as sensible.

Providing a member that allows a shortcut definition of fields is also a good idea. For example:

    class OrderSerializer(serializers.Serializer):
        # similar in spirit to QuerySet.select_related()...
        select_related = ['products']
        # or, to mirror QuerySet.select_related(depth=1)...
        select_related = 1

This would use the default behavior, except that the 'products' key in the output would be a list of the serialized products, each serialized in the default way.

More complex output can be produced by calling one serializer from another:

    class OrderSerializer(serializers.Serializer):
        fields = {
            'order': [{'products__via': 'ProductListing'}, 'total_price']
        }

The 'via' keyword would cause the serializer to use the serialized output of ProductListing when called with the particular order's products.all().

To be frank, the last few examples make deserialization much more complex, and bring us full circle back to the data integrity problems associated with deserializing relational database data from a flat file. In the context of fixtures, however, the database is typically empty, so the deserializer can simply load all the data and maintain the relations (as is currently done).
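The '-' rules described above are easy to pin down in a few lines. The following sketch is one possible reading of them, using a hypothetical resolve_fields() helper that takes the model's concrete field names and one entry of a fields dict: any unprefixed entry switches to whitelist mode (and may name a pseudo-field such as total_price), while a spec containing only '-' entries keeps every model field that is not excluded. The ORDER_FIELDS list is likewise an assumption standing in for Order's metadata (pk is handled separately, so it is left out).

```python
def resolve_fields(model_fields, spec):
    """Return the field names to serialize for one model under the '-' rules.

    Hypothetical helper. model_fields lists the model's concrete field names
    in order; spec is one entry of a Serializer's fields dict, e.g.
    ['-order_placed', 'total_price'].
    """
    excluded = {entry[1:] for entry in spec if entry.startswith("-")}
    included = [entry for entry in spec if not entry.startswith("-")]
    if included:
        # At least one unprefixed entry: only the listed names are kept, so
        # unmentioned fields are implicitly excluded. Listed names may be
        # pseudo-fields that are not model columns at all.
        return [name for name in included if name not in excluded]
    # Only '-' entries (or an empty spec): keep every field not excluded.
    return [name for name in model_fields if name not in excluded]


ORDER_FIELDS = ["products", "order_placed"]
print(resolve_fields(ORDER_FIELDS, ["-order_placed"]))                 # ['products']
print(resolve_fields(ORDER_FIELDS, ["-order_placed", "total_price"]))  # ['total_price']
```

An empty spec falls through to "every field", which matches the backwards-compatible default of today's serializers.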
In situations [2] where the database is already populated, and pk values for relationships may not be correct, perhaps a loaddata argument could tell the deserializer to find the correct pk by searching for a row that matches the serialized fields. For ambiguous cases, it is probably best to use human power instead of Django power.

Additional issues (some solved above, coincidentally), including the addition of arbitrary fields to serialized data [7], the JSON serializer's handling of gettext_lazy [8][9], and fields to be ignored by loaddata [10], will also be fixed.

NB: Though all of my examples have used JSON, I am proposing a system that varies independently of serialization format. I believe it's important to draw a line between the format and the data to be formatted, and Django already provides the proper facilities for custom serialization formats.

I believe that my proposal can be implemented in the 13-week GSoC time frame as follows. I like to work time-boxed, so everything is split accordingly: I would rather trim a task or two to be implemented later than let all the tasks for a phase leak over into the next phase. Assume that every phase ends with documentation writing, regression test creation/updates, and bug hunting.

Prior to 05/23
    Discuss and build use cases to demonstrate goals, flesh out the API, and do last-minute project scoping.

05/23 - 06/05
    Code the foundational serializers.Serializer base class, where

        class ProductSerializer(serializers.Serializer):
            pass

    would act just like the current serializers.serialize(format, Product.objects.all()), but with the new code structure.

06/06 - 06/12
    Implement the fields attribute (and an excludes attribute, or the '-field' functionality) in cases where relations are not followed. This includes child models, as in the ProductListing example class above.

06/13 - 06/26
    Implement recursive serialization using the "__via" syntax in fields. Examine whether this format could be used implicitly for deep serialization.
06/27 - 07/04
    My extended family will be visiting, so any work I do this week will go toward the tasks listed for the following week.

07/05 - 07/10
    Bug hunting, documentation additions, finishing anything left over from earlier phases, regression testing, and writing implementations of applicable use cases.

07/11 - 07/24
    Provide implicit relation selection via the fields attribute.

07/25 - 07/31
    Add agreed-upon flags to loaddata/dumpdata so that fixture users can take advantage of the new features.

08/01 - 08/10
    Create documentation patches, bug hunt, improve code quality [http://docs.djangoproject.com/en/dev/internals/contributing/#coding-style], and begin community testing.

08/11 - 08/17
    Communicate with the community about unearthed bugs and gotchas, and eliminate or prevent those issues.

References:
[1] http://code.djangoproject.com/ticket/4656
[2] http://code.djangoproject.com/ticket/7052
[3] http://code.djangoproject.com/ticket/9422
[4] http://code.djangoproject.com/ticket/10295
[5] http://code.djangoproject.com/wiki/SummerOfCode2009#Ideas
[6] http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers
[7] http://code.djangoproject.com/ticket/5711
[8] http://code.djangoproject.com/ticket/5590
[9] http://docs.djangoproject.com/en/dev/topics/serialization/#id2
[10] http://code.djangoproject.com/ticket/9279