Re: [GSOC 2012] Customizable serialization

Tom Christie Tue, 03 Apr 2012 06:55:56 -0700

Hi Piotr,

  I'd really like to see something along these lines making it into Django.
I worked on this during the 2011 DjangoCon.eu sprints, which I posted about 
a while back
<https://groups.google.com/forum/?fromgroups#!searchin/django-developers/customizable$20serialization/django-developers/H2EKZBsRlFY/ZIVyqCS9JZ0J>.
That work made it into Django REST framework's serializer
<https://github.com/tomchristie/django-rest-framework/blob/master/djangorestframework/serializer.py>,
but never back into Django,
mostly due to never being fully satisfied with a corresponding 
deserialization API.
(REST framework uses forms for deserialization, but that gets awkward for 
deserializing nested models.)
I've been meaning to follow up on my post for a while now.


As regards your proposal

In general...

  I'm really pleased to see that you've taken on board the previous 
comments and opted for a 2-stage approach - I think it's absolutely the 
right thing to do.
It breaks the problem down into two very well defined tasks.

1. Convert native datatypes to/from data streams.  (eg. Tastypie's serializers
<https://github.com/toastdriven/django-tastypie/blob/master/tastypie/serializers.py>,
REST framework's renderers
<https://github.com/tomchristie/django-rest-framework/blob/master/djangorestframework/renderers.py>
& parsers
<https://github.com/tomchristie/django-rest-framework/blob/master/djangorestframework/parsers.py>)
2. Convert complex objects to/from native datatypes.  (eg. Tastypie's 
hydrate/dehydrate, REST framework's serializer
<https://github.com/tomchristie/django-rest-framework/blob/master/djangorestframework/serializer.py>,
form based deserialization
<https://github.com/tomchristie/django-rest-framework/blob/master/djangorestframework/resources.py>)
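
To make the split concrete, here is a minimal sketch of the two stages kept 
apart. The names to_native, from_native, render_json, parse_json and the 
Point class are made up for illustration; they aren't from the proposal, 
Tastypie or REST framework. Only the standard library json module is assumed.

```python
import json


class Point:
    """Stand-in for a 'complex object' (hypothetical, not a Django model)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def to_native(obj, fields):
    """Task 2: complex object -> native datatypes (a plain dict)."""
    return {name: getattr(obj, name) for name in fields}


def from_native(cls, data):
    """Task 2, reversed: native datatypes -> complex object."""
    return cls(**data)


def render_json(native):
    """Task 1: native datatypes -> data stream."""
    return json.dumps(native, sort_keys=True)


def parse_json(stream):
    """Task 1, reversed: data stream -> native datatypes."""
    return json.loads(stream)


# Round trip: object -> dict -> JSON text -> dict -> object.
stream = render_json(to_native(Point(1, 2), fields=("x", "y")))
restored = from_native(Point, parse_json(stream))
```

Neither pair of functions knows anything about the other, so swapping JSON 
for another format would only touch the render/parse pair.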

In your proposal you've understandably addressed model serialization in 
detail, but I (personally) think it'd be worthwhile breaking things down a 
little bit more.
For example, you might consider the following:

1. Define a base API for part (1).  Is it handled by a single class with 
instance methods for converting in each direction, or by 
two separate classes?
    What about formats that are one direction only? (eg. You might want to 
parse HTTP form data, but you're unlikely to want to render it.)
    Are formats that are not fully reversible an issue? (eg. Some formats 
[eg csv?] might not have native representations of all the python 
datatypes, and only have string representations.)
    Exactly what subset of types do we consider 'native'  - should it be 
strictly python native datatypes, or would we use the set of types covered 
by 'django.utils.encoding.is_protected_type'?
2. Define a base API for the components of part (2) that doesn't include 
any implementation or relate specifically to Django models/querysets.
    Are the serialization and deserialization handled by the same class? 
 What about serializations that aren't reversible?  [eg only include a 
subset of information]
3. Given your general API, consider what interface you'd need for a 
serializer that worked with arbitrary python objects.
4. Given the API for an arbitrary object serializer what else do you need 
to provide in the API to deal with querysets/model instances specifically? 
5. Are object fields handled with a subclass of the Serializer class, or do 
they need a different API?  If they're a subclass what extra information do 
they need?
6. When looking at deserialization is it worth considering mirroring any 
aspects of the form APIs?  How do you treat deserialization errors?
7. Is the deserializer creating saved or unsaved model instances?  How do 
you handle saving the instances, and how do you deal with deserializing 
data where some parts of the object might be implicit?  (Eg deserializing 
data from an API request, where the pk of the model is given in the URL of 
the request?)
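
As a rough illustration of questions 6 and 7, here is one possible shape for 
a deserializer that borrows the is_valid()/errors idea from forms and leaves 
saving to the caller. It's only a sketch: the Deserializer class, its 
converters table, the implicit argument and restore() are all hypothetical 
names, not an existing Django or REST framework API.

```python
class Deserializer:
    """Hypothetical deserializer that collects errors the way a form does."""

    # field name -> callable that converts the raw value or raises ValueError
    converters = {"title": str, "rating": int}

    def __init__(self, data, implicit=None):
        self.data = data
        # Values known from elsewhere, e.g. a pk taken from the request URL.
        self.implicit = implicit or {}
        self.errors = {}
        self.cleaned = {}

    def is_valid(self):
        """Convert every field, recording one error message per bad field."""
        self.errors, self.cleaned = {}, {}
        for name, convert in self.converters.items():
            if name not in self.data:
                self.errors[name] = "This field is required."
                continue
            try:
                self.cleaned[name] = convert(self.data[name])
            except (TypeError, ValueError) as exc:
                self.errors[name] = str(exc)
        return not self.errors

    def restore(self):
        """Return plain, unsaved attributes; saving is left to the caller."""
        if not self.is_valid():
            raise ValueError("cannot restore invalid data")
        # Implicit values win over anything supplied in the payload.
        return {**self.cleaned, **self.implicit}


good = Deserializer({"title": "Django", "rating": "5"}, implicit={"pk": 42})
bad = Deserializer({"title": "Django", "rating": "five"})
```

Here good.restore() yields the converted fields plus the implicit pk, while 
bad.is_valid() is False and bad.errors names the offending field.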

If you break it right down like that I think it'd help make sure you get 
the fundamentals right.
I'd expect to see some base classes without implementation, and 
serialization of Django objects tackled purely as a subset of the general 
case.
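
By way of example, the kind of layering I have in mind might look something 
like the sketch below. BaseSerializer, ObjectSerializer and Tag are 
illustrative names only, and the general-case implementation is deliberately 
naive (it just copies the instance __dict__).

```python
from abc import ABC, abstractmethod


class BaseSerializer(ABC):
    """Complex object <-> native datatypes. No implementation here."""

    @abstractmethod
    def serialize(self, obj):
        """Return native datatypes for obj."""

    @abstractmethod
    def deserialize(self, native):
        """Return an object built from native datatypes."""


class ObjectSerializer(BaseSerializer):
    """General case: works on any object that has an instance __dict__."""

    def __init__(self, cls):
        self.cls = cls

    def serialize(self, obj):
        return dict(vars(obj))

    def deserialize(self, native):
        # Bypass __init__ so no constructor signature is assumed.
        obj = self.cls.__new__(self.cls)
        vars(obj).update(native)
        return obj


class Tag:
    """Arbitrary python object used for the example."""

    def __init__(self, name):
        self.name = name


serializer = ObjectSerializer(Tag)
native = serializer.serialize(Tag("django"))
clone = serializer.deserialize(native)
```

A model serializer would then be one more subclass that adds only what 
querysets and model instances need on top of this.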

Some things I was slightly confused by in your proposal as it stands...

* JSONSerializer subclassing Serializer. (and XMLSerializer subclassing 
JSONSerializer.)
  I'd have thought that if you're using a two phase approach you'd keep the 
complex<->native and native<->data stream APIs totally decoupled.
  JSON serialization doesn't itself have anything to do with how you choose 
to structure the serialization of a complex object, so why should it 
subclass the implementation of that part?
* "dehydrate__value__", "hydrate__value__" - what's with the double 
underscore naming?
* I don't get the '@attribute' decorator.  I think it'd be simpler to just 
have 'fields', 'include', 'exclude'.  If fields is None then use the 
'default set of fields' + 'include' - 'exclude'.  If fields is not None, 
use that and ignore include/exclude.
* I wouldn't consider special casing for XML serialization in the 
complex<->native stage.  Sure, yeah, make sure there's an XML 
implementation that can handle the current Django XML serialization 
structure, but anything more than that and you're likely to end up muddying 
the API for a special case of data format.
* 'relation_reserialize' - Why is that needed?
* 'object_name' - It's not obvious to me if that's necessary or not.
* "In what field of serialized input is stored model class name" - What 
about when the class name isn't stored in the serialization data?
* "dehydrate__xxx redefining serialization for type xxx."  I'm not 
convinced about that - it's not very pythonic to rely on type hierarchy in 
preference to duck typing.
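
To spell out the 'fields', 'include', 'exclude' rule suggested above, here is 
a small sketch. The function name resolve_fields and its signature are my own 
for illustration; the field names are taken from the Comment model in your 
proposal.

```python
def resolve_fields(default, fields=None, include=(), exclude=()):
    """Return the ordered list of field names to serialize."""
    if fields is not None:
        # An explicit 'fields' wins; include/exclude are ignored.
        return list(fields)
    # Otherwise: default set of fields + include - exclude.
    names = list(default) + [n for n in include if n not in default]
    return [n for n in names if n not in exclude]


default = ["user", "photo", "topic", "content", "created_at", "ip_address"]

# No explicit fields: extend the defaults, then drop the excluded ones.
implicit = resolve_fields(default, include=["y"], exclude=["ip_address"])

# Explicit fields: include/exclude have no effect.
explicit = resolve_fields(default, fields=["topic"], exclude=["topic"])
```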

Schedule...

I think you're underestimating the importance / time required for 
documentation, regression tests, and merging a new serialization API into 
the existing codebase with backwards compatibility.
I wouldn't be at all surprised to see those tasks taking just as long or 
longer than any new code writing.  (And the second half of your schedule 
looks like all the hardest bits to me)
Personally, if I was going to tackle this task I'd strongly consider 
working documentation-first - it's getting the API design of the 
(de)serialization right that is the difficult part, not the implementation.

Anyway, I hope those thoughts are helpful.  I'd be very interested in 
seeing how this progresses...

Cheers,

  Tom
 


On Monday, 2 April 2012 17:20:27 UTC+1, Piotr Grabowski wrote:
>
> It's my second approach to customizable serialization. I did some 
> research and found some REST serializers. I focused more on deserialization - 
> it should be easy to ensure that data is round-trippable. I discarded some 
> unnecessary fields and tried to improve functionality.
>
>
> --------
> GSOC 2012 Customizable serialization
> -------
>
> Django has a framework for serialization, but it is a simple tool. The main 
> problems are that you cannot define your own serialization structure and 
> that there is no support for related models. Below I present a proposal to 
> improve the current framework.
> In my opinion it is not possible to create a framework completely 
> independent of the formats that will be used to serialize objects. For 
> instance, XML has a richer syntax than JSON (e.g. fields can be tags or 
> attributes), so we must provide functions to handle it which won't be 
> useful in JSON serialization.
>
> -------
> Features to implement:
> -------
>
> Based on the issues presented for consideration, GSOC proposals from 
> previous years and django-developers group threads, I have prepared a list 
> of features that a good solution should have.
>
> 1. Defining the structure of serialized object
> 1.1. Object fields can be at any position in output tree.
> 1.2. Renaming fields
> 1.3. Serializing non-database attributes/properties
> 1.4. Serializing any subset of object fields.
> 2. Defining own fields
> 2.1. Related model fields
> 2.1.1. Serializing foreign keys, m2m and reverse relations
> 2.1.2. Choose depth of serialization
> 2.1.3. Handling natural keys
> 2.1.4. Handling objects serialized before (in other location of output 
> tree)
> 2.1.5. Objects of the same type can be handled differently depending on location
> 2.2. Other fields - custom serialization (e.g. only date in datetime 
> fields)
> 3. One definition can support multiple serialization formats (XML, JSON, 
> YAML).
> 4. Backward compatible
> 5. Solution should be simple. Easy to write own serialization scheme.
>
> Below I have tags like (F2.1.2) - means support for feature 2.1.2.
>
> ------
> Concept:
> ------
>
> Make the easy things easy, and the hard things possible.
>
> In my proposal I was inspired by Django Forms and django-tastypie. 
> Tastypie is a great API framework for Django.
> The output structure will be defined declaratively using classes. A class 
> is certainly needed for the model definition. In my solution I also define 
> model fields with classes. It's the simplest way to provide a free output 
> structure.
> There should be two phases of serialization. In the first phase Django 
> objects like Models or Querysets will be written as native Python types 
> (F3), and then in the second phase they will be serialized to the chosen format.
>
> Suppose we want to serialize this model:
>
> class Comment(Model):
>     user = ForeignKey(Profile)
>     photo = ForeignKey(Photo)
>     topic = CharField()
>     content = CharField()
>     created_at = DateTimeField()
>     ip_address = IPAddressField()
>
>
> class User(Model):
>     fname = CharField()
>     lname = CharField()
>
>
> class Photo(Model):
>     sender = ForeignKey(User)
>     image = ImageField()
>
>
> Below we have the definition of the serializer class CommentSerializer.
>
> If we want to serialize a comment queryset:
> serializers.serialize('json|xml|yaml', queryset, 
> serializer=CommentSerializer, **options)
> If 'serializer' isn't provided we have a default serializer for each 
> format (F3)
>
>
> class CommentSerializer(ModelSerializer):
>     content = ContentField()
>     topic = TopicField(attribute=True)
>     photo = ForeignKey(serializer=PhotoSerializer)
>     y = YField() #(F1.1.3)
>
>     def dehydrate__datetime(self, obj): #(F2.2)
>         return smart_unicode(obj.date())
>
>     def hydrate__date(self, obj): #(F2.2)
>         return smart_unicode(datetime.combine(obj, datetime.time.now()))
>
>     class Meta:
>         aliases = {'topic' : 'subject'}
>         #fields = (,)
>         exclude = ('ip_address',)
>         relation_reserialize = FlatSerializer
>         field_serializer = FieldSerializer
>         # subclass of ModelSerializer or FieldSerializer
>         relation_serializer = FlatSerializer|ModelSerializer|NaturalModelSerializer|MyModelSerializer
>         object_name = "my_obj"
>         model_name = "model"
>
>
> ModelSerializer has definition of fields, methods and Meta class. 
> Default each field is serialized by Meta.field_serializer or 
> Meta.relation_serializer. ModelSerializer fields redefining this 
> behavior. ModelSerializer methods dehydrate__xxx redefining 
> serialization for type xxx, and hydrate__xxx is for deserialization.
>
> ModelSerializer methods returns native Python types
> I will explain ModelSerializer fields later
>
> Meta Class
> a) aliases - redefine field name: topic : "..." => subject : "...". Can 
> do 'topic' : '' - return of topic method is one level higher. There is 
> metatag __fields__ - rename all fields. If more than one field has same 
> name list is created #(F1.2)
> b) fields - fields to serialize #(F1.4)
> c) exclude - fields to not serialize #(F1.4)
> g) relation_reserialize - using what Serializer if object was serialized 
> before(F2.1.4)
> h) field_serializer - default field serializer
> h) relation_serializer - default relation (ForeignKey or ManyToMany) 
> serializer. There are some built-in possibilities: (2.1)
> * FlatSerializer - only primary key - default
> * ModelSerializer - Predefined serializer for related models. One level 
> depth, all fields.
> * NaturalModelSerializer - like flat but serializes natural keys
> * Custom Model Serializer
> If someone also wants to serialize the intermediate model in M2M, he should 
> write a custom field
>
> i) object_name - if it isn't empty returns <object_name_value>serialized 
> object</object_name_value> else return serialized object. Useful with 
> nested serialization. Default object_name is empty. In root level if 
> object_name is empty then "object" is default
> j) In what field of serialized input is stored model class name
>
> ModelSerializer fields are responsible for serializing model fields.
> During serialization the value of a model field will be passed to some 
> methods of the ModelSerializer field class, which should be able to return 
> Python native types. A field should also be able to deserialize the model 
> field value from input.
>
> If there is some ModelSerializer class field and no field of that name 
> in the model, it should be treated as a custom field.
>
> class ContentField(FieldSerializer):
>
>     def hydrate__value__(self, field):
>         return field.lower()
>
>     def dehydrate__value__(self, field):
>         return field.upper()
>
>
> class YField(FieldSerializer):
>
>     def dehydrate__value__(self, field):
>         return 5
>
>
> class TopicField(FieldSerializer):
>
>     @attribute
>     def dehydrate__lower_topic(self, field):
>         return field.lower()
>
>     def __name__(self, field):
>         return "value"
>
>     def dehydrate__value__(self, field):
>         return field
>
>
> Each method represents a field in the serialized output. They can return 
> Python native types, other Fields or ModelSerializers, lists or dicts.
>
> Field serializer has two special methods __name__ and value__. value__ 
> is the primary value returned by field. Each method except __name__ 
> should be preceded by dehydrate or hydrate. First is used in 
> serialization, second in deserialization.
>
> E.g.
> In some model class (topic="Django")
> topic = TopicField()
>
> class TopicField(Field):
>     def dehydrate__value__(self, field):
>         return field
>
> xml: <topic>Django</topic>
> json "topic" : "Django"
>
> But what if we want to add some custom attribute (like lower_topic above)?
> xml: <topic><lower_topic>django</lower_topic>Django</topic> - as far as I 
> know it's correct, but is it what we want?
> json topic : {lower_topic : django, ??? : Django}
> We have __name__ to provide some name for the field:
>
> class TopicField(Field):
>     def __name__(self, field):
>         return "value"
>     def dehydrate__value__(self, field):
>         return field
>
> xml: <topic><lower_topic>django</lower_topic><value>Django</value></topic>
> json topic : {lower_topic : django, value : Django}
>
> As I said before, there are two phases of serialization.
> The first phase presents Django models as native Python types.
> At the beginning each object to serialize is passed to ModelSerializer and 
> to each Field. Everything will be resolved to Python native types.
>
> Rules for resolving:
> 1. In Serializer class:
> * ModelSerializer class => {}
> * If ModelSerializer class has object_name => __object_name__ : 
> Meta.object_name
> * Fields in Serializer class => aliases[field_name] : field_value
> * If aliases[x] == aliases[y] => aliases[x] : [x_value, y_value]
> * If x=Field(attribute=True) => __attributes__ : {x : x_value} Fail if 
> x_value can't be attribute value
>
>
> 2. In Field class:
> * If only value__ => dehydrated__value__
> * If other methods are present => {method_name : method_value, ...}
> * If __name__ => { __name__ : dehydrated__value__ }
> * If method decorated @attribute => __attributes__ : {method_name : 
> method_value} Fail if method_value can't be attribute value
>
>
> After that we have something like a dict of dicts or lists. Next we must 
> apply the ModelSerializer dehydrate__type rules to the output.
> In the dict there is a special key __attributes__ containing a dict of 
> attributes for XML.
> At this stage we must decide which format to serialize to. If it's not XML, 
> __attributes__ must be merged into the rest of the dict.
>
> In the second phase we have Python native types, so we can serialize them 
> with some module, e.g. simplejson.dumps(our_output)
>
> Deserialization:
> It's also a two-phase process. In the first phase we deserialize input to 
> Python native types (same as return in second phase of serialization), 
> and in the second we create Model objects. The first phase should be simple. 
> The second is a lot harder.
> The first problem is what type of object is in the serialized input. There 
> are two ways to find out. You can pass the Model class as an argument to 
> serialization.serialize or specify in Meta.model_name which field 
> contains information about the type.
> Next, all fields in the Serializer should be matched with the input. 
> 'hydrate__value__' and other hydrate methods are used to fill fields in 
> the model object.
>
>
> -----
> Proof of concept
> -----
>
> class PKField(Field):
>
>     def dehydrate__value__(self, field):
>         return smart_unicode(self.instance._get_pk_val(), strings_only=True)
>
>     def hydrate__value__(self, field):
>         self.instance.set_pk_val(field)
>
>
> class ModelField(Field):
>     def dehydrate__value__(self, field):
>         return smart_unicode(obj._meta)
>
>     #no need of hydrate__value__
>
>
> class JSONSerializer(ModelSerializer):
>     pk = PKField(attribute=True)
>     model = ModelField(attribute=True)
>
>     class Meta:
>         aliases = {'__fields__' : 'fields'}
>         relation_serializer = FlatSerializer
>
>
> class XMLSerializer(JSONSerializer):
>     class Meta:
>         aliases = {'__fields__' : 'field'}
>         default_field_serializer = XMLFieldSerializer
>         default_relation_serializer = XMLFlatRelationSerializer
>
>
> class XMLFieldSerializer(Field):
>
>     @attribute
>     def name(self, name, obj):
>         ...
>
>     @attribute
>     def type(self, name, obj):
>         ...
>
>
> class XMLFlatRelationSerializer(Field):
>
>     @attribute
>     def to
>         ...
>
>     @attribute
>     def name
>         ...
>
>     @attribute
>     def rel
>         ...
>
> -----
> Schedule
> -----
> I want to work approximately 20 hours per week: 15 hours writing code 
> and the rest for tests and documentation.
>
> Before start: Discussion on API design. I hope everything will be 
> clear before I start writing code.
> Week 1-2: Developing base code for Serializer.
> Week 3-4: Developing first phase of serialization.
> Week 5: Developing second phase of deserialization.
> Week 6: Developing second phase of serialization and first of 
> deserialization
> It's time for mid-term evaluation. I will have working Serializer except 
> nested relations.
> Week 7-8: Handling nested ForeignKeys and M2M fields.
> Week 9: Developing old serialization in new api with backward compatibility
> Week 10: Regression tests, writing documentation
> Week 11-12: Buffer weeks
>
>
> -----
> About
> -----
> My name is Piotr Grabowski. I'm a final-year student at the Institute of 
> Computer Science, University of Wrocław (Poland). I've been working with 
> Django for 2 years. Python is my preferred programming language, but I 
> have also been using Ruby (& Rails) and JavaScript.
>
> --
> Piotr Grabowski
>
>

