On Wed, Oct 27, 2010 at 3:57 AM, Richard Laager <[email protected]> wrote:
> We have a CSV view (not a serializer) that is linked from every
> change_list page. This allows sufficiently privileged users to dump the
> database table into Excel to do things not covered by our existing
> views. We do not allow for a CSV import, but it's been something that
> we've wanted.
>
> We'd be very interested in this project.
>
> On Tue, 2010-10-26 at 23:05 +0800, Russell Keith-Magee wrote:
>> CSV has a basic
>> structure (i.e., comma separated values), but doesn't have a natural
>> way of representing multiple datatypes
>
> I think this is the biggest challenge. Could we come up with some
> criteria that would have to be met for a given field type's
> representation?
>
> Perhaps:
> 1) The field has to load correctly in Excel.
> 2) The field has to load correctly in OpenOffice.org.
> 3) The field has to be human readable, except where doing so would
> violate #1 or #2.
> 4) The field should match its most common SQL representation, except
> where doing so would violate #1, #2, or #3.
>
> Handling foreign keys is problematic. If you just export the key, you
> often end up with an integer that's meaningless. If you export the
> related object, do you use its __unicode__ or something else?

A natural key would be the obvious 'something else' here. Using the key
value would be the approach most consistent with the other built-in
serializers, with natural keys as a fallback under certain conditions.
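For anyone following along who hasn't used natural keys: a model opts in
by defining a natural_key() method and giving its manager a
get_by_natural_key() method; the serialization machinery can then emit
that tuple in place of the opaque integer primary key. A minimal sketch,
with the model and its fields invented purely for illustration:

    from django.db import models

    class AuthorManager(models.Manager):
        # Used on import to turn the natural key back into an object.
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Author(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        objects = AuthorManager()

        # A serializer can emit this tuple instead of the raw pk.
        def natural_key(self):
            return (self.first_name, self.last_name)

In CSV terms, that tuple is exactly where the question below -- one
column, or several? -- comes from.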
> On import, do you match the provided values to existing values in the
> JOIN table or can new ones be added?
>
> To be honest, I haven't looked at the JSON serializer, so I'm not sure
> how this is handled there. Of course, JSON would support nested objects
> where CSV wouldn't.

And yet, the JSON serializer doesn't *ever* nest objects. That's a
long-standing feature request, and yet another reason why I'm interested
in a fully customizable serialization framework.

>> multiple values for a single field
>
> When would this matter? Is there a field type in Django that uses SQL
> arrays? If not, SQL has the same issue.

It depends on how you serialize m2m fields and natural keys. m2m fields
can be 'faked' by serializing the m2m model as a collection of columns.
This makes CSV serialization different to the other serializers. This
isn't a show-stopper; it's just worthy of note.

Natural keys are a different story. You either need to expand a single
column into multiple columns, or you need to have a parseable value in a
single column.

>> or differentiating NULL from empty string
>
> Neither does CharField, so why does this matter?

There isn't a difference for form entry (because there's no way for a
user to form-submit "null"), but there is a difference in the database,
which is what the serializer is dealing with. The Oracle backend doesn't
make the distinction between an empty string and NULL, but that's a
peculiarity of Oracle.

>> Even in-file
>> metadata (sometimes represented as the first, commented out row of a
>> CSV file) is the subject of inconsistency.
>
> On export, you either have it or you don't. It seems that having a
> header row is better than not, so include it. This meets your "useful in
> Excel" criteria.
>
> As far as import, it's easy to strip the first row or not, but the big
> question is if you want to make it *optional*.
>
> If the goal of the serializer is to import data that you've previously
> exported, then there's no need to make it optional. If you want something
> more generally useful, you'll have to look at the first row and try to
> match the columns to field names. If they all match, then it's a header
> row; if they don't, it's not.

You've highlighted a bunch of design decisions that any single CSV
serializer would need to make. The thing is, all of the design issues
you've highlighted here could go one way or the other, and the 'right'
decision will vary wildly with circumstance.

This is why I would rather see a general serialization framework than a
single CSV serializer. That way, rather than introducing a single
serialization format, we enable end-users to output whatever format they
want. If a particular user wants to serialize foreign keys using some
exotic technique with local significance, they can subclass a base
serializer and make a simple modification, rather than needing to
duplicate an entire serialization backend.
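None of that API exists today -- that's precisely what I'd like to
build -- but to make the shape of the idea concrete, here is a toy,
self-contained sketch. Every class and method name below is invented for
illustration, not proposed API:

    import csv

    class BaseCSVSerializer(object):
        # Toy stand-in for a framework-provided base class.

        def serialize(self, queryset, fields, stream):
            writer = csv.writer(stream)
            writer.writerow(fields)  # header row
            for obj in queryset:
                writer.writerow([self.handle_field(obj, f) for f in fields])

        def handle_field(self, obj, field):
            value = getattr(obj, field)
            # Hook point: how should a related object be represented?
            if hasattr(value, "pk"):
                return self.handle_fk(value)
            return value

        def handle_fk(self, related):
            # Default behaviour: the raw primary key.
            return related.pk

    class LocalConventionCSVSerializer(BaseCSVSerializer):
        # The 'simple modification': one overridden hook rather than a
        # reimplemented serialization backend.
        def handle_fk(self, related):
            return "/".join(str(part) for part in related.natural_key())

The interesting design work is in choosing the hook points (fields,
foreign keys, m2m, headers, NULL handling), not in any one output format.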
There are many existing tickets that are seeking to expand on the
capabilities of the serialization framework; rather than implement a
bunch of complex changes, I'd rather set up the framework that makes all
those changes simple to implement, and more.

Providing *a* CSV serializer may provide utility to *some* users.
Providing a framework that allows for arbitrary serialization in *any*
format, with *any* set of serialization conventions, has the potential to
be much more useful.

Providing a base CSV serializer may be an important part of this change
-- after all, CSV is a bit of a pathological serialization case when
compared to nested formats like XML and JSON -- and demonstrating that a
framework is capable of describing CSV would be an important part of
demonstrating the flexibility of any such framework.

Yours,
Russ Magee %-)
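P.S. To make the 'pathological case' point concrete, here is a purely
invented example of the same record rendered as nested JSON versus a
flattened CSV row -- the natural key split across columns and the m2m
collapsed into a single parseable cell, which is only one of several
conventions a CSV serializer could reasonably pick:

    import csv
    import json
    import sys

    # Invented example: an article with a foreign key (expressed as a
    # natural key) and an m2m field.
    article = {
        "model": "news.article",
        "fields": {
            "headline": "Serialization framework lands",
            "reporter": ["Jane", "Doe"],          # FK as a natural key
            "tags": ["django", "serialization"],  # m2m values
        },
    }

    # JSON (or XML) can simply nest the multi-valued parts.
    print(json.dumps(article, indent=2))

    # CSV has to flatten them somehow.
    writer = csv.writer(sys.stdout)
    writer.writerow(["headline", "reporter_first", "reporter_last", "tags"])
    writer.writerow([
        article["fields"]["headline"],
        article["fields"]["reporter"][0],
        article["fields"]["reporter"][1],
        "|".join(article["fields"]["tags"]),
    ])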

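P.P.S. Richard's header-row heuristic from earlier in the thread ("if
every cell in the first row matches a field name, treat it as a header")
is cheap to express; a rough, framework-agnostic sketch:

    def looks_like_header_row(first_row, field_names):
        # Treat the first row as a header only if every cell names a
        # known field; otherwise assume it's data.
        return bool(first_row) and all(cell in field_names for cell in first_row)

    # e.g. looks_like_header_row(["headline", "reporter", "tags"],
    #                            set(["headline", "reporter", "tags", "id"]))
    # -> True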