I've been working on the same issue.  So far I have mostly just been 
researching various options, but I can give you my two cents...

It really depends on your goals and constraints.  I have narrowed it 
down to two major families of serialization for storage and networking.  
One is the JSON/YAML/XML style, where you generate a serialized version 
of data structures built primarily from vectors and hashes that contain 
only simple data types.  (Note that JSON is essentially a subset of 
YAML, so a YAML parser can read JSON but not vice versa.)  This is by 
far the fastest to develop and the most lightweight in terms of 
programmer time: basically one line each for read and write.  The 
potential hidden cost depends on what data structures you use in your 
program.  If you have clearly defined chunks of data to serialize, YAML 
works nicely, but for more complex structures you often have to do an 
intermediate conversion to simpler data structures, where you deal by 
hand with things like circular references and pointers to ephemeral 
data that you don't want serialized.
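
To make the intermediate conversion concrete, here is a minimal sketch.  
The node layout (:id, :name, :parent, :conn) is made up for 
illustration: :parent is a live reference that could form a cycle, and 
:conn stands in for ephemeral state that should never be written out.

```clojure
;; Hypothetical application data: each node lives in an atom, points at
;; its parent atom, and carries an ephemeral :conn object.
(def root  (atom {:id 1 :name "root"  :parent nil  :conn (Object.)}))
(def child (atom {:id 2 :name "child" :parent root :conn (Object.)}))

(defn node->plain
  "Flattens a node into a map of simple values: the parent reference
  becomes a plain :parent-id and the ephemeral :conn is dropped."
  [node]
  (-> node
      (select-keys [:id :name])
      (assoc :parent-id (some-> node :parent deref :id))))

(node->plain @child)
;; => {:id 2, :name "child", :parent-id 1}
```

The resulting maps contain only numbers, strings and keywords, so any 
of the text formats above can write them directly.  Rebuilding the live 
references on load is the other half of the work.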

The previous options are, however, inefficient for storage, 
transmission and parsing in comparison to a more strictly defined 
protocol.  If you need raw performance and you are willing to spend the 
effort defining your protocol, then I think something like Google's 
Protocol Buffers or Facebook's Thrift is a good option.  They are 
basically the new-school versions of CORBA RPC.  In essence, you define 
a schema for your messages or data serialization units, and then some 
tools generate classes or functions that are used to read, write and 
transmit this data.  (SOAP pretty much works the same way, but it 
idiotically sits on XML too, so you get the worst of both worlds...)  
Again, if your data units to be serialized are self-contained this can 
work pretty smoothly, but with more complex structures you will also 
have to convert between the simple, generated classes and your more 
complex application classes.  The real work, though, is in creating and 
maintaining your protocol definitions and the code that uses the 
generated classes.
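
As a rough illustration of why a fixed schema is compact, here is a 
hand-written stand-in for the kind of code such tools generate.  The 
"person" message and its field order are invented for this sketch, and 
the byte layout is plain java.io.DataOutputStream output, not the 
actual Protocol Buffers or Thrift wire format.

```clojure
(import '(java.io ByteArrayOutputStream DataOutputStream
                  ByteArrayInputStream DataInputStream))

;; Hypothetical message with a fixed field order: id (int), then name.
(defn write-person
  "Encodes a person map as bytes. No field names are written, because
  both sides already agree on the order and types."
  [{:keys [id name]}]
  (let [bytes-out (ByteArrayOutputStream.)]
    (with-open [out (DataOutputStream. bytes-out)]
      (.writeInt out (int id))
      (.writeUTF out name))
    (.toByteArray bytes-out)))

(defn read-person
  "Decodes bytes produced by write-person back into a map."
  [data]
  (with-open [in (DataInputStream. (ByteArrayInputStream. data))]
    (let [id (.readInt in)
          nm (.readUTF in)]
      {:id id :name nm})))

(read-person (write-person {:id 7 :name "Jeff"}))
;; => {:id 7, :name "Jeff"}
```

Here the encoded form is 10 bytes (4 for the int, 2 for the string 
length, 4 for "Jeff"), while the printed map is about twice that.  The 
price is that reader and writer must be kept in sync by hand, which is 
exactly the maintenance work the schema tools try to take over.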

I think the default for a language like Clojure should be YAML too.  
For dynamic languages where developer time is the focus, it is by far 
the quickest mechanism to get up and running with databases, 
configuration files, networking, etc.  Maybe we should look into 
integrating the built-in Clojure data types with a YAML library, or 
otherwise creating a new one, so we can dump and load directly between 
serialized strings and Clojure data structures.
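
For comparison, the kind of one-line dump and load I have in mind can 
be sketched with Clojure's own printer and reader instead of YAML.  This 
assumes a Clojure version that ships the clojure.edn namespace; the 
names dump and load-data are just placeholders.

```clojure
(require '[clojure.edn :as edn])

(defn dump
  "Serializes plain Clojure data to a string."
  [data]
  (pr-str data))

(defn load-data
  "Reads a string produced by dump back into Clojure data."
  [s]
  (edn/read-string s))

(def config {:db      {:host "localhost" :port 5432}
             :tags    #{:a :b}
             :retries [1 2 3]})

(= config (load-data (dump config)))
;; => true
```

This only covers values the reader understands (maps, vectors, sets, 
keywords, numbers, strings and so on), and other languages would need 
their own parser for it, which is where a YAML binding would still be 
useful.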

If you run up against the limits of YAML, then I would go with Protocol 
Buffers.  They seem like a clean and efficient way to support 
multi-language communication without wasting time writing a bunch of 
custom serialization methods.  It would be interesting if there were a 
way to generate .proto files by example, by sniffing YAML on the wire 
or something...  It could at least help bootstrap the protocol 
definition phase.
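
A very rough sketch of the by-example idea: take one sample map and 
emit a proto3-style message definition from it.  The type mapping and 
the function names are my own guesses, it handles flat maps only, and 
the keys must already be valid .proto identifiers (no hyphens).

```clojure
(require '[clojure.string :as str])

(defn proto-type
  "Maps a sample value to a .proto scalar type (assumed mapping)."
  [v]
  (cond
    (string? v)               "string"
    (integer? v)              "int64"
    (float? v)                "double"
    (or (true? v) (false? v)) "bool"
    :else (throw (IllegalArgumentException.
                   (str "No mapping for " (class v))))))

(defn sample->proto
  "Builds a message definition from a flat sample map. Fields are
  numbered in sorted key order so the output is stable."
  [message-name sample]
  (let [fields (map-indexed
                 (fn [i [k v]]
                   (str "  " (proto-type v) " " (name k) " = " (inc i) ";"))
                 (sort-by key sample))]
    (str "message " message-name " {\n" (str/join "\n" fields) "\n}")))

(println (sample->proto "Person" {:name "Jeff" :id 7}))
;; message Person {
;;   int64 id = 1;
;;   string name = 2;
;; }
```

A real tool would need to look at many samples to spot optional and 
repeated fields, but even this much would give you a first draft to 
edit by hand.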

Hopefully that helps.

-Jeff

Tayssir John Gabbour wrote:
> Hi!
>
> How should I approach serialization? I made a little test function
> which serializes and deserializes Clojure objects. It works for
> strings, integers, symbols, LazilyPersistentVectors and.. oddly..
> PersistentHashMaps that have exactly one element. (My Clojure is about
> a month old.)
>
> But for other things, like keywords and most PersistentHashMaps, it
> throws NotSerializableException.
>
> My imagined possible solutions:
>
> * Implement Serializable for Clojure data -- but is it possible in a
>   dynamic "Hey I'll just write a new method!" way?
>
> * Go into Clojure's source and implement Serializable to the Java
>   classes.
>
>
> My end goal is using a nonrelational DB like Tokyo Cabinet or
> BerkeleyDB.
>
> Thanks,
> Tayssir
>
>
> PS: Here's my test code:
>
> (defn my-identity
>   "Copies obj through serialization and deserialization."
>   [obj]
>   (let [byte-out (new java.io.ByteArrayOutputStream)
>         obj-out  (new java.io.ObjectOutputStream byte-out)]
>     (try (.writeObject obj-out obj)
>          (finally (.close obj-out)))
>     (let [obj-in  (new java.io.ObjectInputStream
>                        (new java.io.ByteArrayInputStream
>                             (.toByteArray byte-out)))]
>       (try (.readObject obj-in)
>            (finally (.close obj-in))))))