Compilers and data stores

Ed Kohlwey Fri, 24 Aug 2012 05:24:09 -0700

So I just reviewed the Dynamo compiler, and I have a few questions,
followed by a few thoughts.


Questions:

   1. Are annotations the only way to implement the desired features?
   2. What if other data stores have other annotations? Will we create more
   compilers for them?
   3. Renato had mentioned that Gora supports "data services" now
   (presumably in addition to databases). I'm not sure I understand this
   distinction. I have heard Dynamo is a managed database that implements a
   model similar to Cassandra. Can you elaborate on this statement?

Thoughts:

   1. I'm concerned that there is currently some marginal reliance on
   accessing code that is generated by compilers and cannot be declared in a
   supertype. The exact instance of this that I'm aware of is accessing the
   static field _SCHEMA on Avro types generated by the 1.3 compiler via
   reflection. The current preference in the Avro community is to use the name
   SCHEMA$ instead. Issues like this cannot be caught by static compilation
   checks and are real no-no's in my opinion, unless the structure of the API
   is well-documented and enforced by regression tests. If there is a
   proliferation of compilers this problem could become more severe.
   2. Making objects inherit from SpecificRecord (an Avro class) makes them
   convenient to use in RPCs or map/reduce. I think this is one of the most
   attractive features of Gora.
   3. The current mechanism used to track the dirty state of gora-compiled
   objects must be improved for 1.7, since the Avro 1.7 API is structured in
   a way that makes the current methodology almost impossible if you engage
   in any degree of code reuse. I believe the following requirements are
   necessary for an improved dirty state tracking system:
      1. The system must be able to represent the original state of the
      object as it was deserialized from the store prior to mutation. The
      motivation for this is to be able to create the most generalized
      mapping support possible. Some of this is currently done via the
      stateful map, but I believe the implementation could be improved and
      generalized. There are lots of mapping schemes that are not currently
      possible because there is not enough information stored in objects to
      allow erasure of key/values afterwards. A few examples:
         1. Objects of arbitrary structure could be stored with each field
         (including those of child objects) represented as a single record
         in HBase, Accumulo, or Cassandra.
         2. Child objects could be stored in column families with their
         fields in column qualifiers, reserving one column family for the
         fields of the parent object. Without storing the state of objects,
         this could result in values getting "lost" in the database if a
         union type is used, for instance.
         3. Maps of maps
      2. The system should be implemented entirely in the over-the-wire
      protocol that is used to transmit objects.
      3. The system will not be represented in the serialized
      representation that the "primary" data store uses, since its
      representation is authoritative.
      4. The improved system should have one representation and access
      pattern in the API (currently both a state tracker object and the
      persistent object itself describe the mutation state).
   4. I'd eventually like to see Avro/Gora objects used as both DTOs and
   DAOs using an Avro JavaScript implementation (there are two that I am
   aware of). Continued reliance on Avro for serialization on the wire
   supports this.
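
To make the concern in thought 1 concrete, here is a minimal sketch of what
a reflective lookup of the schema field looks like. It does not use Avro
itself: GeneratedRecord is a hypothetical stand-in for a compiler-generated
class, its schema is a plain string rather than an Avro Schema, and
SchemaLookup.findSchema is a made-up helper. The point is only that the
field name is a string, so a rename from _SCHEMA to SCHEMA$ compiles cleanly
and fails (or silently returns nothing) at run time.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a class emitted by a newer compiler that
// names its static schema field SCHEMA$ (a String here, for illustration).
class GeneratedRecord {
    public static final String SCHEMA$ =
        "{\"type\":\"record\",\"name\":\"GeneratedRecord\",\"fields\":[]}";
}

// Hypothetical helper, not part of Gora or Avro.
class SchemaLookup {
    // Tries each candidate field name in order and returns the first static
    // value found, or null if the class declares none of them.
    static Object findSchema(Class<?> type, String... candidateNames) {
        for (String name : candidateNames) {
            try {
                Field field = type.getField(name);
                return field.get(null); // static field, so no instance is needed
            } catch (NoSuchFieldException | IllegalAccessException e) {
                // Not available under this name; fall through to the next one.
            }
        }
        return null;
    }
}
```

Looking up only "_SCHEMA" on GeneratedRecord returns null, while passing both
"_SCHEMA" and "SCHEMA$" finds the field. A regression test that runs a
lookup like this against the output of every supported compiler is the kind
of enforcement I have in mind.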

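The dirty-state requirements in thought 3 can be sketched roughly as
follows. This is an illustration under simplifying assumptions, not a
proposal for the actual Gora API: TrackedRecord and its methods are
hypothetical names, fields are flattened to string keys, and the snapshot is
a shallow copy. It keeps the state as loaded from the store next to the
current state, exposes a single access pattern, and can therefore tell a
store which keys to write and which to erase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical wrapper; not part of Gora or Avro.
class TrackedRecord {
    private final Map<String, Object> original; // state as read from the store
    private final Map<String, Object> current;  // state after local mutation

    TrackedRecord(Map<String, Object> loaded) {
        // Shallow copies: nested mutable values would need deep copies.
        this.original = new HashMap<>(loaded);
        this.current = new HashMap<>(loaded);
    }

    void put(String field, Object value) { current.put(field, value); }

    void remove(String field) { current.remove(field); }

    // Fields that are new or whose value differs from the loaded state;
    // these are the ones a store would need to write.
    Set<String> dirtyFields() {
        Set<String> dirty = new TreeSet<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            String key = e.getKey();
            if (!original.containsKey(key)
                    || !Objects.equals(original.get(key), e.getValue())) {
                dirty.add(key);
            }
        }
        return dirty;
    }

    // Fields that existed when loaded but are gone now; a store would need
    // to delete these explicitly so stale values are not left behind.
    Set<String> deletedFields() {
        Set<String> deleted = new TreeSet<>(original.keySet());
        deleted.removeAll(current.keySet());
        return deleted;
    }
}
```

For the union example above: if a record is loaded with a key such as
"payload.asInt" and the union is then switched to a string branch stored
under "payload.asString", dirtyFields() reports the new key and
deletedFields() reports the old one, so the old column can be erased instead
of getting "lost" in the database.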