So I just reviewed the Dynamo compiler, and I have a few questions,
followed by a few thoughts.
Questions:
1. Are annotations the only way to implement the desired features?
2. What if other data stores have other annotations? Will we create more
compilers for them?
3. Renato mentioned that Gora supports "data services" now
(presumably in addition to databases). I'm not sure I understand this
distinction. I have heard that Dynamo is a managed database that implements
a model similar to Cassandra's. Can you elaborate on the "data services"
statement?
Thoughts:
1. I'm concerned that there is currently some marginal reliance on
accessing code that is generated by compilers and cannot be declared in a
supertype. The one instance of this that I'm aware of is accessing, via
reflection, the static field _SCHEMA on Avro types generated by the 1.3
compiler. The current preference in the Avro community is to use the name
SCHEMA$ instead. Issues like this cannot be caught by compile-time checks
and are real no-nos in my opinion, unless the structure of the API is
well-documented and enforced by regression tests. If there is a
proliferation of compilers, this problem could become more severe.
2. Making objects inherit from SpecificRecord (an Avro class) makes them
convenient to use in RPCs or map/reduce. I think this is one of the most
attractive features of Gora.
3. The current mechanism used to track the dirty state of gora-compiled
objects must be improved for 1.7, since the Avro 1.7 API is structured in a
way that makes the current methodology almost impossible if you engage in
any degree of code reuse. I believe the following requirements are
necessary for an improved dirty state tracking system:
   1. The system must be able to represent the original state of the object
      as it was deserialized from the store prior to mutation. The motivation
      for this is to be able to create the most generalized mapping support
      possible. Some of this is currently done via the stateful map, but I
      believe the implementation could be improved and generalized. There are
      lots of mapping schemes that are not currently possible because there is
      not enough information stored in objects to allow erasure of key/values
      afterwards. A few examples:
      1. Objects of arbitrary structure could be stored with each field
         (including those of child objects) represented as a single record in
         HBase, Accumulo, or Cassandra.
      2. Child objects could be stored in column families with their fields in
         column qualifiers, reserving one column family for the fields of the
         parent object. Without storing the state of objects, this could
         result in values getting "lost" in the database if a union type is
         used, for instance.
      3. Maps of maps
   2. The system should be implemented entirely in the over-the-wire
      protocol that is used to transmit objects.
   3. The system will not be represented in the serialized representation
      that the "primary" data store uses since its representation is
      authoritative.
   4. The improved system should have one representation and access
      pattern in the API (currently both a state tracker object and the
      persistent object itself describe the mutation state).
4. I'd eventually like to see Avro/Gora objects used as both DTOs and
DAOs using an Avro JavaScript implementation (there are two that I am
aware of). Continued reliance on Avro for serialization on the wire
supports this.
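
To illustrate thought 1, here is a minimal Java sketch of a reflective
lookup of the generated schema field. Everything in it is an assumption made
for illustration: the record classes are hypothetical stand-ins for
compiler-generated types (real ones would hold an org.apache.avro.Schema
rather than a String), and findSchema is an invented helper name, not an
existing Gora or Avro method.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class SchemaFieldLookup {
    // Hypothetical stand-ins for generated record classes.
    public static class OldStyleRecord { public static final String _SCHEMA = "old"; }
    public static class NewStyleRecord { public static final String SCHEMA$ = "new"; }
    public static class NoSchemaRecord { }

    // Tries the newer field name first, then the older one.
    // The compiler cannot verify either name; a rename fails only at runtime.
    public static Object findSchema(Class<?> type) {
        for (String name : new String[] {"SCHEMA$", "_SCHEMA"}) {
            try {
                Field field = type.getDeclaredField(name);
                if (Modifier.isStatic(field.getModifiers())) {
                    field.setAccessible(true);
                    return field.get(null); // static field, so no instance is needed
                }
            } catch (NoSuchFieldException e) {
                // Fall through and try the next candidate name.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + name + " on " + type.getName(), e);
            }
        }
        throw new IllegalStateException("No schema field on " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(findSchema(OldStyleRecord.class));
        System.out.println(findSchema(NewStyleRecord.class));
    }
}
```

The failure for NoSchemaRecord surfaces only when findSchema is called,
which is why a regression test per supported compiler seems like the
minimum safeguard for this kind of access.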
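
For the dirty state requirements in thought 3, the following is a rough
Java sketch of one possible shape, not a proposal for the actual Gora API.
TrackedRecord and its method names are hypothetical. It keeps a snapshot of
the state as loaded from the store and derives both changed and removed
fields from that single snapshot, so the mutation state has one
representation and one access pattern.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class TrackedRecord {
    // State as deserialized from the store. This is a shallow copy, so nested
    // mutable values would need a deep copy in a real implementation.
    private final Map<String, Object> original;
    private final Map<String, Object> current;

    public TrackedRecord(Map<String, Object> loaded) {
        this.original = Collections.unmodifiableMap(new HashMap<>(loaded));
        this.current = new HashMap<>(loaded);
    }

    public Object get(String field) { return current.get(field); }
    public void put(String field, Object value) { current.put(field, value); }
    public void remove(String field) { current.remove(field); }

    // Fields that were added or whose value differs from the loaded state.
    public Set<String> changedFields() {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            boolean isNew = !original.containsKey(e.getKey());
            if (isNew || !Objects.equals(original.get(e.getKey()), e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    // Fields present when loaded but absent now. A store mapping needs these
    // to erase stale key/values instead of leaving them "lost" in the database.
    public Set<String> removedFields() {
        Set<String> removed = new TreeSet<>(original.keySet());
        removed.removeAll(current.keySet());
        return removed;
    }

    public boolean isDirty() {
        return !changedFields().isEmpty() || !removedFields().isEmpty();
    }
}
```

With this shape a store could write changedFields() and delete
removedFields() on flush. The snapshot lives only in memory here; carrying
it in the over-the-wire protocol, as requirement 2 asks, is not shown.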