Re: Compilers and data stores

Renato Marroquín Mogrovejo Sat, 25 Aug 2012 00:05:25 -0700

Hi Ed,

Thanks for taking the time to look into this. My answers are inline.


2012/8/24 Ed Kohlwey <[email protected]>:
> So I just reviewed the Dynamo compiler, and I have a few questions,
> followed by a few thoughts.
>
> Questions:
>
>    1. Are annotations the only way to implement the desired features?

No, they are not the only way to implement the desired features.
Amazon DynamoDB also lets you write items as a map from attribute names
to an Amazon data type called 'AttributeValue' [1], which holds the
actual values to be stored. As Lewis said, we decided to use the
DynamoDBMapper class because of the time frame and because it seemed
like a reasonable option at the time. I started looking into this
mapping class because Gora generates classes based on Avro schemas,
and since we don't have Avro schemas for web services, we decided to
create DynamoDB-annotated classes to persist them. This helped us
make the code much less convoluted.
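
To make the contrast between the two styles concrete, here is a rough,
self-contained sketch. It does not use the AWS SDK: the StoreAttribute
annotation, the Person class and the AnnotationMapper helper are all
hypothetical stand-ins, and plain strings stand in for 'AttributeValue'.
It only shows how an annotated class can be turned into the
name-to-value map that the low-level style would build by hand.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a store-specific attribute annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface StoreAttribute {
    String name();
}

/** Annotated class, similar in spirit to what a store-specific compiler could emit. */
class Person {
    @StoreAttribute(name = "ssn")
    String ssn;

    @StoreAttribute(name = "first_name")
    String firstName;

    String scratch = "not persisted"; // no annotation, so the mapper skips it

    Person(String ssn, String firstName) {
        this.ssn = ssn;
        this.firstName = firstName;
    }
}

/** Hypothetical mapper: reads the annotations and builds the attribute map. */
final class AnnotationMapper {
    private AnnotationMapper() {}

    static Map<String, String> toItem(Object obj) {
        Map<String, String> item = new HashMap<>();
        for (Field field : obj.getClass().getDeclaredFields()) {
            StoreAttribute attr = field.getAnnotation(StoreAttribute.class);
            if (attr == null) {
                continue; // field is not mapped to the store
            }
            field.setAccessible(true);
            try {
                Object value = field.get(obj);
                if (value != null) {
                    item.put(attr.name(), value.toString());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        return item;
    }
}
```

With the map-based style, the caller would fill in the same map directly
(item.put("ssn", ...)) instead of relying on annotations.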

>    2. What if other data stores have other annotations? Will we create more
>    compilers for them?

Well ... yeah. The idea would be to refactor Gora's main compiler to
make it more intelligent, so it could decide what the classes are
being compiled into. For example, Google App Engine uses JPA and POJOs
to persist data, so an alternative would be to compile the XML mapping
file into fully annotated classes to persist them. While thinking about
your email, Ed, I looked for Avro RPC libraries and found this [2];
maybe you are more familiar with it. Do you think we could use
that to make all of our data stores Avro-based?

>    3. Renato had mentioned that Gora supports "data services" now
>    (presumably in addition to databases). I'm not sure I understand this
>    distinction. I have heard Dynamo is a managed database that implements a
>    model similar to Cassandra. Can you elaborate on this statement?

Lewis's answer covers this; I have nothing to add.

> Thoughts:
>
>    1. I'm concerned that there is currently some marginal reliance on
>    accessing code that is generated by compilers and cannot be declared in a
>    supertype. The exact instance of this that I'm aware of is accessing the
>    static field _SCHEMA on Avro types generated by the 1.3 compiler via
>    reflection. The current preference in the Avro community is to use the name
>    SCHEMA$ instead. Issues like this cannot be caught by static compilation
>    checks and are real no-no's in my opinion, unless the structure of the API
>    is well-documented and enforced by regression tests. If there is a
>    proliferation of compilers this problem could become more severe.

All Avro-based data stores should share the same compiler, so I
totally agree with you on improving the structure of the API by
documenting it better and by enforcing regression tests. I guess
we would have to change the way all the data stores manage their
schema. We would have to do the same for the web-based data stores,
so that Gora's API remains the same across data stores.
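
As a minimal sketch of the kind of regression check discussed here,
assuming the two field names from your email (_SCHEMA and SCHEMA$): the
two record classes below are stand-ins for generated code, string
constants stand in for real Avro Schema objects, and SchemaFieldLookup is
a hypothetical helper, not something that exists in Gora or Avro.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Stand-in for a class generated by an older compiler. */
class OldStyleRecord {
    public static final String _SCHEMA = "{\"type\":\"record\",\"name\":\"OldStyleRecord\"}";
}

/** Stand-in for a class generated by a newer compiler. */
class NewStyleRecord {
    public static final String SCHEMA$ = "{\"type\":\"record\",\"name\":\"NewStyleRecord\"}";
}

/** Hypothetical helper that keeps the reflective lookup in one tested place. */
final class SchemaFieldLookup {
    // Preferred name first, legacy name second.
    private static final String[] CANDIDATES = {"SCHEMA$", "_SCHEMA"};

    private SchemaFieldLookup() {}

    static Object schemaOf(Class<?> type) {
        for (String name : CANDIDATES) {
            try {
                Field field = type.getDeclaredField(name);
                if (Modifier.isStatic(field.getModifiers())) {
                    field.setAccessible(true);
                    return field.get(null); // static field, so no instance is needed
                }
            } catch (NoSuchFieldException e) {
                // Not declared under this name; try the next candidate.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + name + " on " + type.getName(), e);
            }
        }
        // Fail loudly instead of letting a renamed field go unnoticed.
        throw new IllegalStateException("No static schema field on " + type.getName());
    }
}
```

A regression test would then call schemaOf on every generated class, so
that a renamed field fails the build rather than failing at runtime.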

>    2. Making objects inherit from SpecificRecord (an Avro class) makes them
>    convenient to use in RPC's or map/reduce. I think this is one of the most
>    attractive features of Gora.

True that, but how could we use Avro to write directly to a
web-service-backed database?

>    3. The current mechanism used to track the dirty state of gora-compiled
>    objects must be improved 1.7 since the Avro 1.7 API is structured in a way
>    that makes the current methodology almost impossible if you engage in any
>    degree of code reuse. I believe the following requirements are necessary
>    for an improved dirty state tracking system:
>       1. The system must be able to represent the original state of the
>       object as it was deserialized from the store prior to mutation. The
>       motivation for this is to be able to create the most generalized
>       mapping support possible. Some of this is currently done via the
>       stateful map, but I believe the implementation could be improved and
>       generalized. There are lots of mapping schemes that are not currently
>       possible because there is not enough information stored in objects to
>       allow erasure of key/values afterwards. A few examples:
>          1. Objects of arbitrary structure could be stored with each field
>          (including those of child objects) represented as a single record
>          in HBase, Accumulo, or Cassandra.
>          2. Child objects could be stored in column families with their
>          fields in column qualifiers, reserving one column family for the
>          fields of the parent object. Without storing the state of objects,
>          this could result in values getting "lost" in the database if a
>          union type is used, for instance.
>          3. Maps of maps
>       2. The system should be implemented entirely in the over-the-wire
>       protocol that is used to transmit objects

We haven't modelled this functionality in the DynamoDB store because
DynamoDB is managed by a third party.
One more question here, Ed: what do you mean by over-the-wire
protocol? RPC, Thrift, etc.?

>       3. The system will not be represented in the serialized
>       representation that the "primary" data store uses since its
>       representation is authoritative.

Do you mean that we should abstract the serialized representation? If
so, how would we handle services for which we don't use disk-based
serialization?

>       4. The improved system should have one representation and access
>       pattern in the API (currently both a state tracker object and the
>       persistent object itself describe the mutation state).
>    4. I'd eventually like to see Avro/Gora objects used as both DTO's and
>    DAO's using an Avro javascript implementation (there are two that I am
>    aware of). Continued reliance on Avro for serialization on the wire
>    supports this.
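
To check that I understand the first requirement (keeping the original
state so that stale key/values can be erased), here is a minimal sketch.
TrackedRecord is a hypothetical class, not Gora's current state tracking,
and it only takes a shallow snapshot, so nested mutable values would need
a deep copy in a real implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical record that remembers the state it was loaded with. */
final class TrackedRecord {
    private final Map<String, Object> original; // as deserialized from the store
    private final Map<String, Object> current;  // what the application sees and mutates

    TrackedRecord(Map<String, Object> loaded) {
        this.original = new HashMap<>(loaded);
        this.current = new HashMap<>(loaded);
    }

    void put(String field, Object value) {
        current.put(field, value);
    }

    void remove(String field) {
        current.remove(field);
    }

    /** Fields that are new or whose value changed: these must be written. */
    Set<String> dirtyFields() {
        Set<String> dirty = new TreeSet<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            String key = entry.getKey();
            if (!original.containsKey(key) || !Objects.equals(original.get(key), entry.getValue())) {
                dirty.add(key);
            }
        }
        return dirty;
    }

    /** Fields present at load time but gone now: these must be deleted from the store. */
    Set<String> erasedFields() {
        Set<String> erased = new TreeSet<>(original.keySet());
        erased.removeAll(current.keySet());
        return erased;
    }
}
```

The point of the snapshot is the second method: without the loaded state,
a store that maps each field to its own cell cannot know which cells to
delete.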

Thanks, Ed, for discussing this. We really need to decide how Gora's
API should work for both types of data stores.


Renato M.

[1] http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/LowLevelJavaItemCRUD.html
[2] https://github.com/phunt/avro-rpc-quickstart
