Hi Renato,
When I say mutable, I mean mutable in memory, i.e., in the same sense that
strings are immutable in Java. This is just to prevent aliasing errors in
people's code from making changes to objects that won't be recorded by the
state tracking system I'm working on. Most Avro types are actually
immutable; the only exceptions are records, maps, and arrays.
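To make the aliasing problem concrete, here is a minimal sketch in plain Java. NaiveRecord and GuardedRecord are hypothetical classes invented for illustration, not Gora or Avro API; the only point is that a mutable list shared by reference can change without the record's dirty flag noticing, while a copy plus a read-only view closes that gap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical record with one dirty flag; NOT Gora's actual API.
class NaiveRecord {
    private List<String> tags = new ArrayList<>();
    private boolean dirty = false;

    void setTags(List<String> tags) {
        this.tags = tags;   // keeps the caller's reference (an alias)
        this.dirty = true;
    }
    List<String> getTags() { return tags; }  // hands out the live list
    boolean isDirty() { return dirty; }
    void clearDirty() { dirty = false; }     // e.g. after a flush
}

// Same record, but it copies on write and only exposes a read-only view,
// so every change has to go through setTags and is therefore tracked.
class GuardedRecord {
    private List<String> tags = new ArrayList<>();
    private boolean dirty = false;

    void setTags(List<String> tags) {
        this.tags = new ArrayList<>(tags);   // defensive copy
        this.dirty = true;
    }
    List<String> getTags() { return Collections.unmodifiableList(tags); }
    boolean isDirty() { return dirty; }
    void clearDirty() { dirty = false; }
}

class AliasingDemo {
    public static void main(String[] args) {
        List<String> shared = new ArrayList<>();

        NaiveRecord naive = new NaiveRecord();
        naive.setTags(shared);
        naive.clearDirty();   // pretend the record was just persisted
        shared.add("x");      // mutation through the alias, untracked
        System.out.println("naive size=" + naive.getTags().size()
                + " dirty=" + naive.isDirty());

        GuardedRecord guarded = new GuardedRecord();
        guarded.setTags(shared);
        guarded.clearDirty();
        shared.add("y");      // no longer reaches the record's copy
        System.out.println("guarded size=" + guarded.getTags().size()
                + " dirty=" + guarded.isDirty());
    }
}
```

In the naive case the record's contents change while it still reports itself as clean, which is exactly the kind of change the state tracking would miss.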
Let's say you have a data schema that looks like this:
{
  "type": "record",
  "name": "Parent",
  "fields": [
    {"name": "child",
     "type": {"name": "Child", "type": "record",
       ...
     }
    }
  ]
}
So if the mapping in your store provider takes the field "child",
serializes it, and stores it as an immutable blob, that's just how things
will work; you won't look at the fine-grained state tracking information.
If you wanted to use a more sophisticated mapping, where perhaps you use
something like an xpath expression as the key for the "child" field, then
you could do that too and make full use of the fine-grained state tracking.
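As a rough sketch of the two mapping styles, the following uses plain Java maps in place of Avro records. MappingSketch, asBlob, and asPaths are hypothetical names, the "/" path separator is an arbitrary choice, and the child's fields in the usage below are made up because the schema above elides them. toString() stands in for real serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Gora.
class MappingSketch {
    // Coarse mapping: the whole child is stored as one opaque value
    // under a single key, so per-field state is irrelevant to the store.
    static Map<String, String> asBlob(String field, Map<String, Object> child) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put(field, child.toString()); // stand-in for real serialization
        return out;
    }

    // Fine-grained mapping: one key per leaf value, named by its path,
    // so individual nested fields can be written or deleted on their own.
    static Map<String, String> asPaths(String prefix, Map<String, Object> record) {
        Map<String, String> out = new LinkedHashMap<>();
        flatten(prefix, record, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> record,
                                Map<String, String> out) {
        for (Map.Entry<String, Object> e : record.entrySet()) {
            String path = prefix + "/" + e.getKey();
            if (e.getValue() instanceof Map) {
                // Nested record: recurse and extend the path.
                flatten(path, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(path, String.valueOf(e.getValue()));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("x", 1);                 // made-up field
        Map<String, Object> child = new LinkedHashMap<>();
        child.put("label", "a");           // made-up field
        child.put("inner", inner);

        System.out.println(asBlob("child", child).keySet());
        System.out.println(asPaths("child", child).keySet());
    }
}
```

With the blob mapping there is one key to rewrite whenever anything inside the child changes; with the path mapping each leaf has its own key, which is where per-field state becomes useful.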
There are a number of scenarios where this might be desirable. You can, as I
mentioned above, use a more complicated mapping mechanism like xpath
expressions to denote record fields (and nested structure) rather than flat
serialization into a particular column family or qualifier. You could also
use features like column families on some data stores to represent two
related entities that you sometimes want to access together and sometimes
separately, so you would use some of the dirty metadata but not always all
of it.
So in summary, the level of sophistication would really depend on the
particular data store and what features the maintainers of that code want
to expose via its mapping.
The dirty metadata itself shouldn't be persisted into the data store; it is
only for keeping track of changes that occur to records to make sure that
you always have enough information in the Gora objects to clean up
key-values that might be left in the data store.
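Here is a minimal sketch of how in-memory state like this could drive cleanup, using the four states from my earlier message (clean, dirty, deleted, overwritten). TrackedMap and its methods are hypothetical, the transition rules are one possible reading of those states, and the "DELETE"/"PUT" strings stand in for real store mutations. Whether an overwrite really needs a delete before the put depends on how the store does its mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

enum FieldState { CLEAN, DIRTY, DELETED, OVERWRITTEN }

// Hypothetical in-memory tracker; the states live only here and are
// never written to the data store themselves.
class TrackedMap {
    private final Map<String, String> values = new TreeMap<>();
    private final Map<String, FieldState> states = new TreeMap<>();

    // Entry read from the store: starts out clean.
    void load(String key, String value) {
        values.put(key, value);
        states.put(key, FieldState.CLEAN);
    }

    void put(String key, String value) {
        FieldState prev = states.get(key);
        // Anything that already exists in the store must be cleaned up too.
        boolean persisted = prev != null && prev != FieldState.DIRTY;
        values.put(key, value);
        states.put(key, persisted ? FieldState.OVERWRITTEN : FieldState.DIRTY);
    }

    void remove(String key) {
        FieldState prev = states.get(key);
        values.remove(key);
        if (prev == null || prev == FieldState.DIRTY) {
            states.remove(key);                  // never reached the store
        } else {
            states.put(key, FieldState.DELETED); // acts as a tombstone
        }
    }

    FieldState stateOf(String key) { return states.get(key); }

    // Turns pending state into store operations, then resets to clean.
    List<String> flush() {
        List<String> ops = new ArrayList<>();
        for (String k : new ArrayList<>(states.keySet())) {
            switch (states.get(k)) {
                case DELETED:
                    ops.add("DELETE " + k);
                    states.remove(k);
                    break;
                case OVERWRITTEN:
                    ops.add("DELETE " + k);
                    ops.add("PUT " + k + "=" + values.get(k));
                    states.put(k, FieldState.CLEAN);
                    break;
                case DIRTY:
                    ops.add("PUT " + k + "=" + values.get(k));
                    states.put(k, FieldState.CLEAN);
                    break;
                default:
                    break; // CLEAN: nothing pending
            }
        }
        return ops;
    }
}
```

A store with a coarse blob mapping could ignore the per-key states and rewrite the whole value, while a store with a fine-grained mapping would use them to issue only the deletes and puts it needs.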
On Mon, Jul 30, 2012 at 2:22 PM, Renato Marroquín Mogrovejo <
[email protected]> wrote:
> Hi Ed,
>
> I have a couple of questions about this. I am in the middle of implementing
> the DynamoDB data store for Gora, and there are some severe
> differences in the Gora API between disk-based data stores and web service
> ones.
> You are proposing to classify fields into:
>
> - Mutable. These ones will have the four states: clean, dirty,
> deleted, and overwritten.
> - Immutable.
>
> Where most Avro-based objects will be mutable. How do you think we
> could model the Gora API to deal with web service based data stores (e.g.
> DynamoDB, GAE)? In these cases, the objects we are talking about are
> immutable objects because they are all primitive objects, and most of
> the transactional methods are handled inside service providers. Do you
> think we should create these attributes as well? Or what kind of
> attributes should immutable objects have?
> Thanks in advance!
>
>
> Renato M.
>
> 2012/7/29 Ed Kohlwey <[email protected]>:
> > What I'm talking about is not specific to the Avro store. The issue is that
> > state information can be lost during the mutation process. For example, one
> > record has another record as a field. In this regard the sub-record
> > represents a map. But deletion state in a record is not tracked; to have
> > enough information to make sure you can go back and delete the kvs in the
> > original store, you need to know what the original value was (depending on
> > how the store does mappings) or do a range delete. Maps also do not retain
> > enough information to be expressive in this regard; they maintain deleted
> > state but do not describe in a granular fashion the original state of the
> > object.
> >
> > My current thinking is to strictly define four states for fields: clean,
> > which means no mutation is pending for a record; dirty, which means a write
> > is pending on a record; deleted, which means that a delete mutation is
> > pending; and overwritten, which is equivalent to dirty and delete. Fields
> > will be strictly separated into two categories: mutable (maps, arrays, and
> > records) and immutable (bytes, strings, and other primitives). All
> > non-immutable fields should have the original state of any mutated fields
> > stored either via a tombstone object or dirty bits. Tombstone objects will
> > be used to describe the original state of a mutable object that needs to be
> > deleted, and dirty bits will be used to signal that the current state of
> > the object is not yet persistent.
> >
> > Sent from my smartphone. Please excuse any typos or shorthand.
> > On Jul 29, 2012 1:49 PM, "Lewis John Mcgibbney" <
> [email protected]>
> > wrote:
> >
> >> Hi Ed,
> >>
> >> Yeah, I actually noticed that deletes are not available/supported in
> >> the Avro store in trunk and in your 84 patch. As I'm more or less
> >> coming into the Avro stuff blind... does Avro do deletes or is it just
> >> that we don't yet support them in Gora?
> >>
> >> Best
> >> Lewis
> >>
> >> On Sun, Jul 29, 2012 at 6:17 PM, Ed Kohlwey <[email protected]> wrote:
> >> > I've found the apparent semantics of deletes to be pretty inconsistent
> >> > through my work on the Avro port. I don't think enough state information is
> >> > actually stored to implement it reliably. I'm currently working on adding
> >> > this on top of my Gora 84 work.
> >> >
> >> > Sent from my smartphone. Please excuse any typos or shorthand.
> >> > On Jul 29, 2012 11:27 AM, "Lewis John Mcgibbney" <
> >> [email protected]>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> What kind of conversation needs to be kicked off here?
> >> >> Currently as it stands deletes (and some other operations) in Gora
> >> >> seem to be shrouded in mystery... :0|
> >> >>
> >> >> Deletes seem to be implemented in gora-accumulo fine, maybe Keith can
> >> >> confirm? Also some of the semantics about what Accumulo expects
> >> >> deletes to be like and whether or not it is working OK for your use
> >> >> case?
> >> >> Ferdy provided important input into GORA-155 stating that there needs
> >> >> to be more clarity w.r.t semantics for versions of operations on a
> >> >> general level before we begin to implement functionality willy nilly
> >> >> at datastore level.
> >> >>
> >> >> Through Hector we can do many alternative delete operations for
> >> >> Cassandra and this is great but I think it is important for us to
> >> >> establish some general rules about what we wish the Gora API to
> >> >> offer/achieve.
> >> >>
> >> >> Any comments?
> >> >> Thanks
> >> >> Lewis
> >> >>
> >> >> --
> >> >> Lewis
> >> >>
> >>
> >>
> >>
> >> --
> >> Lewis
> >>
>