You don't need my permission. :) I'm totally fine with doing it later; I just wanted to make sure we were keeping it in mind.

Alan.

On Nov 5, 2012, at 10:08 AM, Mithun Radhakrishnan wrote:

> Hello, Alan.
>
> Agreed. I'd like to refactor NotificationListener for that express reason,
> but with your permission, I'll do so as a follow-up JIRA, very soon.
>
> Mithun
>
>
> ________________________________
> From: Alan Gates <[email protected]>
> To: [email protected]
> Sent: Monday, November 5, 2012 8:27 AM
> Subject: Re: Add/delete-partition JMS message format proposal.
>
> Looks good. I definitely agree with shrinking the message size. We can keep
> this to a notification and let client go to the metastore to get the
> information it cares about.
>
> One comment I would make is we should consider that in time we would like to
> move this away from just sending messages via JMS to sending them via other
> messaging protocols as well (HTTP, Kafka, etc.) So we don't want to do
> anything that binds this more tightly to JMS or ActiveMQ. I don't see
> anything in these changes that do, but I think it's good to call that out as
> a design goal.
>
> Alan.
>
> On Oct 30, 2012, at 2:26 PM, Mithun Radhakrishnan wrote:
>
>> Hello, HCat-Dev.
>>
>> I'm working on modifying the HCat messages (sent over JMS/ActiveMQ, for
>> partition-add/delete) so that clients (such as Oozie) would have an easier
>> time with consumption.
>> Here are some limitations of what's available currently:
>> 1. The present implementation in HCatalog (branch-0.4/) seems to send the
>>    entire Partition (Java) instance in serialized fashion. Since the
>>    partition-parameters, hdfs-location etc. are all serialized, the messages
>>    are rather, emm, garrulous.
>> 2. There doesn't seem to be any support for versioning either. So when new
>>    fields are added, older clients won't work at all without update.
>>
>> Could we consider transmitting only that info which identifies the
>> partitions that pertain to the operation (e.g. partition keys), and drop any
>> information that might be gathered from querying the metadata (e.g. storage
>> location, partition-parameters, etc.)
>>
>> We're also considering that the initial implementation encode the ActiveMQ
>> payload in JSON. Here's an example of the proposed message format for an
>> "add_partition" operation:
>>
>> "add_partition": {
>>   "hcat_server" : "thrift://my.hcat.server:9080",
>>   "hcat_service_principal" : "hcat/[email protected]",
>>   "db": "default",
>>   "table": "starling_jobs",
>>   "partitions":
>>   [
>>     {"grid": "AxoniteBlue", "dt": "2012_10_25"},// Sets of partition-keys.
>>     {"grid": "AxoniteBlue", "dt": "2012_10_26"},
>>     {"grid": "AxoniteBlue", "dt": "2012_10_27"},
>>     {"grid": "AxoniteBlue", "dt": "2012_10_28"},
>>   ],
>>   "timestamp": "1351534729" // In this case, interpreted as creation-time.
>> }
>>
>> If we continue to use JMS MapMessages, we could consider having 3 keys in
>> the map:
>> 1. version = "1" (for the first implementation. Increment as we go.)
>> 2. format = "json" (We could consider adding different formats if we choose.)
>> 3. message = <the json message body, as above.>
>>
>> The version and format help a factory choose the right implementation to
>> deserialize the message. (A client-side library we supply to Oozie should
>> hide this and provide POJOs.)
>>
>> Since the "partitions" field is an array, and since the values corresponding
>> to partition-keys are all strings, we'd be able to accommodate partial
>> partitions-specs, or even wild-cards. This might help us add support for
>> "mark-set-done" later on.
>>
>> The first key ("add_partition", "drop_partition" or "alter_partition")
>> indicates the operation, and the value indicates the record-body. (At first
>> glance, the record-body doesn't change for these operations. But that might
>> change, so we'll keep them distinct.)
>>
>> Also note that HiveMetaStore::add_partitions_core() currently doesn't send 1
>> message for the entire set of partitions being added. Instead we get one
>> message per partition. This could be verbose and sub-optimal. We'll tackle
>> this sort of thing after we've nailed the format down.
>>
>> I'm toying with the idea of adding an "other" property, an array of
>> key-values to accommodate stuff we hadn't considered, at "run-time" (like if
>> we want to introduce a hack). The need for such a property is contingent on
>> the behaviour of Jackson w.r.t. newly added properties in the record-body.
>> (I'll run experiments and keep you posted.)
>>
>> What do you think?
>>
>> Mithun
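
For anyone following the thread, here is a rough sketch of how a client might consume the proposed three-key envelope (version, format, message) and pick a deserializer from the version/format pair. It is only an illustration, written in Python for brevity rather than the Java/Jackson library the proposal has in mind. A plain dict stands in for the JMS MapMessage, and the names PartitionEvent, parse_json_v1, DESERIALIZERS and decode are made up for this example; none of them come from HCatalog. The sample body omits the service principal and the inline comments so that it is strict JSON. Fields the sketch does not recognise are collected in "extras" instead of causing a failure, which is one possible answer to the question about newly added properties, not a statement about how Jackson behaves.

```python
import json
from dataclasses import dataclass, field

# Operation names taken from the proposal.
OPERATIONS = ("add_partition", "drop_partition", "alter_partition")


@dataclass
class PartitionEvent:
    """Hypothetical client-side record, standing in for the proposed POJO."""
    operation: str
    hcat_server: str
    db: str
    table: str
    partitions: list
    timestamp: int
    extras: dict = field(default_factory=dict)


def parse_json_v1(body):
    """Decode a version-1 JSON body: one operation key wrapping the record."""
    doc = json.loads(body)
    if len(doc) != 1:
        raise ValueError("expected a single operation key")
    operation, record = next(iter(doc.items()))
    if operation not in OPERATIONS:
        raise ValueError("unknown operation: %s" % operation)
    known = ("hcat_server", "db", "table", "partitions", "timestamp")
    # Assumption: unrecognised fields are kept aside rather than rejected.
    extras = {k: v for k, v in record.items() if k not in known}
    return PartitionEvent(
        operation=operation,
        hcat_server=record["hcat_server"],
        db=record["db"],
        table=record["table"],
        partitions=record["partitions"],
        timestamp=int(record["timestamp"]),
        extras=extras,
    )


# The "factory": (version, format) selects the deserializer.
DESERIALIZERS = {("1", "json"): parse_json_v1}


def decode(envelope):
    """Dispatch on the envelope's version and format keys."""
    key = (envelope["version"], envelope["format"])
    if key not in DESERIALIZERS:
        raise ValueError("unsupported version/format: %r" % (key,))
    return DESERIALIZERS[key](envelope["message"])


# A dict standing in for the three-key MapMessage.
envelope = {
    "version": "1",
    "format": "json",
    "message": json.dumps({
        "add_partition": {
            "hcat_server": "thrift://my.hcat.server:9080",
            "db": "default",
            "table": "starling_jobs",
            "partitions": [
                {"grid": "AxoniteBlue", "dt": "2012_10_25"},
                {"grid": "AxoniteBlue", "dt": "2012_10_26"},
            ],
            "timestamp": "1351534729",
        }
    }),
}

event = decode(envelope)
print(event.operation, event.table, len(event.partitions))
```

Supporting a later version or a second format would then mean registering another entry in DESERIALIZERS, leaving existing clients' code paths alone.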
