Re: Cluster-specific Adaptors

Eric Yang Mon, 20 Sep 2010 18:52:34 -0700

Having to build state on top of Agent protocol is something that I would
avoid.  I prefer each command to be stateless.  It is better to re-factor
processAddCommand to ensure the simple and clean design can take place.  We
should look into new check_point file format etc.


It's better to do a quick pass on existing adaptors.  Most of this changes
can happen in AbstractAdaptor.  I wouldn't worry about breaking adaptors
because there are very few of them.  I like to make REST API version of
agent protocol standard though.  It is more clear from automation
perspective.

Regards,
Eric


On 9/20/10 5:32 PM, "Ariel Rabkin" <[email protected]> wrote:

> I've been thinking about the adaptor add syntax for a while.  It's
> kinda clunky and hard to extend.
> 
> Here's something I've been considering.  Suppose instead of doing this
> in the add command, we had a separate setTags command that controls
> the tags for the next adaptor[s] started from that conf file or socket
> connection.
> 
> so you'd have stuff like
> settags cluster foo
> add .....
> 
> And then the adaptors wouldn't even have to know about it; you could
> handle this primarily on the framework side.
> 
> This would make the checkpointing somewhat more complex, but I think
> that's potentially not so bad to patch up.  You'd keep a list for each
> adaptor of supplemental commands that should be applied to it, and
> dump those out before checkpointing the adaptor state.
> 
> 
> --Ari
> 
> On Mon, Sep 20, 2010 at 5:15 PM, Bill Graham <[email protected]> wrote:
>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>> me. Plus, it would put the responsibility of parsing tags on each
>> adaptor impl and would require a refactor of how each one currently
>> parses args.
>> 
>> Actually, we might be able to intercept the call to parseArgs in
>> AbstractAdaptor and pull out the tags if they exist and pass the rest
>> to the subclass, which would be none the wiser. Not the cleanest, but
>> at lease not as intrusive on the adaptor implementations.
>> 
>> Ari, also what about the getCurrentStatus() method? I'd think all the
>> impls would somehow need to incorporate tags into that response as
>> well, since AFAIR that's what's used to do Adaptor SerDe with the
>> checkpoints file.
>> 
>> 
>> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <[email protected]> wrote:
>>> Hi Bill,
>>> 
>>> This might be hacky but it should be possible to have adaptor specific
>>> params to include tags.  Ari, what do you think?
>>> 
>>> Regards,
>>> Eric
>>> 
>>> On 9/20/10 2:58 PM, "Bill Graham" <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> In CHUKWA-515 we discussed the possibility being able to add an
>>> adaptor bound to a given cluster:
>>> 
>>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&p
>>> age=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_
>>> 12905811
>>> 
>>> I can actually see this being useful, especially now that it's easier
>>> to add/remove agents with the Adaptor REST API. Looking into the code
>>> it doesn't seem like it would be that hard to do, but I want to make
>>> sure I'm not overlooking anything.
>>> 
>>> It seems like we could support this with a few small changes:
>>> 
>>> - Add the concept of tags to the Adaptor interface.
>>> - AbstractAdator would support a getTags method which would return the
>>> union of tags set on the Adaptor and the default tags on the
>>> DataFactory.
>>> - Internal tag implementations on each would change to store tags in
>>> maps, instead of concat'ed strings. This would allow for a "last in
>>> wins" type of functionality so tags could be overriden. This assumes
>>> of course that there should never be more than one of the same tag key
>>> value, which I _assume_ is the case.
>>> - The ChunkImpl constructor will call getTags on the agent, instead of
>>> getDefaultTags the data factory.
>>> 
>>> The trickiest part as I see it is figuring out how to change the add
>>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>>> both makes sense and doesn't break things. In it's current form it
>>> doesn't have room for easy expansion except at the end of the line:
>>> 
>>> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
>>> <initial offset>
>>> 
>>> Any thoughts or suggestions? There's also a potential gotcha with all
>>> the impls of Adaptor.parseArgs either breaking or needing to
>>> change....
>>> 
>>> thanks,
>>> Bill
>>> 
>>> 
>> 
> 
> 
> 
> --
> Ari Rabkin [email protected]
> UC Berkeley Computer Science Department
>

Re: Cluster-specific Adaptors

Reply via email to