Re: Cluster-specific Adaptors

Ariel Rabkin Tue, 21 Sep 2010 11:55:22 -0700

Why is it bad to keep some state per command session?  It's easy to
know when the state can be discarded -- as soon as we're done reading
the file or when the socket closes.


I think it's fairly intuitive and readable; it's routine for e.g.,
scripts to modify interpreter state.  In straight-line code, which is
all we're ever going to have in the control protocol, this is very
easy to reason about.

--Ari

On Tue, Sep 21, 2010 at 11:17 AM, Bill Graham <[email protected]> wrote:
> +1 on staying stateless.
>
> I think the challenge we're facing is that we're trying to support a
> syntax that is simple and readable and can be done with a single line
> (i.e. for the initial_adaptors file, the telnet API, the command line,
> etc), but the configs can potentially be not-so-simple.
>
> For example, here's how you might configure the JMS adaptor which used
> dependency injection. That's a lot for a single line and there's
> nowhere to add new global configs in front of the adaptor specific
> configs without breaking things.
>
> add jms.JMSAdaptor jms-events
> failover:(tcp://jms-host.foo.com:61616,tcp://jms-host.foo.com:61616)
> -q some.queue.name -s "id_type IN ('162')" -x
> org.apache.hadoop.chukwa.datacollection.adaptor.
> jms.JMSMessagePropertyTransformer -p
> "event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
> 0
>
> What if we were to adopt a few flags into the syntax:
>
> add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
> [--adaptor-params <adaptor specific params>|--adaptor-config-file
> <file>]
> <initial offset>
>
> The '--*' flags could be reserved. This would allow us to keep with a
> one-line syntax where that approach works, but allow for expansion.
> Also, if an adaptor config got to complex, those configs could be
> specified in a file if needed.
>
>
> On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <[email protected]> wrote:
>> Hi,
>> If I had to implement this, I will add an extra parameter
>> (?extraParams=xyz).
>> The adaptorImp will be the only one responsible for parsing this adaptor’s
>> specific info.
>> I don’t think that we could/should add new complexity in the parsing.
>> The same think should be done for getCurrentStatus(), a public result, that
>> is the same for all adaptors in order to know if the adaptors is working or
>> not and a private section that will give extra information.
>>
>> Also, moving to a json input should simplify everything.
>> /Jerome.
>>
>> On 9/20/10 5:15 PM, "Bill Graham" <[email protected]> wrote:
>>
>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>> me. Plus, it would put the responsibility of parsing tags on each
>> adaptor impl and would require a refactor of how each one currently
>> parses args.
>>
>> Actually, we might be able to intercept the call to parseArgs in
>> AbstractAdaptor and pull out the tags if they exist and pass the rest
>> to the subclass, which would be none the wiser. Not the cleanest, but
>> at lease not as intrusive on the adaptor implementations.
>>
>> Ari, also what about the getCurrentStatus() method? I'd think all the
>> impls would somehow need to incorporate tags into that response as
>> well, since AFAIR that's what's used to do Adaptor SerDe with the
>> checkpoints file.
>>
>>
>> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <[email protected]> wrote:
>>> Hi Bill,
>>>
>>> This might be hacky but it should be possible to have adaptor specific
>>> params to include tags.  Ari, what do you think?
>>>
>>> Regards,
>>> Eric
>>>
>>> On 9/20/10 2:58 PM, "Bill Graham" <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> In CHUKWA-515 we discussed the possibility being able to add an
>>> adaptor bound to a given cluster:
>>>
>>>
>>> https://issues.apache.org/jira/browse/CHUKWA-515?focusedCommentId=12905811&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12905811
>>>
>>> I can actually see this being useful, especially now that it's easier
>>> to add/remove agents with the Adaptor REST API. Looking into the code
>>> it doesn't seem like it would be that hard to do, but I want to make
>>> sure I'm not overlooking anything.
>>>
>>> It seems like we could support this with a few small changes:
>>>
>>> - Add the concept of tags to the Adaptor interface.
>>> - AbstractAdator would support a getTags method which would return the
>>> union of tags set on the Adaptor and the default tags on the
>>> DataFactory.
>>> - Internal tag implementations on each would change to store tags in
>>> maps, instead of concat'ed strings. This would allow for a "last in
>>> wins" type of functionality so tags could be overriden. This assumes
>>> of course that there should never be more than one of the same tag key
>>> value, which I _assume_ is the case.
>>> - The ChunkImpl constructor will call getTags on the agent, instead of
>>> getDefaultTags the data factory.
>>>
>>> The trickiest part as I see it is figuring out how to change the add
>>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>>> both makes sense and doesn't break things. In it's current form it
>>> doesn't have room for easy expansion except at the end of the line:
>>>
>>> add [name =] <adaptor_class_name> <datatype> <adaptor specific params>
>>> <initial offset>
>>>
>>> Any thoughts or suggestions? There's also a potential gotcha with all
>>> the impls of Adaptor.parseArgs either breaking or needing to
>>> change....
>>>
>>> thanks,
>>> Bill
>>>
>>>
>>
>>
>>
>



-- 
Ari Rabkin [email protected]
UC Berkeley Computer Science Department

Re: Cluster-specific Adaptors

Reply via email to