Re: [Architecture] [LogAnalyzer] How the user can configure log publishing agent

Malith Dhanushka Tue, 10 Nov 2015 22:13:24 -0800

On Wed, Nov 11, 2015 at 11:25 AM, Srinath Perera <[email protected]> wrote:


>
>
> On Tue, Nov 10, 2015 at 11:55 AM, Anuruddha Premalal <[email protected]>
> wrote:
>
>> Hi Srinath,
>>
>> Please find my comments inline.
>>
>> On Tue, Nov 10, 2015 at 10:50 AM, Srinath Perera <[email protected]>
>> wrote:
>>
>>> Had a chat with Malith. I have a few questions.
>>>
>>>
>>>    1. Why are we using Python? If we use a different programming
>>>    language, we need to discuss and approve it because of support cost
>>>    etc. (not as a sentence buried in a long mail). Are we going to rewrite
>>>    the data bridge in Python? That will take a lot of time. IMO we should
>>>    keep it simple and go with Java.
>>>
>> The main reason for using Python is the light memory footprint, as
>> opposed to Java, where it costs a JVM to run the agent. This was discussed
>> at the initial project meetings as well (+Anjana). We don't need to
>> rewrite the data-bridge agent in Python; the Stratos team has already
>> implemented that [1], and we can make use of it and add missing features
>> if/as needed. IMO we should keep it simple as well as do it the proper
>> way; keeping it simple doesn't mean using Java.
>>
>> The motivations behind suggesting Python for the agent implementation are
>> Amazon CloudWatch [2] and Apache Stratos [3].
>>
>> Regarding the support cost of Python, have we already discussed this in
>> the Stratos case? Can we make use of that model (since they have already
>> released)?
>>
>
> As per our chat, talk to Lakmal and find out about the Stratos experience
> with Python. Although Stratos has done a bare-bones version, you will have
> to implement queuing, error handling, async stuff etc. in the data bridge.
>

For now we will stick with the Java publisher and do the Python version
separately. If a requirement comes up later, we can switch to the Python
publisher.


>
>
>>
>>>    2. Regarding JMS, I think we do not need it.
>>>
>> Do you have any suggestions for distributing configurations over a large
>> agent cluster? Or don't we need to consider that use case?
>>
>
> We can implement the same algorithm using point-to-point connections (all
> agents send their configs to the analytics server using a service call at
> start). I think the MVP should do only that.
>
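
To make the point-to-point idea concrete, here is a minimal Python sketch of
an agent sending its config to the analytics server with a single REST call
at start-up. The endpoint URL and the function names are assumptions for
illustration only; the thread has not fixed the service interface.

```python
import json
import urllib.request

# Hypothetical registration endpoint; the thread does not define one.
REGISTER_URL = "http://analytics.example.com/loganalyzer/agents"


def build_registration_request(agent_conf, url=REGISTER_URL):
    """Build (but do not send) the POST an agent would make at start-up."""
    body = json.dumps(agent_conf).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(agent_conf, url=REGISTER_URL, timeout=10):
    """Send the agent's config to the analytics server; returns the HTTP status."""
    request = build_registration_request(agent_conf, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With this shape there is no broker to maintain: a changed config is picked
up by restarting the agent, which simply calls register() again.
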
>>
>>>    3. Log formats do not change often. IMO just point-to-point
>>>    connections should do.
>>>
>> "Log agent configurations (log formats) don't change often" - we cannot
>> build a system on this kind of hypothesis. Log agent configurations can
>> change, no matter how often that is. IMO it's better to consider that
>> scenario as well.
>>
>
>> Ex: A user wants to collect the syslog for a certain period of time and,
>> after observing the logs, decides to disable this log stream. There can
>> be many other use cases where log configurations change.
>>
>
> I would say that for the MVP, if you change the config, you restart the agent.
>
>>
>> What do you mean by the use of point-to-point connections? Is it the use
>> of Thrift to distribute configs?
>>
> Thrift or a REST call.
>
>>
>>>    4. Our current analytics model is splitting at the client. I think
>>>    we should start with that. Then the agent first has to send a few
>>>    hundred raw lines, which are shown to the user and used to configure
>>>    things. After that, the actual events are split at the agent.
>>>
>> Yes
>>
>>>
>>>    5. If the Logstash configuration files are well done, can we use the
>>>    same formats?
>>>
>> Yes, this has already been discussed in the architecture mail "Component
>> level description of the log analyzer tool".
>>
>>> Thanks
>>> Srinath
>>>
>>> p.s. The above are opinions only; please shout if you disagree.
>>>
>>>
>>>
>>>
>>> On Fri, Nov 6, 2015 at 6:33 PM, Malith Dhanushka <[email protected]>
>>> wrote:
>>> >
>>> > Yes, I agree that applying agent configs in large clusters is
>>> complicated. But centralized config management using a message broker is a
>>> critical decision to take, as it adds maintenance effort. That decision
>>> depends on how big the cluster is and how frequently the log configs
>>> change.
>>> >
>>> > On Fri, Nov 6, 2015 at 3:22 PM, Inosh Goonewardena <[email protected]>
>>> wrote:
>>> >>
>>> >> Hi Anuruddha,
>>> >>
>>> >>
>>> >> On Fri, Nov 6, 2015 at 3:06 PM, Anuruddha Premalal <
>>> [email protected]> wrote:
>>> >>>
>>> >>> Hi Inosh,
>>> >>>
>>> >>> Can you be specific about the added complexities of managed
>>> configuration mode? I have explained in the sequence diagram how this will
>>> function. Managed configuration mode is actually a user choice; if the
>>> deployment is quite simple, the user can use the default agent-side
>>> configurations (as in Logstash).
>>> >>
>>> >>
>>> >> As Malith pointed out, my idea was to avoid configuring the log
>>> agent remotely and publishing the config. But yes, in a larger cluster,
>>> configuring each of the agents won't be practical, and managed config mode
>>> is the better approach. If the user has the choice, he/she can select
>>> depending on his/her preference.
>>> >>
>>> >>>
>>> >>>
>>> >>> Managed config mode addresses a major feature that agent config
>>> mode lacks: if a user needs to change or update configs for a large
>>> cluster, configuring each agent individually won't be practical.
>>> >>>
>>> >>> Regarding the overhead of splitting an event at the agent side
>>> versus the master side: since a single log event usually has a small number
>>> of characters, it won't cost much to perform the filtering at the agent. On
>>> the master side there won't be only a single log stream, so splitting there
>>> obviously adds more overhead to the master. Because of this we shouldn't do
>>> the filtering on the master side.
>>> >>>
>>> >>> We are writing the agent in Python, which doesn't consume as many
>>> resources as a JVM, and that will definitely be an advantage for a smooth run.
>>> >>>
>>> >>>
>>> >>> On Fri, Nov 6, 2015 at 2:43 PM, Inosh Goonewardena <[email protected]>
>>> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> On Fri, Nov 6, 2015 at 1:48 PM, Sachith Withana <[email protected]>
>>> wrote:
>>> >>>>>
>>> >>>>> Hi Malith,
>>> >>>>>
>>> >>>>> Regarding the 1st option:
>>> >>>>> - the overhead of publishing the whole log line might be an issue;
>>> you are essentially publishing the whole log file split into lines
>>> >>>>> - the overhead on the analyzer would be high too, since it has to
>>> do an extra step of splitting.
>>> >>>>
>>> >>>>
>>> >>>> I too agree with the above points, but if we are going with option 2
>>> we will have to carefully analyze the overhead it adds to the solution.
>>> >>>>
>>> >>>> First of all, managed config mode requires centralized
>>> configuration management, where the Log Analysis Server has to manage and
>>> process the configs of all log publishing agents and push these configs
>>> back to the agents. Configurations also have to be pushed to the agents
>>> whenever they are updated. IMO, this process will add some complexity to
>>> the overall solution.
>>> >>>>
>>> >>>> On the other hand, we have to analyze the overhead of adding the
>>> extra step of splitting to the log agents as well. Since these log agents
>>> run on the servers where production systems are running, they should be
>>> able to function smoothly with a minimum amount of resources.
>>> >>>>
>>> >>>>>
>>> >>>>> On Fri, Nov 6, 2015 at 12:30 PM, Malith Dhanushka <[email protected]>
>>> wrote:
>>> >>>>>>
>>> >>>>>> Hi Anuruddha,
>>> >>>>>>
>>> >>>>>> Here the log agent creates the channel between the log source and
>>> the log analyzer. When it comes to publishing logs, we have two options:
>>> >>>>>>
>>> >>>>>> 1. The log agent publishes the raw log line (log event) without
>>> splitting, and then the log analyzer splits the message and indexes it
>>> >>>>>> - Here we don't need to keep centralized configurations, as the
>>> agent simply publishes the raw log line
>>> >>>>>>
>>> >>>>>> 2. The log analyzer configures the log agent, the agent splits the
>>> raw log line and publishes it to the log analyzer, and the analyzer then
>>> does the indexing
>>> >>>>>> - Here we have to keep centralized configurations, but there is
>>> less processing on the log analyzer side, as it doesn't split raw log lines
>>> >>>>>>
>>> >>>>>> I believe option 1 is simpler and cleaner than option 2.
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>> Malith
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Thu, Nov 5, 2015 at 5:54 PM, Anuruddha Premalal <
>>> [email protected]> wrote:
>>> >>>>>>>
>>> >>>>>>> Hi All,
>>> >>>>>>>
>>> >>>>>>> Below are the suggested ways of distributing configurations
>>> among log publishing agents. I would appreciate your feedback on this.
>>> >>>>>>>
>>> >>>>>>> There are two modes in which a log agent can be configured. The
>>> user has to define this mode beforehand; the default is "client config
>>> mode". There can only be a single mode for an agent at a given time.
>>> >>>>>>>
>>> >>>>>>> 1.) client config mode - users configure the log streams
>>> from the client side (the classical Logstash way).
>>> >>>>>>>
>>> >>>>>>>    The user can define the configurations in agent.conf [1]
>>> >>>>>>>
>>> >>>>>>> 2.) managed config mode - the user doesn't have to define stream
>>> specific configurations at the agent side; instead the user defines the
>>> log-groups the agent should be configured with, in managedagent.conf [2]
>>> >>>>>>>
>>> >>>>>>>    This mode is useful for a large cluster of nodes, as the user
>>> can perform all the configurations at a central location.
>>> >>>>>>>
>>> >>>>>>> The following sequence diagram shows how managed config mode
>>> behaves.
>>> >>>>>>>
>>> >>>>>>> The following are a few possible use cases, explained in Q&A form.
>>> >>>>>>>
>>> >>>>>>> * What will happen if the user chooses to switch the configuration
>>> mode?
>>> >>>>>>>   - This will make the previous configurations obsolete; the agent
>>> will always honor the latest config mode.
>>> >>>>>>>
>>> >>>>>>> * How can we distinguish agents?
>>> >>>>>>> Based on the agentID defined by the users. Users can make use of
>>> the instance privateip/publicip to generate unique names; the IP will be
>>> picked up at run time and substituted into the id (agentid :
>>> "esb-${privateip}"). The final agentID will have the following format:
>>> >>>>>>> agentid : "<userdefinedID>-<mastergeneratedID>". The master
>>> generated ID is used to ensure the uniqueness of the agentID.
>>> >>>>>>>
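
A rough Python sketch of the agentID scheme described above, in case it helps
the discussion. The function names are hypothetical, and the random hex
suffix is only an assumption standing in for whatever ID the master would
actually generate.

```python
import uuid
from string import Template


def resolve_agent_id(template, private_ip, public_ip=None):
    """Fill ${privateip}/${publicip} placeholders in a user-defined id."""
    values = {"privateip": private_ip, "publicip": public_ip or private_ip}
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template).safe_substitute(values)


def final_agent_id(user_defined_id, master_generated_id=None):
    """Combine the user-defined id with a master-generated suffix."""
    # Assumption: a short random hex string stands in for the master's ID.
    suffix = master_generated_id or uuid.uuid4().hex[:8]
    return f"{user_defined_id}-{suffix}"
```

For example, resolve_agent_id("esb-${privateip}", "10.0.0.5") gives
"esb-10.0.0.5", and the master would then append its own suffix to that.
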
>>> >>>>>>> * What will happen if the defined agent group is not already
>>> configured?
>>> >>>>>>>  - A new log-group will be created on the master side with empty
>>> configurations. No logs will be published since there are no configurations.
>>> >>>>>>>
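
The unconfigured-group case could look roughly like this on the master side.
This is only an in-memory sketch; GroupRegistry and its methods are
hypothetical names, not part of any existing component.

```python
class GroupRegistry:
    """In-memory stand-in for the master's log-group store (hypothetical)."""

    def __init__(self):
        self._groups = {}  # group name -> stream configuration dict

    def configure(self, name, config):
        """Set the stream configuration of a log-group on the master."""
        self._groups[name] = config

    def register_agent(self, group_names):
        """Return the config of each requested group, creating empty ones."""
        return {name: self._groups.setdefault(name, {}) for name in group_names}


def publishable_groups(configs):
    """Only groups with a non-empty configuration produce log events."""
    return [name for name, config in configs.items() if config]
```

An agent registering with groups ["httpd", "esb"] when only "httpd" is
configured would get an empty config for "esb" and publish nothing for it
until someone configures that group on the master.
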
>>> >>>>>>> * Is it possible to add/delete log-groups for an agent from the
>>> master side?
>>> >>>>>>>  - Yes; once the agent is registered with the master, all
>>> stream-specific configurations can only be done on the master side.
>>> >>>>>>>
>>> >>>>>>> managedagent.conf is read only once in the agent life-cycle. Once
>>> the agent establishes a proper connection with the master, all
>>> configurations are handled from there. If the user changes
>>> managedagent.conf and restarts the agent, this won't affect the way the
>>> agent is already configured.
>>> >>>>>>>
>>> >>>>>>> Feel free to raise any other use-cases which I have missed here.
>>> >>>>>>>
>>> >>>>>>> [1] agent.conf
>>> >>>>>>> {
>>> >>>>>>>     "agentid": "awsinstance-23",
>>> >>>>>>>     "authid": "sDe334#q2",
>>> >>>>>>>     "authsecret": "defr34w3qq#@Qd",
>>> >>>>>>>     "groups": [
>>> >>>>>>>         {
>>> >>>>>>>             "name": "httpd",
>>> >>>>>>>             "config": {
>>> >>>>>>>                 "input": {
>>> >>>>>>>                     "file": {
>>> >>>>>>>                         "path": "/tmp/access_log",
>>> >>>>>>>                         "start_position": "beginning"
>>> >>>>>>>                     }
>>> >>>>>>>                 },
>>> >>>>>>>                 "filter": {
>>> >>>>>>>                     "date": {
>>> >>>>>>>                         "match": [
>>> >>>>>>>                             "timestamp",
>>> >>>>>>>                             "dd/MMM/yyyy:HH:mm:ss Z"
>>> >>>>>>>                         ]
>>> >>>>>>>                     }
>>> >>>>>>>                 },
>>> >>>>>>>                 "output": {
>>> >>>>>>>                     "loganalyzer": {
>>> >>>>>>>                         "binhosts": "192.168.12.2",
>>> >>>>>>>                         "bindport": 9200
>>> >>>>>>>                     }
>>> >>>>>>>                 }
>>> >>>>>>>             }
>>> >>>>>>>         }
>>> >>>>>>>     ]
>>> >>>>>>> }
>>> >>>>>>>
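
As an illustration of what "splitting at the agent" means for the httpd group
in agent.conf, here is a small Python sketch. It assumes the access log is in
Common Log Format, which the thread does not state, and it maps the
"dd/MMM/yyyy:HH:mm:ss Z" pattern from the date filter to the equivalent
strptime format. The names ACCESS_LOG and split_line are made up for the
example.

```python
import re
from datetime import datetime

# Assumed Common Log Format: host ident user [timestamp] "request" status size
ACCESS_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# strptime equivalent of the "dd/MMM/yyyy:HH:mm:ss Z" pattern in agent.conf.
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def split_line(line):
    """Split one raw access-log line into fields; None if it does not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["timestamp"] = datetime.strptime(event["timestamp"], TIMESTAMP_FORMAT)
    event["status"] = int(event["status"])
    return event
```

The per-line cost is one regex match and one timestamp parse, which is the
work that option 2 moves from the analyzer to each agent.
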
>>> >>>>>>> [2] managedagent.conf
>>> >>>>>>>
>>> >>>>>>> {
>>> >>>>>>>   "agentid": "awsinstance-23",
>>> >>>>>>>   "authid"    : "sDe334#q2",
>>> >>>>>>>   "authsecret": "defr34w3qq#@Qd",
>>> >>>>>>>   "groups": ["httpd", "esb" ]
>>> >>>>>>> }
>>> >>>>>>>
>>> >>>>>>> Regards,
>>> >>>>>>> --
>>> >>>>>>> Anuruddha Premalal
>>> >>>>>>> Software Eng. | WSO2 Inc.
>>> >>>>>>> Mobile : +94717213122
>>> >>>>>>> Web site : www.anuruddha.org
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Malith Dhanushka
>>> >>>>>> Senior Software Engineer - Data Technologies
>>> >>>>>> WSO2, Inc. : wso2.com
>>> >>>>>> Mobile          : +94 716 506 693
>>> >>>>>>
>>> >>>>>> _______________________________________________
>>> >>>>>> Architecture mailing list
>>> >>>>>> [email protected]
>>> >>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Sachith Withana
>>> >>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>> >>>>> E-mail: sachith AT wso2.com
>>> >>>>> M: +94715518127
>>> >>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Thanks & Regards,
>>> >>>>
>>> >>>> Inosh Goonewardena
>>> >>>> Associate Technical Lead- WSO2 Inc.
>>> >>>> Mobile: +94779966317
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> ============================
>>> Srinath Perera, Ph.D.
>>>    http://people.apache.org/~hemapani/
>>>    http://srinathsview.blogspot.com/
>>>
>>>
>> [1]
>> https://github.com/apache/stratos/tree/master/components/org.apache.stratos.python.cartridge.agent/src/main/python/cartridge.agent/cartridge.agent/modules/databridge
>> [2]
>> http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/CWL_GettingStarted.html
>> [3]
>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.x+Python+Cartridge+Agent+Guide
>>
>>
>
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
Malith Dhanushka
Senior Software Engineer - Data Technologies
*WSO2, Inc. : wso2.com <http://wso2.com/>*
*Mobile*          : +94 716 506 693

