Re: [Architecture] [LogAnalyzer] How the user can configure log publishing agent

Srinath Perera Tue, 10 Nov 2015 21:56:36 -0800

On Tue, Nov 10, 2015 at 11:55 AM, Anuruddha Premalal <[email protected]>
wrote:


> Hi Srinath,
>
> Please find my comments inline.
>
> On Tue, Nov 10, 2015 at 10:50 AM, Srinath Perera <[email protected]> wrote:
>
>> I had a chat with Malith and have a few questions.
>>
>>
>>    1. Why are we using Python? a) If we use a different programming language,
>>    we need to discuss and approve it because of support cost etc. (not as a
>>    sentence buried in a long mail). Are we going to rewrite the data bridge
>>    client in Python? That will take a lot of time. IMO we should keep it simple
>>    and go with Java.
>>
> The main reason for using Python is its light memory footprint, as
> opposed to Java, where running the agent costs a JVM. This was discussed at
> the initial project meetings as well (+Anjana). We don't need to rewrite
> the data-bridge agent in Python; the Stratos team has already implemented
> it [1], and we can make use of it and add missing features if/as needed. IMO
> we should keep it simple as well as do it the proper way; keeping it simple
> doesn't mean using Java.
>
> The motivations behind suggesting Python for the agent implementation are
> Amazon CloudWatch [2] and Apache Stratos [3].
>
> Regarding the support cost of Python, have we already discussed this in the
> Stratos case? Can we reuse that model (since they have already
> released)?
>

As per our chat, talk to Lakmal to learn about the Stratos experience
with Python. Although Stratos has done a bare-bones version, you will have to
implement queuing, error handling, async support, etc. in the data bridge agent.


>
>>    1. Regarding JMS, I think we do not need it.
>>
> Do you have any suggestion for distributing configurations over a large
> agent cluster? Or don't we need to consider that use case?
>

We can implement the same algorithm using point-to-point connections (all
agents send their configs to the analytics server using a service call at
startup). I think the MVP should do only that.
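A minimal sketch of that start-up service call, assuming a REST transport (Thrift would serve equally well). The endpoint path `/agents/register`, the payload shape and both function names are hypothetical; they are not an existing analytics-server API.

```python
import json
import urllib.request


def build_registration_request(server_url, agent_id, config):
    """Build a POST that sends this agent's config to the analytics server.

    The "/agents/register" path and the {"agentid", "config"} body are
    hypothetical placeholders for whatever the server ends up exposing.
    """
    body = json.dumps({"agentid": agent_id, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/agents/register",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(server_url, agent_id, config, timeout=10):
    """Send the registration once at agent start-up.

    Raises urllib.error.URLError / HTTPError on network or HTTP failures,
    so the caller can retry or abort start-up.
    """
    req = build_registration_request(server_url, agent_id, config)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `register(...)` once when the agent starts fits the MVP rule further down the thread: if the config changes, restart the agent.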

>
>>    1. Log formats do not change often. IMO just point-to-point
>>    connections should do.
>>
> "Log agent configurations (log formats) don't change often" - we
> cannot design a system based on this kind of assumption; log agent
> configurations can change, no matter how often that happens. IMO it's
> better to consider that scenario as well.
>

> Ex: A user wants to collect the syslog for a certain period of time and then,
> after observing the logs, decides to disable this log stream. There can
> be many other use cases where log configurations change.
>

I would say that for the MVP, if you change the config, you restart the agent.

>
> What do you mean by using a point-to-point connection? Do you mean using
> Thrift to distribute configs?
>
Thrift or a REST call.

>
>>    1. Our current analytics model is splitting at the client. I think we
>>    should start with that. The agent first has to send a few hundred raw
>>    lines, which are shown to the user and used to configure things. Then the
>>    actual events are split at the agent.
>>
> Yes
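A sketch of the "sample first, then split at the agent" flow, assuming the user turns the sampled lines into a regular expression with named groups. The Apache access-log pattern, the default sample size of 200 and the function names are illustrative assumptions, not part of the proposal.

```python
import itertools
import re


def sample_raw_lines(lines, n=200):
    """Return the first n raw lines, to be shown to the user while configuring."""
    return list(itertools.islice(lines, n))


def split_event(line, pattern):
    """Split one raw log line into named fields using the user-approved regex.

    Returns None when the line does not match, so the caller can decide
    whether to publish it raw or drop it.
    """
    match = pattern.match(line)
    return match.groupdict() if match else None


# Hypothetical pattern covering the start of an Apache httpd access-log line.
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3})'
)
```

Usage: `split_event('127.0.0.1 - - [10/Nov/2015:21:56:36 -0800] "GET /index.html HTTP/1.1" 200 512', ACCESS_LOG)` yields the `client`, `timestamp`, `request` and `status` fields as a dict.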
>
>>
>>    1. If Logstash configuration files are well designed, can we use the
>>    same formats?
>>
> Yes, this has already been discussed in the architecture mail "Component
> level description of the log analyzer tool".
>
> Thanks
>> Srinath
>>
>> p.s. The above are opinions only; please shout if you disagree.
>>
>>
>>
>>
>> On Fri, Nov 6, 2015 at 6:33 PM, Malith Dhanushka <[email protected]> wrote:
>> >
>> > Yes, I agree with the complications of applying agent configs in large
>> clusters. But centralized config management using a message broker is a
>> critical decision, as it weighs on maintenance effort. That decision
>> depends on how big the cluster is and how frequently the log configs
>> change.
>> >
>> > On Fri, Nov 6, 2015 at 3:22 PM, Inosh Goonewardena <[email protected]>
>> wrote:
>> >>
>> >> Hi Anuruddha,
>> >>
>> >>
>> >> On Fri, Nov 6, 2015 at 3:06 PM, Anuruddha Premalal <[email protected]>
>> wrote:
>> >>>
>> >>> Hi Inosh,
>> >>>
>> >>> Can you be specific about the added complexities of managed
>> configuration mode? I have explained in the sequence diagram how it will
>> function. Managed configuration mode is actually a user choice; if the
>> deployment is quite simple, the user can use default agent-side configurations
>> (as in Logstash).
>> >>
>> >>
>> >> As Malith pointed out, my idea was to avoid configuring the log
>> agent remotely and publishing the config. But yes, in a larger cluster,
>> configuring each agent won't be practical, and managed config mode is
>> the better approach. If users have the choice, they can select depending
>> on their preference.
>> >>
>> >>>
>> >>>
>> >>> Managed config mode addresses a major gap in agent
>> config mode: if a user needs to change/update configs for a
>> large cluster, configuring each agent individually won't be practical.
>> >>>
>> >>> Regarding the overhead of splitting an event at the agent
>> side versus the master side: since a single log event usually has a small
>> number of characters, filtering it won't cost much. On the master side,
>> there won't be only a single log stream, so filtering there obviously adds
>> more overhead to the master. Because of this, we should never do filtering
>> on the master side.
>> >>>
>> >>> We are writing the agent in Python, which doesn't consume as many
>> resources as a JVM, and that will definitely be an advantage for smooth operation.
>> >>>
>> >>>
>> >>> On Fri, Nov 6, 2015 at 2:43 PM, Inosh Goonewardena <[email protected]>
>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> On Fri, Nov 6, 2015 at 1:48 PM, Sachith Withana <[email protected]>
>> wrote:
>> >>>>>
>> >>>>> Hi Malith,
>> >>>>>
>> >>>>> In terms of the 1st option,
>> >>>>> - the overhead of publishing the whole log line might be an issue;
>> you are essentially publishing the whole log file split into lines
>> >>>>> - the overhead on the analyzer would be high too, since it has to
>> do an extra splitting step.
>> >>>>
>> >>>>
>> >>>> I agree with the above points too, but if we go with option 2 we
>> will have to carefully analyze the overhead it adds to the solution.
>> >>>>
>> >>>> First of all, managed config mode requires centralized configuration
>> management, where the Log Analysis Server has to manage and process the configs
>> of all log publishing agents and push these configs back to the agents.
>> Pushing configurations to agents also has to happen whenever the
>> configurations are updated. IMO, this process will add some
>> complexity to the overall solution.
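To make the push step concrete, here is a minimal in-memory sketch of master-side config management. The `push` callable stands in for whatever transport is chosen (Thrift, REST or a broker) and, like the class itself, is hypothetical. It also follows the proposal's rule that an unknown log-group starts with an empty configuration, so nothing is pushed for it until the master configures it.

```python
from collections import defaultdict


class ConfigManager:
    """Hypothetical master-side store that pushes log-group configs to agents.

    `push(agent_id, group, config)` is an assumed transport callable.
    """

    def __init__(self, push):
        self._push = push
        self._configs = {}                # group -> config; {} = not configured yet
        self._members = defaultdict(set)  # group -> subscribed agent ids

    def subscribe(self, agent_id, group):
        # Unknown groups are created with an empty config (nothing to push yet).
        self._configs.setdefault(group, {})
        self._members[group].add(agent_id)
        if self._configs[group]:
            self._push(agent_id, group, self._configs[group])

    def update(self, group, config):
        # Every config change fans out to all agents in the group.
        self._configs[group] = config
        for agent_id in sorted(self._members[group]):
            self._push(agent_id, group, config)
```

Each update costs one push per subscribed agent, which is where the management cost grows with cluster size.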
>> >>>>
>> >>>> On the other hand, we have to analyze the overhead of adding the extra
>> splitting step to the log agents as well. Since these log agents run on
>> the servers where production systems are running, they should be
>> able to function smoothly with a minimal amount of resources.
>> >>>>
>> >>>>>
>> >>>>> On Fri, Nov 6, 2015 at 12:30 PM, Malith Dhanushka <[email protected]>
>> wrote:
>> >>>>>>
>> >>>>>> Hi Anuruddha,
>> >>>>>>
>> >>>>>> Here the log agent creates the channel between the log source and the log
>> analyzer. When it comes to publishing logs, we have two options:
>> >>>>>>
>> >>>>>> 1. The log agent publishes the raw log line (log event) without splitting,
>> and then the log analyzer splits the message and indexes it
>> >>>>>> - Here we don't need to keep centralized configurations, as the agent
>> simply publishes the raw log line
>> >>>>>>
>> >>>>>> 2. The log analyzer configures the log agent; the agent splits the
>> raw log line and publishes it to the log analyzer, which then does the indexing
>> >>>>>> - Here we have to keep centralized configurations, but there is less
>> processing on the log analyzer side since it doesn't split raw log lines
>> >>>>>>
>> >>>>>> I believe option 1 is simpler and cleaner than option 2.
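The two options differ mainly in where the split happens. A sketch of the payload each option would publish, assuming a plain dict event and, for option 2, a regex pushed from the analyzer. The field names, the pattern and the raw-fallback behaviour for non-matching lines are illustrative assumptions.

```python
import re


def raw_payload(agent_id, line):
    """Option 1: publish the line as-is; the analyzer splits and indexes it."""
    return {"agentid": agent_id, "message": line}


def split_payload(agent_id, line, pattern):
    """Option 2: split at the agent using a centrally managed pattern.

    Lines that do not match fall back to the raw form instead of being dropped
    (an assumed policy, not one stated in the thread).
    """
    match = pattern.match(line)
    if match is None:
        return raw_payload(agent_id, line)
    return {"agentid": agent_id, **match.groupdict()}


# Hypothetical pattern for lines such as "2015-11-06 12:30:00 ERROR Connection refused".
SIMPLE_LOG = re.compile(r"(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)")
```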
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Malith
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Nov 5, 2015 at 5:54 PM, Anuruddha Premalal <
>> [email protected]> wrote:
>> >>>>>>>
>> >>>>>>> Hi All,
>> >>>>>>>
>> >>>>>>> Below are the suggested ways of distributing configurations
>> among log publishing agents. I'd appreciate your feedback on this.
>> >>>>>>>
>> >>>>>>> There are two modes in which a log agent can be configured. The user
>> has to define the mode beforehand; the default is "client based config
>> mode", and an agent can only be in a single mode at a given time.
>> >>>>>>>
>> >>>>>>> 1.) client config mode - users configure the log streams
>> from the client side (the classic Logstash way).
>> >>>>>>>
>> >>>>>>>    user can define the configurations in agent.conf [1]
>> >>>>>>>
>> >>>>>>> 2.) managed config mode - the user doesn't have to configure
>> stream-specific configurations on the agent side; instead, the user defines the
>> log-groups the agent should be configured with. managedagent.conf [2]
>> >>>>>>>
>> >>>>>>>    This mode is useful for a large cluster of nodes, as the user can
>> perform all the configurations at a central location.
>> >>>>>>>
>> >>>>>>> The following sequence diagram shows how managed config mode
>> behaves.
>> >>>>>>>
>> >>>>>>> Following are a few possible use cases, explained in Q&A form.
>> >>>>>>>
>> >>>>>>> * What will happen if the user chooses to switch the configuration
>> mode?
>> >>>>>>>   - This will make the previous configurations obsolete; the agent
>> always honors the latest config mode.
>> >>>>>>>
>> >>>>>>> * How can we distinguish agents?
>> >>>>>>> Based on the agentID defined by the user. Users can make use of the
>> instance private/public IP to generate unique names; the IP will be picked up at
>> runtime and substituted into the ID accordingly (agentid : "esb-${privateip}").
>> The final agentID will have the following format:
>> >>>>>>> agentid : "<userdefinedID>-<mastergeneratedID>"; the master-generated
>> ID ensures the uniqueness of the agentID.
>> >>>>>>>
>> >>>>>>> * What will happen if the defined agent group is not already
>> configured?
>> >>>>>>>  - A new log-group will be created on the master side with an empty
>> configuration. No logs will be published, since there is no configuration.
>> >>>>>>>
>> >>>>>>> * Is it possible to add/delete log-groups for an agent from the
>> master side?
>> >>>>>>>  - Yes. Once an agent is registered with the master, all stream-specific
>> configurations can only be done on the master side.
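The agentID rule above can be sketched with Python's `string.Template`, whose `${name}` syntax happens to match the proposed `esb-${privateip}` form. The proposal does not say how the IP is discovered or what the master-generated suffix looks like; here the IP is passed in, and a short random hex string (a hypothetical stand-in) is used when the master's ID is not supplied.

```python
import string
import uuid


def resolve_agent_id(template, variables, master_id=None):
    """Expand placeholders such as ${privateip}, then append the master suffix.

    `master_id` would come from the master on registration; generating a short
    uuid4 hex string locally is only a stand-in assumption.
    """
    # substitute() raises KeyError for an undefined placeholder.
    user_id = string.Template(template).substitute(variables)
    if master_id is None:
        master_id = uuid.uuid4().hex[:8]
    return f"{user_id}-{master_id}"
```

Raising `KeyError` on a missing variable seems preferable to silently registering an agent whose ID contains a literal `${privateip}`.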
>> >>>>>>>
>> >>>>>>> managedagent.conf is read only once in the agent's
>> life-cycle; once the agent establishes a proper connection with the master, all
>> configuration is handled from there. If the user changes
>> managedagent.conf and restarts the agent, it won't affect the way
>> the agent is already configured.
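One way an agent could honor "read managedagent.conf only once", assuming it persists its registration state to a local file after first contact with the master. The state file, its contents and both function names are hypothetical. On later restarts the saved state wins, so edits to managedagent.conf have no effect.

```python
import json
import os


def load_managed_bootstrap(conf_path, state_path):
    """Return (config, first_start).

    On first start, read managedagent.conf. Once a registration state file
    exists (hypothetical), use it instead and ignore managedagent.conf.
    """
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as f:
            return json.load(f), False
    with open(conf_path, encoding="utf-8") as f:
        return json.load(f), True


def save_registration_state(state_path, state):
    """Persist what the master assigned (e.g. the final agentID and groups)."""
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
```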
>> >>>>>>>
>> >>>>>>> Feel free to raise any other use-cases which I have missed here.
>> >>>>>>>
>> >>>>>>> [1] agent.conf
>> >>>>>>> {
>> >>>>>>>     "agentid": "awsinstance-23",
>> >>>>>>>     "authid": "sDe334#q2",
>> >>>>>>>     "authsecret": "defr34w3qq#@Qd",
>> >>>>>>>     "groups": [
>> >>>>>>>         {
>> >>>>>>>             "name": "httpd",
>> >>>>>>>             "config": {
>> >>>>>>>                 "input": {
>> >>>>>>>                     "file": {
>> >>>>>>>                         "path": "/tmp/access_log",
>> >>>>>>>                         "start_position": "beginning"
>> >>>>>>>                     }
>> >>>>>>>                 },
>> >>>>>>>                 "filter": {
>> >>>>>>>                     "date": {
>> >>>>>>>                         "match": [
>> >>>>>>>                             "timestamp",
>> >>>>>>>                             "dd/MMM/yyyy:HH:mm:ss Z"
>> >>>>>>>                         ]
>> >>>>>>>                     }
>> >>>>>>>                 },
>> >>>>>>>                 "output": {
>> >>>>>>>                     "loganalyzer": {
>> >>>>>>>                         "binhosts": "192.168.12.2",
>> >>>>>>>                         "bindport": 9200
>> >>>>>>>                     }
>> >>>>>>>                 }
>> >>>>>>>             }
>> >>>>>>>         }
>> >>>>>>>     ]
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> [2] managedagent.conf
>> >>>>>>>
>> >>>>>>> {
>> >>>>>>>   "agentid": "awsinstance-23",
>> >>>>>>>   "authid"    : "sDe334#q2",
>> >>>>>>>   "authsecret": "defr34w3qq#@Qd",
>> >>>>>>>   "groups": ["httpd", "esb" ]
>> >>>>>>> }
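For client config mode, a sketch of how a Python agent might read agent.conf [1] and turn its Logstash-style date pattern into a `strptime` format. The token table is a naive, ordered replacement that covers only the tokens used in the example (it is not a full Joda-Time translator), and the keys are read exactly as written in the example, including `binhosts`.

```python
import json
from datetime import datetime

# Joda-style tokens from the example config mapped to strptime directives.
# Applied in order; only the tokens appearing in agent.conf [1] are handled.
JODA_TO_STRPTIME = [("yyyy", "%Y"), ("MMM", "%b"), ("dd", "%d"), ("HH", "%H"),
                    ("mm", "%M"), ("ss", "%S"), ("Z", "%z")]


def to_strptime(joda_pattern):
    fmt = joda_pattern
    for joda, directive in JODA_TO_STRPTIME:
        fmt = fmt.replace(joda, directive)
    return fmt


def summarize_groups(conf_text):
    """Extract each group's input path, timestamp handling and output target."""
    conf = json.loads(conf_text)
    groups = []
    for group in conf["groups"]:
        cfg = group["config"]
        field, pattern = cfg["filter"]["date"]["match"]
        out = cfg["output"]["loganalyzer"]
        groups.append({
            "name": group["name"],
            "path": cfg["input"]["file"]["path"],
            "time_field": field,
            "time_format": to_strptime(pattern),
            "target": (out["binhosts"], out["bindport"]),
        })
    return groups


def parse_timestamp(value, strptime_format):
    return datetime.strptime(value, strptime_format)
```

For the example pattern `dd/MMM/yyyy:HH:mm:ss Z` this produces `%d/%b/%Y:%H:%M:%S %z`. `%b` is locale-dependent, so an agent running under a non-English locale would need to handle month names explicitly.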
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> --
>> >>>>>>> Anuruddha Premalal
>> >>>>>>> Software Eng. | WSO2 Inc.
>> >>>>>>> Mobile : +94717213122
>> >>>>>>> Web site : www.anuruddha.org
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Malith Dhanushka
>> >>>>>> Senior Software Engineer - Data Technologies
>> >>>>>> WSO2, Inc. : wso2.com
>> >>>>>> Mobile          : +94 716 506 693
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> Architecture mailing list
>> >>>>>> [email protected]
>> >>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Sachith Withana
>> >>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>> >>>>> E-mail: sachith AT wso2.com
>> >>>>> M: +94715518127
>> >>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Thanks & Regards,
>> >>>>
>> >>>> Inosh Goonewardena
>> >>>> Associate Technical Lead- WSO2 Inc.
>> >>>> Mobile: +94779966317
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Anuruddha Premalal
>> >>> Software Eng. | WSO2 Inc.
>> >>> Mobile : +94717213122
>> >>> Web site : www.anuruddha.org
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks & Regards,
>> >>
>> >> Inosh Goonewardena
>> >> Associate Technical Lead- WSO2 Inc.
>> >> Mobile: +94779966317
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Malith Dhanushka
>> > Senior Software Engineer - Data Technologies
>> > WSO2, Inc. : wso2.com
>> > Mobile          : +94 716 506 693
>> >
>> >
>>
>>
>>
>> --
>> ============================
>> Srinath Perera, Ph.D.
>>    http://people.apache.org/~hemapani/
>>    http://srinathsview.blogspot.com/
>>
>>
>>
> [1]
> https://github.com/apache/stratos/tree/master/components/org.apache.stratos.python.cartridge.agent/src/main/python/cartridge.agent/cartridge.agent/modules/databridge
> [2]
> http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/CWL_GettingStarted.html
> [3]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.x+Python+Cartridge+Agent+Guide
> --
> *Anuruddha Premalal*
> Software Eng. | WSO2 Inc.
> Mobile : +94717213122
> Web site : www.anuruddha.org
>
>


-- 
============================
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902

