Hi Gihan,

On Wed, Dec 2, 2015 at 5:00 PM, Gihan Anuruddha <[email protected]> wrote:
> So if I send 3 consecutive events with different arbitrary fields, does
> this schema update 3 times consecutively?

Yes.

> How often does the server get events that have a different arbitrary map?

In most cases, setSchema will happen only at the first log event of a new
log stream, because all the following events will have the same key set.
However, there is a case in logstash where, if a particular value is empty,
the corresponding key is not sent with the event; because of this we have
to check and update the schema for each and every event.

> Can we expect situations where each event has a different arbitrary map?

We haven't come across such a scenario yet.

> Regards,
> Gihan
>
> On Wed, Dec 2, 2015 at 4:53 PM, Malith Dhanushka <[email protected]> wrote:
>
>> On Wed, Dec 2, 2015 at 4:47 PM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi Malith,
>>>
>>> On Wed, Dec 2, 2015 at 4:41 PM, Malith Dhanushka <[email protected]> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> We had an offline chat about this.
>>>>
>>>> Since indexing all the arbitrary fields is not feasible with the
>>>> current architecture, the requirement of indexing arbitrary fields in
>>>> the log analyzer will be handled in the Log Analyzer REST API. The idea
>>>> is to compare the incoming event with the existing schema, which is
>>>> kept in memory, and if there is a change, to update the table schema.
>>>
>>> In this case, are all the fields going to be indexed? Is there any way
>>> with this solution to say that I need specific fields (say x, y, z) to
>>> be indexed in the log event, and not all the fields?
>>
>> No. In this approach the client won't send the table schema beforehand.
>> Upon a change in an event, the REST API will dynamically update the
>> schema. Since this is a log-analyzer-specific scenario, all the events
>> need to be indexed.
>>
>> Thanks
>>
>>> Thanks,
>>> Sinthuja.
>>>
>>>> Overriding the table schema will make the event sink configuration
>>>> inconsistent with the table schema.
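The compare-and-update idea described above could be sketched roughly as follows. This is a minimal illustration in Java; the SchemaStore interface, the STRING default column type, and all names here are assumptions made for the sketch, not the actual DAS/LAS API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: keep the last-known table schema in memory and call
// setSchema only when an incoming event carries keys that are not yet
// part of the schema. SchemaStore is a hypothetical stand-in for the
// backend call that actually persists the schema.
class SchemaUpdater {

    interface SchemaStore {
        void setSchema(String table, Map<String, String> columns);
    }

    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();
    private final SchemaStore store;

    SchemaUpdater(SchemaStore store) {
        this.store = store;
    }

    // Returns true if the table schema had to be updated for this event.
    boolean onEvent(String table, Map<String, String> arbitraryFields) {
        Map<String, String> schema = cache.computeIfAbsent(table, t -> new LinkedHashMap<>());
        synchronized (schema) {
            boolean changed = false;
            for (String key : arbitraryFields.keySet()) {
                if (!schema.containsKey(key)) {      // a genuinely new arbitrary field
                    schema.put(key, "STRING");       // assume log fields default to STRING
                    changed = true;
                }
            }
            if (changed) {
                store.setSchema(table, new LinkedHashMap<>(schema));
            }
            return changed;
        }
    }
}
```

Caching the last-known key set this way means setSchema fires only when a genuinely new field appears, which matches the "only at the first event of a new stream" behaviour described above.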
>>>> To avoid that, the event sink feature needs to be improved in order to
>>>> support merging table schemas. For that, the event persist feature
>>>> should have a flag to enable/disable merging table schemas.
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Dec 2, 2015 at 1:30 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Wed, Dec 2, 2015 at 11:05 AM, Anjana Fernando <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Dec 2, 2015 at 10:17 AM, Sachith Withana <[email protected]> wrote:
>>>>>>
>>>>>>> Now that we are using logstash out of the box, without the
>>>>>>> DASConnector, it won't do that.
>>>>>>>
>>>>>>> Logstash would just start publishing, and with the current design,
>>>>>>> AFAIK, the schema setting would be handled by the LAS server.
>>>>>>
>>>>>> Oh yeah, I see ..
>>>>>>
>>>>>>> BTW, for that requirement, can we provide a way to allow indexing
>>>>>>> all the columns?
>>>>>>
>>>>>> Well .. we can .. I guess this is the same thing that Malith
>>>>>> requested in the first mail. The only thing is, we have to change the
>>>>>> internals/architecture of how we do indexing currently. The current
>>>>>> logic is: we check the input value against the table schema and do
>>>>>> the required indexing, for example, if facets are defined, data
>>>>>> types, etc. So if we are just saying to index all fields, it will be
>>>>>> a new path there, and we also have to introduce a new special flag
>>>>>> for a table to say "index all". Also, we would need some mechanism
>>>>>> for figuring out the fields of a specific log type in the server,
>>>>>> whereas at least with the table schema, we knew all the fields that
>>>>>> are there for all the log types. Ideally, we need to store some
>>>>>> metadata somewhere saying, for this specific log type, these are the
>>>>>> fields, and so on. Do we get some kind of log category/type
>>>>>> information with the standard logstash HTTP connector? ..
>>>>>> Any other schema setting and storing of metadata can be done on the
>>>>>> server side, and we can cache it in memory to do fast lookups and
>>>>>> modifications of the schema (together with some cluster messaging to
>>>>>> keep it in sync with the other nodes).
>>>>>>
>>>>>> Or else, maybe we are again back to writing our own logstash adapter,
>>>>>> which would make the whole thing much simpler? ..
>>>>>
>>>>> Yeah, +1. Actually, I was also thinking that having our own logstash
>>>>> adapter would be a better and cleaner way, without complicating things
>>>>> much. :) Simply, if we are able to mention on the client side which
>>>>> fields need to be indexed, and then make a call to the LAS REST
>>>>> service before publishing data, then we can set the schema accordingly
>>>>> and things will work without any big effort.
>>>>>
>>>>> Thanks,
>>>>> Sinthuja.
>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>>
>>>>>>> On Wed, Dec 2, 2015 at 10:11 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Sachith,
>>>>>>>>
>>>>>>>> Doesn't the agent have knowledge of the log types/categories and
>>>>>>>> their field information when it is initializing? .. As in, as I
>>>>>>>> understood it, we specify in the configurations which fields need
>>>>>>>> to be sent out; isn't that the case? ..
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Anjana.
>>>>>>>>
>>>>>>>> On Wed, Dec 2, 2015 at 10:01 AM, Sachith Withana <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> There might be a slight issue. We wouldn't know the arbitrary
>>>>>>>>> fields before the log agent starts publishing, since the agent
>>>>>>>>> only publishes and we don't have control over which fields would
>>>>>>>>> be sent (unless we configure all the agents ourselves). So we
>>>>>>>>> would have to check, for each event, whether there are new fields
>>>>>>>>> apart from those already in the schema. This is undesirable.
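The client-side flow suggested above, where the agent declares the fields it will send (and which of them should be indexed) to the LAS REST service before publishing, could look roughly like this. The JSON shape, the one-off schema call, and all names here are illustrative assumptions, not the real LAS REST API:

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch of the agent-side declaration: before publishing, build a payload
// that lists the fields the agent will send and flags which ones should be
// indexed. This payload would then be POSTed once to a schema endpoint.
class SchemaDeclaration {

    // Builds a JSON body listing each field with its index flag.
    static String buildSchemaPayload(String table, List<String> fields,
                                     List<String> indexedFields) {
        StringJoiner cols = new StringJoiner(",", "{", "}");
        for (String f : fields) {
            boolean indexed = indexedFields.contains(f);
            cols.add("\"" + f + "\":{\"type\":\"STRING\",\"index\":" + indexed + "}");
        }
        return "{\"table\":\"" + table + "\",\"columns\":" + cols + "}";
    }
}
```

Declaring the schema once up front is what removes the per-event check discussed in the thread: the server already knows the full field set before the first event arrives.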
>>>>>>>>> And as Anjana pointed out, we don't have a way to specify that
>>>>>>>>> all the arbitrary values should be indexed unless we set the
>>>>>>>>> schema accordingly.
>>>>>>>>>
>>>>>>>>> Is it possible to specify in the schema to index everything?
>>>>>>>>>
>>>>>>>>> On Wed, Dec 2, 2015 at 9:38 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Malith,
>>>>>>>>>>
>>>>>>>>>> The functionality you're requesting is very specific, and from
>>>>>>>>>> the DAS side, it doesn't make sense to implement it in a generic
>>>>>>>>>> way that is not usually used. And it is anyway not the way the
>>>>>>>>>> log analyzer should use it. The different log sources will know
>>>>>>>>>> their fields before they send out data; it doesn't have to be
>>>>>>>>>> checked every time an event is published. A log source would
>>>>>>>>>> instruct the log analyzer backend API about the new fields this
>>>>>>>>>> specific log source will be sending, and with that earlier
>>>>>>>>>> message, the backend service will set the global table's schema
>>>>>>>>>> properly; then the remote log agent will send out log records to
>>>>>>>>>> be processed by the server.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Anjana.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 1, 2015 at 6:44 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Anjana,
>>>>>>>>>>>
>>>>>>>>>>> Yes. The requirement is for the internal log-related REST API,
>>>>>>>>>>> which is being written using OSGi services. From the perspective
>>>>>>>>>>> of log analysis data, we have one master table to persist all
>>>>>>>>>>> the log events from different log sources. The way log data
>>>>>>>>>>> comes in to the log REST API is as arbitrary fields.
>>>>>>>>>>> So different log sources have different sets of arbitrary
>>>>>>>>>>> fields, which leads the log REST API to change the schema of the
>>>>>>>>>>> master table every time it receives log events from a
>>>>>>>>>>> new/updated log source. That's what I meant by inaccurate, and
>>>>>>>>>>> it can be solved in a much cleaner way by having that flag to
>>>>>>>>>>> index or not index arbitrary fields for a particular stream.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Malith
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 1, 2015 at 6:06 PM, Anjana Fernando <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Malith,
>>>>>>>>>>>>
>>>>>>>>>>>> No, it cannot be done like that. The way indexing happens is,
>>>>>>>>>>>> it looks up the schema for a table and does the indexing
>>>>>>>>>>>> according to that. So the table schema must be set beforehand.
>>>>>>>>>>>> It is not a dynamic thing that can be set when arbitrary fields
>>>>>>>>>>>> are sent to the receiver, and it cannot load the current schema
>>>>>>>>>>>> and set it for each event; even though we could cache that
>>>>>>>>>>>> information and do some operations, that gets complicated. So
>>>>>>>>>>>> the idea is, it is the responsibility of the client to set the
>>>>>>>>>>>> target table's schema properly beforehand, which may or may not
>>>>>>>>>>>> include arbitrary fields, and then send the data.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if this requirement is for the log analytics solution
>>>>>>>>>>>> work, as we've discussed before, there should be a whole new
>>>>>>>>>>>> remote API for that, and that API can do these operations
>>>>>>>>>>>> inside the server using the OSGi services, and not the original
>>>>>>>>>>>> DAS REST API. So those operations will happen automatically
>>>>>>>>>>>> while keeping the remote log-related API clean.
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Dec 1, 2015 at 5:13 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, indexing arbitrary fields is achieved by
>>>>>>>>>>>>> dynamically updating the analytics table schema through the
>>>>>>>>>>>>> analytics REST API. This is not an accurate solution for a
>>>>>>>>>>>>> frequently updating schema. So the ideal solution would be to
>>>>>>>>>>>>> have a flag in the data bridge event sink configuration to
>>>>>>>>>>>>> enable/disable indexing for all arbitrary fields.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDUT?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Malith
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Malith Dhanushka
>>>>>>>>>>>>> Senior Software Engineer - Data Technologies
>>>>>>>>>>>>> WSO2, Inc. : wso2.com
>>>>>>>>>>>>> Mobile : +94 716 506 693
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Anjana Fernando
>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>> lean . enterprise . middleware
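The schema-merge flag proposed earlier in the thread (a switch in the event persist feature to merge table schemas instead of overriding them) could be sketched as follows; the types and names are illustrative assumptions, not the actual event sink code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposed event sink behaviour: when the merge flag is off,
// the incoming schema overrides the stored one (the current behaviour);
// when it is on, new columns are added and existing column types are kept.
// Column types are plain strings here purely for illustration.
final class SchemaMerger {

    private SchemaMerger() { }

    static Map<String, String> merge(Map<String, String> existing,
                                     Map<String, String> incoming,
                                     boolean mergeEnabled) {
        if (!mergeEnabled) {
            // override: the incoming schema replaces the stored schema
            return new LinkedHashMap<>(incoming);
        }
        Map<String, String> merged = new LinkedHashMap<>(existing);
        // only add genuinely new columns; keep existing types on conflict
        incoming.forEach(merged::putIfAbsent);
        return merged;
    }
}
```

Keeping the existing type on conflict is one possible policy; the thread leaves open how type clashes between streams should actually be resolved.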
>>>>>>>>> --
>>>>>>>>> Sachith Withana
>>>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>>>> E-mail: sachith AT wso2.com
>>>>>>>>> M: +94715518127
>>>>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>
>>>>> --
>>>>> Sinthuja Rajendran
>>>>> Associate Technical Lead
>>>>> WSO2, Inc.: http://wso2.com
>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>> Mobile: +94774273955
>
> --
> W.G. Gihan Anuruddha
> Senior Software Engineer | WSO2, Inc.
> M: +94772272595

--
Anuruddha Premalal
Software Eng. | WSO2 Inc.
Mobile : +94717213122
Web site : www.anuruddha.org
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
