On Wed, Dec 2, 2015 at 4:47 PM, Sinthuja Ragendran <[email protected]> wrote:
> Hi Malith,
>
> On Wed, Dec 2, 2015 at 4:41 PM, Malith Dhanushka <[email protected]> wrote:
>
>> Hi Folks,
>>
>> We had an offline chat about this.
>>
>> Since indexing all the arbitrary fields is not feasible with the current
>> architecture, the requirement of indexing arbitrary fields in the log
>> analyzer will be handled in the Log Analyzer REST API. The idea is to
>> compare the incoming event with the existing schema, which is kept
>> in-memory, and if there is a change, to update the table schema.
>
> In this case, are all the fields going to be indexed? Is there any way
> with this solution to say I need specific fields (say x, y, z) to be
> indexed in the log event, and not all the fields?

No. In this approach the client won't send the table schema beforehand. Upon
a change in an event, the REST API will dynamically update the schema. Since
this is a log-analyzer-specific scenario, all the events need to be indexed.

Thanks

> Thanks,
> Sinthuja.
>
>> Overriding the table schema will make the event sink configuration
>> inconsistent with the table schema. To avoid that, the event sink feature
>> needs to be improved to support merging table schemas. For that, the
>> event persistence feature should have a flag to enable/disable merging
>> table schemas.
>>
>> Thanks,
>>
>> On Wed, Dec 2, 2015 at 1:30 PM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Dec 2, 2015 at 11:05 AM, Anjana Fernando <[email protected]> wrote:
>>>
>>>> On Wed, Dec 2, 2015 at 10:17 AM, Sachith Withana <[email protected]> wrote:
>>>>
>>>>> Now that we are using logstash out of the box, without the
>>>>> DASConnector, it won't do that.
>>>>>
>>>>> The logstash would just start publishing, and with the current design,
>>>>> AFAIK the schema setting would be handled by the LAS server.
>>>>
>>>> Oh yeah, I see ..
>>>>
>>>>> BTW, for that requirement, can we provide a way to allow indexing all
>>>>> the columns?
>>>>
>>>> Well .. we can .. I guess this is the same thing Malith requested in
>>>> the first mail. The only thing is, we would have to change the
>>>> internals/architecture of how we do indexing currently. The current
>>>> logic is: we check the input value against the table schema and do the
>>>> required indexing accordingly, for example, whether facets are defined,
>>>> the data types, etc. So if we are just saying "index all fields", that
>>>> would be a new path there, and we would also have to introduce a new
>>>> special flag for a table to say "index all". Also, we would need some
>>>> mechanism for figuring out the fields of a specific log type in the
>>>> server; at least with the table schema, we knew all the fields that
>>>> exist for all the log types. Ideally, we need to store some metadata
>>>> somewhere saying, for this specific log type, these are the fields, and
>>>> so on. Do we get some kind of log category/type information with the
>>>> standard logstash HTTP connector? Any other schema setting and storing
>>>> of metadata can be done on the server side, and we can cache it
>>>> in-memory to do fast lookups and modifications of the schema (together
>>>> with some cluster messaging to keep it in sync with the other nodes).
>>>>
>>>> Or else, maybe we are again back to writing our own logstash adapter,
>>>> which would make the whole thing much simpler?
>>>
>>> Yeah, +1. Actually, I was also thinking that having our own logstash
>>> adapter would be a better and cleaner way, without complicating things
>>> much. :) Simply put, if we are able to specify which fields need to be
>>> indexed on the client side, and then make a call to the LAS REST service
>>> before publishing data, then we can set the schema accordingly and
>>> things will work without any big effort.
>>>
>>> Thanks,
>>> Sinthuja.
>>>
>>>> Cheers,
>>>> Anjana.
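The dynamic schema-update flow discussed above (keep the current schema in memory, compare each incoming event's arbitrary fields against it, and update the table schema only when something new appears) can be sketched roughly as follows. This is an illustrative sketch only; `SchemaCache` and the injected callback are hypothetical names, not the actual DAS/LAS API.

```python
# Hypothetical sketch of the schema-update logic discussed above: the log
# REST API keeps the current table schema in memory, compares each incoming
# event's arbitrary fields against it, and only invokes the (expensive)
# schema-update path when a previously unseen field arrives.

class SchemaCache:
    def __init__(self, set_table_schema):
        # set_table_schema: injected callback that persists the merged
        # schema (in DAS this would go through the analytics data service;
        # here it is just a function, to keep the sketch self-contained).
        self._known_fields = {}          # field name -> inferred type name
        self._set_table_schema = set_table_schema

    def on_event(self, event_fields):
        """Merge any previously unseen fields into the cached schema.

        Returns the dict of newly discovered fields (empty if none).
        """
        new_fields = {
            name: type(value).__name__
            for name, value in event_fields.items()
            if name not in self._known_fields
        }
        if new_fields:  # schema changed: update once, not on every event
            self._known_fields.update(new_fields)
            self._set_table_schema(dict(self._known_fields))
        return new_fields
```

With this shape, the first event from a new log source triggers one schema update, and subsequent events with the same fields pass through without touching the schema at all, which is the property the thread is after.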
>>>>> On Wed, Dec 2, 2015 at 10:11 AM, Anjana Fernando <[email protected]> wrote:
>>>>>
>>>>>> Hi Sachith,
>>>>>>
>>>>>> Doesn't the agent have knowledge of the log types/categories and
>>>>>> their field information when it is initializing? As I understood it,
>>>>>> we specify which fields need to be sent out in the configurations;
>>>>>> isn't that the case?
>>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>>
>>>>>> On Wed, Dec 2, 2015 at 10:01 AM, Sachith Withana <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> There might be a slight issue. We wouldn't know the arbitrary fields
>>>>>>> before the log agent starts publishing, since the agent only
>>>>>>> publishes, and we don't have control over which fields would be sent
>>>>>>> (unless we configure all the agents ourselves). So we would have to
>>>>>>> check, for each event, whether there are new fields apart from those
>>>>>>> already in the schema. This is undesirable.
>>>>>>>
>>>>>>> And, as Anjana pointed out, we don't have a way to specify indexing
>>>>>>> of all the arbitrary values unless we set the schema accordingly.
>>>>>>>
>>>>>>> Is it possible to specify in the schema to index everything?
>>>>>>>
>>>>>>> On Wed, Dec 2, 2015 at 9:38 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Malith,
>>>>>>>>
>>>>>>>> The functionality you're requesting is very specific, and from the
>>>>>>>> DAS side it doesn't make sense to implement it in a generic way
>>>>>>>> that is not usually used. And it is anyway not the way the log
>>>>>>>> analyzer should use it. The different log sources will know their
>>>>>>>> fields before they send out data; it doesn't have to be checked
>>>>>>>> every time an event is published. A log source would instruct the
>>>>>>>> log analyzer backend API about the new fields this specific log
>>>>>>>> source will be sending; with that earlier message, the backend
>>>>>>>> service will set the global table's schema properly, and then the
>>>>>>>> remote log agent will send out log records to be processed by the
>>>>>>>> server.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Anjana.
>>>>>>>>
>>>>>>>> On Tue, Dec 1, 2015 at 6:44 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Anjana,
>>>>>>>>>
>>>>>>>>> Yes. The requirement is for the internal log-related REST API,
>>>>>>>>> which is being written using OSGi services. From the perspective
>>>>>>>>> of log analysis data, we have one master table to persist all the
>>>>>>>>> log events from different log sources. Log data comes in to the
>>>>>>>>> log REST API as arbitrary fields, and different log sources have
>>>>>>>>> different sets of arbitrary fields, which forces the log REST API
>>>>>>>>> to change the schema of the master table every time it receives
>>>>>>>>> log events from a new or updated log source. That's what I meant
>>>>>>>>> by inaccurate, and it can be solved in a much cleaner way by
>>>>>>>>> having that flag to index or not index arbitrary fields for a
>>>>>>>>> particular stream.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Malith
>>>>>>>>>
>>>>>>>>> On Tue, Dec 1, 2015 at 6:06 PM, Anjana Fernando <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Malith,
>>>>>>>>>>
>>>>>>>>>> No, it cannot be done like that. The way indexing happens is: it
>>>>>>>>>> looks up the table schema for a table and does the indexing
>>>>>>>>>> according to that. So the table schema must be set beforehand.
>>>>>>>>>> It is not a dynamic thing that can be set when arbitrary fields
>>>>>>>>>> are sent to the receiver, and it cannot load the current schema
>>>>>>>>>> and set it again for each event; even though we could cache that
>>>>>>>>>> information and do some operations, that gets complicated. So the
>>>>>>>>>> idea is, it is the responsibility of the client to set the target
>>>>>>>>>> table's schema properly beforehand, which may or may not include
>>>>>>>>>> arbitrary fields, and then send the data.
>>>>>>>>>>
>>>>>>>>>> Also, if this requirement is for the log analytics solution work,
>>>>>>>>>> as we've discussed before, there should be a whole new remote API
>>>>>>>>>> for that, and that API can do these operations inside the server
>>>>>>>>>> using the OSGi services, not the original DAS REST API. So those
>>>>>>>>>> operations will happen automatically while keeping the remote
>>>>>>>>>> log-related API clean.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Anjana.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 1, 2015 at 5:13 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> Currently, indexing arbitrary fields is achieved by dynamically
>>>>>>>>>>> updating the analytics table schema through the analytics REST
>>>>>>>>>>> API. This is not an accurate solution for a frequently updating
>>>>>>>>>>> schema. So the ideal solution would be to have a flag in the
>>>>>>>>>>> data bridge event sink configuration to enable/disable indexing
>>>>>>>>>>> for all arbitrary fields.
>>>>>>>>>>>
>>>>>>>>>>> WDUT?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Malith
>>>>>>>>>>> --
>>>>>>>>>>> Malith Dhanushka
>>>>>>>>>>> Senior Software Engineer - Data Technologies
>>>>>>>>>>> *WSO2, Inc. : wso2.com <http://wso2.com/>*
>>>>>>>>>>> *Mobile* : +94 716 506 693
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Anjana Fernando*
>>>>>>>>>> Senior Technical Lead
>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>> lean . enterprise . middleware
>>>>>>>
>>>>>>> --
>>>>>>> Sachith Withana
>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>> E-mail: sachith AT wso2.com
>>>>>>> M: +94715518127
>>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>
>>> --
>>> *Sinthuja Rajendran*
>>> Associate Technical Lead
>>> WSO2, Inc.: http://wso2.com
>>> Blog: http://sinthu-rajan.blogspot.com/
>>> Mobile: +94774273955

--
Malith Dhanushka
Senior Software Engineer - Data Technologies
*WSO2, Inc. : wso2.com <http://wso2.com/>*
*Mobile* : +94 716 506 693
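The two indexing paths discussed in the thread (the current schema-driven behaviour, where only columns the table schema marks as indexed get indexed, versus the proposed "index all" flag that bypasses the schema check) can be contrasted in a minimal sketch. The function name and signature are illustrative only, not DAS internals.

```python
# Rough sketch of the two indexing paths discussed in the thread:
# - default: index only the event fields that the table schema marks
#   as indexed (the current, schema-driven behaviour);
# - index_all=True: the proposed new path, which indexes every incoming
#   field regardless of the schema.

def fields_to_index(event_fields, schema_indexed_columns, index_all=False):
    """Return the subset of an event's fields that should be indexed."""
    if index_all:
        # New path: no schema lookup, everything gets indexed.
        return set(event_fields)
    # Current path: intersect the event's fields with the schema's
    # indexed columns; unknown arbitrary fields are silently skipped.
    return set(event_fields) & set(schema_indexed_columns)
```

The trade-off the thread circles around is visible here: the schema-driven path needs the schema set correctly before data arrives, while the index-all path needs neither a schema nor per-event checks, at the cost of indexing everything.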
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
