Hi Sinthuja,

IMO the user doesn't need to be aware of the indexing concept at all. A user defining a specific field means they might at some point need to query based on that parameter, and it's our responsibility to cater for that under the hood without the user worrying about indexing. Meaning, we need to index all the fields coming from the stream; if the user doesn't need a field indexed, they can simply get rid of that key. Find the answers to your use case inline.
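To make that concrete, here is a minimal sketch of the idea (field names and the `prepare_event` helper are hypothetical, not an actual DAS or logstash API): if every published field is indexed by default, a publisher that does not want a field indexed simply drops that key before sending the event.

```python
def prepare_event(raw_event, excluded_keys):
    """Return a copy of the event without the keys the user chose to drop."""
    return {k: v for k, v in raw_event.items() if k not in excluded_keys}

event = {
    "ip_address": "10.0.0.1",
    "operation": "GET /portal",
    "client_browser": "Firefox/42.0",
}

# The user decides client_browser should not be sent (and hence not indexed).
trimmed = prepare_event(event, {"client_browser"})
print(trimmed)
```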
On Wed, Dec 2, 2015 at 5:18 PM, Sinthuja Ragendran <[email protected]> wrote:

> On Wed, Dec 2, 2015 at 4:53 PM, Malith Dhanushka <[email protected]> wrote:
>
>> On Wed, Dec 2, 2015 at 4:47 PM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi Malith,
>>>
>>> On Wed, Dec 2, 2015 at 4:41 PM, Malith Dhanushka <[email protected]> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> We had an offline chat about this.
>>>>
>>>> Since indexing all the arbitrary fields is not feasible with the
>>>> current architecture, the requirement of indexing arbitrary fields in the
>>>> log analyzer will be handled in the Log Analyzer REST API. The idea is to
>>>> compare the incoming event with the existing schema, which is kept
>>>> in memory, and if there is a change, to update the table schema.
>>>
>>> In this case, are all the fields going to be indexed? Is there any way
>>> with this solution to say I need specific fields (say x, y, z) to be
>>> indexed in the log event and not all the fields?
>>
>> No. In this way the client won't send the table schema beforehand. Upon a
>> change in an event, the REST API will dynamically update the schema. Since
>> this is a log-analyzer-specific scenario, all the events need to be
>> indexed.
>
> In the log analyzer scenario too, is it always necessary to index all
> fields? The number of indexed fields will have some influence on the
> resource utilization, load, etc. of the indexing operation, and hence IMHO
> users should have a way to index only the fields they are interested in
> using in the log search operation. For example, I have log events with
> (ip address, timestamp, operation, resource, data transferred,
> client-browser) fields, but I'm only going to search using the fields
> ip address, timestamp, operation, resource and data transferred;
> client-browser is not going to be part of my search fields, but I want to
> see that field in my final log search result. Is there any way to achieve
> this?
This can easily be achieved by defining only the fields of interest (ip address)
and sending the whole message as a new field. This way the user can address the
resource utilization concerns (with indexing) as well.

> At least the user should be able to remove an unwanted indexed field from
> the table schema with the management console or the Analytics REST API, but
> I think with this solution, once a log event is received with a particular
> arbitrary field, it's going to add the field again for indexing. Please
> correct me if I'm wrong.

In most of the log analysis solutions available, the user doesn't need to worry
about indexing, or even be aware of such a concept; IMO it's something internal
to the solution that we have to handle. In terms of index growth, we can use
archiving and index expiration mechanisms.

> Thanks,
> Sinthuja.
>
>> Thanks
>
>>> Thanks,
>>> Sinthuja.
>
>>>> Overriding the table schema will make the event sink configuration
>>>> inconsistent with the table schema. To avoid that, the event sink feature
>>>> needs to be improved to support merging table schemas. For that, the
>>>> event persist feature should have a flag to enable/disable merging table
>>>> schemas.
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Dec 2, 2015 at 1:30 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Wed, Dec 2, 2015 at 11:05 AM, Anjana Fernando <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Dec 2, 2015 at 10:17 AM, Sachith Withana <[email protected]> wrote:
>>>>>>
>>>>>>> Now that we are using logstash out of the box, without the
>>>>>>> DASConnector, it won't do that.
>>>>>>>
>>>>>>> The logstash agent would just start publishing, and with the current
>>>>>>> design, AFAIK the schema setting would be handled by the LAS server.
>>>>>>
>>>>>> Oh yeah, I see ..
>>>>>>
>>>>>>> BTW, for that requirement, can we provide a way to allow indexing all
>>>>>>> the columns?
>>>>>> Well .. we can .. I guess this is the same thing that Malith requested
>>>>>> in the first mail. The only thing is, we have to change the
>>>>>> internals/architecture of how we do indexing currently. The current
>>>>>> logic is: we check the input value against the table schema and do the
>>>>>> required indexing (for example, if facets are defined, data types,
>>>>>> etc.). So if we are just saying "index all fields", it will be a new
>>>>>> path there, and we also have to introduce a new special flag for a
>>>>>> table to say "index all". Also, we would need some mechanism for
>>>>>> figuring out the fields of a specific log type in the server, where at
>>>>>> least with the table schema, we knew all the fields that exist for all
>>>>>> the log types. Ideally, we need to store some metadata somewhere
>>>>>> saying, for this specific log type, these are the fields, and so on.
>>>>>> Do we get some kind of log category/type information with the standard
>>>>>> logstash HTTP connector? .. Any other schema setting and storing of
>>>>>> metadata can be done on the server side, and we can cache it in memory
>>>>>> to do fast lookups and modifications of the schema (together with some
>>>>>> cluster messaging to keep it in sync with the other nodes).
>>>>>>
>>>>>> Or else, maybe we are again back to writing our own logstash adapter,
>>>>>> which would make the whole thing much simpler? ..
>>>>>
>>>>> Yeah, +1. Actually, I was also thinking that having our own logstash
>>>>> adaptor would be a better and cleaner way, without complicating things
>>>>> much. :) Simply, if we are able to specify which fields need to be
>>>>> indexed on the client side, and then make a call to the LAS REST service
>>>>> before publishing data, then we can set the schema accordingly and
>>>>> things will work without any big effort.
>>>>>
>>>>> Thanks,
>>>>> Sinthuja.
>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>>> On Wed, Dec 2, 2015 at 10:11 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Sachith,
>>>>>>>>
>>>>>>>> Doesn't the agent have knowledge of the log types/categories and
>>>>>>>> their field information when it is initializing? .. As in, as I
>>>>>>>> understood it, we specify which fields need to be sent out in the
>>>>>>>> configurations; isn't that the case? ..
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Anjana.
>>>>>>>>
>>>>>>>> On Wed, Dec 2, 2015 at 10:01 AM, Sachith Withana <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> There might be a slight issue. We wouldn't know the arbitrary
>>>>>>>>> fields before the log agent starts publishing, since the agent only
>>>>>>>>> publishes and we don't have control over which fields would be sent
>>>>>>>>> (unless we configure all the agents ourselves). So we would have to
>>>>>>>>> check, for each event, whether there are new fields apart from those
>>>>>>>>> already in the schema. This is undesirable.
>>>>>>>>>
>>>>>>>>> And as Anjana pointed out, we don't have a way to specify indexing
>>>>>>>>> all the arbitrary values unless we set the schema accordingly.
>>>>>>>>>
>>>>>>>>> Is it possible to specify in the schema to index everything?
>>>>>>>>>
>>>>>>>>> On Wed, Dec 2, 2015 at 9:38 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Malith,
>>>>>>>>>>
>>>>>>>>>> The functionality which you're requesting is very specific, and
>>>>>>>>>> from the DAS side it doesn't make sense to implement it in a
>>>>>>>>>> generic way that is not usually used. And it is anyway not the way
>>>>>>>>>> the log analyzer should use it. The different log sources will know
>>>>>>>>>> their fields before they send out data; it doesn't have to be
>>>>>>>>>> checked every time an event is published.
>>>>>>>>>> A log source would first instruct the log analyzer backend API
>>>>>>>>>> about the new fields this specific log source will be sending; with
>>>>>>>>>> that message, the backend service will set the global table's
>>>>>>>>>> schema properly, and then the remote log agent will send out log
>>>>>>>>>> records to be processed by the server.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Anjana.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 1, 2015 at 6:44 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Anjana,
>>>>>>>>>>>
>>>>>>>>>>> Yes. The requirement is for the internal log-related REST API,
>>>>>>>>>>> which is being written using OSGi services. From the perspective
>>>>>>>>>>> of log analysis data, we have one master table to persist all the
>>>>>>>>>>> log events from different log sources. The way log data comes in
>>>>>>>>>>> to the log REST API is as arbitrary fields. So different log
>>>>>>>>>>> sources have different sets of arbitrary fields, which leads the
>>>>>>>>>>> log REST API to change the schema of the master table every time
>>>>>>>>>>> it receives log events from a new/updated log source. That's what
>>>>>>>>>>> I meant by inaccurate, which can be solved in a much cleaner way
>>>>>>>>>>> by having a flag to index or not index arbitrary fields for a
>>>>>>>>>>> particular stream.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Malith
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 1, 2015 at 6:06 PM, Anjana Fernando <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Malith,
>>>>>>>>>>>>
>>>>>>>>>>>> No, it cannot be done like that. How the indexing happens is, it
>>>>>>>>>>>> looks up the table schema for a table and does the indexing
>>>>>>>>>>>> according to that. So the table schema must be set beforehand.
>>>>>>>>>>>> It is not a dynamic thing that can be set when arbitrary fields
>>>>>>>>>>>> are sent to the receiver, and it cannot load the current schema
>>>>>>>>>>>> and set it for each event; even though we could cache that
>>>>>>>>>>>> information and do some operations, that gets complicated. So the
>>>>>>>>>>>> idea is, it is the responsibility of the client to set the target
>>>>>>>>>>>> table's schema properly beforehand, which may or may not include
>>>>>>>>>>>> arbitrary fields, and then send the data.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if this requirement is for the log analytics solution work,
>>>>>>>>>>>> as we've discussed before, there should be a whole new remote API
>>>>>>>>>>>> for that, and that API can do these operations inside the server,
>>>>>>>>>>>> using the OSGi services, and not the original DAS REST API. So
>>>>>>>>>>>> those operations will happen automatically while keeping the
>>>>>>>>>>>> remote log-related API clean.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Dec 1, 2015 at 5:13 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, indexing arbitrary fields is achieved by dynamically
>>>>>>>>>>>>> updating the analytics table schema through the Analytics REST
>>>>>>>>>>>>> API. This is not an accurate solution for a frequently updating
>>>>>>>>>>>>> schema. So the ideal solution would be to have a flag in the
>>>>>>>>>>>>> data bridge event sink configuration to enable/disable indexing
>>>>>>>>>>>>> for all arbitrary fields.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDUT?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Malith
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Malith Dhanushka
>>>>>>>>>>>>> Senior Software Engineer - Data Technologies
>>>>>>>>>>>>> WSO2, Inc. : wso2.com
>>>>>>>>>>>>> Mobile : +94 716 506 693
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Anjana Fernando
>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sachith Withana
>>>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>>>> E-mail: sachith AT wso2.com
>>>>>>>>> M: +94715518127
>>>>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>
>>>>> --
>>>>> Sinthuja Rajendran
>>>>> Associate Technical Lead
>>>>> WSO2, Inc.: http://wso2.com
>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>> Mobile: +94774273955

--
Anuruddha Premalal
Software Eng. | WSO2 Inc.
Mobile : +94717213122
Web site : www.anuruddha.org
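The schema-merge approach discussed in this thread (the REST layer keeps the table schema in memory, compares each incoming event's arbitrary fields against it, and updates the schema only when new fields appear) can be sketched roughly as below. The `TableSchema` class and its `merge` method are hypothetical illustrations, not the actual DAS Analytics API.

```python
class TableSchema:
    """In-memory view of the master table's schema; all fields are indexed."""

    def __init__(self):
        self.fields = {}  # field name -> type name

    def merge(self, event):
        """Add any previously unseen fields; return True if the schema changed."""
        changed = False
        for name, value in event.items():
            if name not in self.fields:
                self.fields[name] = type(value).__name__
                changed = True
        return changed

schema = TableSchema()
schema.merge({"ip_address": "10.0.0.1", "timestamp": 1449058718})

# A later event from a different log source introduces a new field:
changed = schema.merge({"ip_address": "10.0.0.2", "resource": "/portal"})
# changed is True because "resource" is new; an identical repeat event would
# return False, so the expensive server-side schema update call would be made
# only when new fields actually appear, not per event.
```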
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
