On Wed, Dec 2, 2015 at 4:47 PM, Sinthuja Ragendran <[email protected]>
wrote:

> Hi Malith,
>
> On Wed, Dec 2, 2015 at 4:41 PM, Malith Dhanushka <[email protected]> wrote:
>
>> Hi Folks,
>>
>> We had an offline chat about this.
>>
>> Since indexing all the arbitrary fields is not feasible with the current
>> architecture, the requirement of indexing arbitrary fields in the log
>> analyzer will be handled in the log analyzer REST API. The idea is to
>> compare the incoming event with the existing schema, which is kept
>> in-memory, and if there is a change, to update the table schema.
>>
>
> In this case, are all the fields going to be indexed? Is there any way
> with this solution to say that I need only specific fields (say x, y, z)
> to be indexed in the log event, and not all the fields?
>

No. In this approach the client won't send the table schema beforehand. Upon
a change in an event, the REST API will dynamically update the schema. Since
this is a log-analyzer-specific scenario, all the fields need to be
indexed.
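The dynamic schema update described above could be sketched roughly as follows. This is a minimal illustration, not the actual DAS/LAS API: the `cached_schema` dictionary, the `update_table_schema` call, and the assumption that unknown fields default to string columns are all hypothetical.

```python
# Compare each incoming event's arbitrary fields against a cached copy of
# the table schema; update the schema only when new fields appear.
cached_schema = {"timestamp": "long", "message": "string"}

def update_table_schema(table, schema):
    # Placeholder for the REST call that persists the changed schema.
    print(f"Updating schema for {table}: {sorted(schema)}")

def handle_event(table, event):
    # Only fields not already in the cached schema trigger an update.
    new_fields = {k: "string" for k in event if k not in cached_schema}
    if new_fields:
        cached_schema.update(new_fields)
        update_table_schema(table, cached_schema)

handle_event("LOG_EVENTS", {"timestamp": 1, "message": "boot", "host": "n1"})
```

The point of the in-memory cache is that the common case (no new fields) costs only a dictionary lookup per field, with the REST update happening only on genuinely new log sources.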

Thanks

>
> Thanks,
> Sinthuja.
>
>>
>> Overriding the table schema will make the event sink configuration
>> inconsistent with the table schema. To avoid that, the event sink feature
>> needs to be improved to support merging table schemas. For that, the event
>> persist feature should have a flag to enable/disable merging table schemas.
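The merge flag suggested above could behave roughly like this sketch. The `merge_schemas` flag name and the column-map layout are assumptions for illustration, not the actual event-sink configuration keys.

```python
# With merge_schemas enabled, the sink unions new columns into the existing
# table schema instead of overriding it, so the event sink configuration
# and the table schema stay consistent.
def set_schema(existing, incoming, merge_schemas=False):
    if not merge_schemas:
        return dict(incoming)   # current behaviour: override the schema
    merged = dict(existing)     # merge: keep existing columns
    merged.update(incoming)     # and add (or refresh) the incoming ones
    return merged

old = {"timestamp": "long", "level": "string"}
new = {"timestamp": "long", "host": "string"}
print(set_schema(old, new, merge_schemas=True))
```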
>>
>> Thanks,
>>
>> On Wed, Dec 2, 2015 at 1:30 PM, Sinthuja Ragendran <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Dec 2, 2015 at 11:05 AM, Anjana Fernando <[email protected]>
>>> wrote:
>>>
>>>> On Wed, Dec 2, 2015 at 10:17 AM, Sachith Withana <[email protected]>
>>>> wrote:
>>>>
>>>>> Now that we are using logstash out of the box, without the
>>>>> DASConnector, it won't do that.
>>>>>
>>>>> Logstash would just start publishing, and with the current design,
>>>>> AFAIK, the schema setting would be handled by the LAS server.
>>>>>
>>>>
>>>> Oh yeah, I see ..
>>>>
>>>>
>>>>>
>>>>> BTW for that requirement, can we provide a way to allow indexing all
>>>>> the columns?
>>>>>
>>>>
>>>> Well .. we can .. I guess this is the same thing that Malith requested
>>>> in the first mail. The only thing is, we would have to change the
>>>> internals/architecture of how we do indexing currently. The current
>>>> logic is that we check the input value against the table schema and do
>>>> the required indexing, for example, if facets are defined, data types,
>>>> etc. So if we are just saying "index all fields", that will be a new
>>>> path there, and we would also have to introduce a new special flag for a
>>>> table to say "index all". We would also need some mechanism for figuring
>>>> out the fields of a specific log type in the server; at least with the
>>>> table schema, we knew all the fields that are there for all the log
>>>> types. Ideally, we need to store some metadata somewhere saying, for
>>>> this specific log type, these are the fields, and so on. Do we get some
>>>> kind of log category/type information with the standard logstash HTTP
>>>> connector? .. Any other schema setting and storing of metadata can be
>>>> done on the server side, and we can cache it in-memory to do fast
>>>> lookups and modifications of the schema (together with some cluster
>>>> messaging to keep it in sync with other nodes).
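The two indexing paths described above (the current schema-driven check versus a hypothetical "index all" flag) could look roughly like this. The `index_all` flag and the per-column `indexed` metadata layout are assumptions for illustration, not the actual DAS internals.

```python
# Decide which columns of an event get indexed: either everything
# (the proposed new path), or only the columns the table schema marks
# as indexed (the current path).
def columns_to_index(event, schema, index_all=False):
    if index_all:
        return sorted(event)  # new path: index every field
    return sorted(k for k in event
                  if schema.get(k, {}).get("indexed"))  # schema-driven path

schema = {"level": {"type": "string", "indexed": True},
          "message": {"type": "string", "indexed": False}}
event = {"level": "ERROR", "message": "disk full", "host": "node-1"}
print(columns_to_index(event, schema))                   # → ['level']
print(columns_to_index(event, schema, index_all=True))
```

Note that in the `index_all` path the schema is never consulted, which is exactly why the server would then need separate metadata to know which fields exist per log type.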
>>>>
>>>> Or else, maybe we are back to writing our own logstash adapter, which
>>>> would make the whole thing much simpler? ..
>>>>
>>>
>>> Yeah, +1. Actually, I was also thinking that having our own logstash
>>> adapter would be a better and cleaner way, without complicating things
>>> much. :) If we can simply specify on the client side which fields need to
>>> be indexed, and then make a call to the LAS REST service before
>>> publishing data, we can set the schema accordingly and things will work
>>> without any big effort.
>>>
>>> Thanks,
>>> Sinthuja.
>>>
>>>
>>>> Cheers,
>>>> Anjana.
>>>>
>>>>
>>>>>
>>>>> On Wed, Dec 2, 2015 at 10:11 AM, Anjana Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Sachith,
>>>>>>
>>>>>> Doesn't the agent have knowledge of the log types/categories and
>>>>>> their field information when it is initializing? .. As I understood
>>>>>> it, we specify which fields need to be sent out in the configurations;
>>>>>> isn't that the case? ..
>>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>>
>>>>>> On Wed, Dec 2, 2015 at 10:01 AM, Sachith Withana <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> There might be a slight issue. We wouldn't know the arbitrary fields
>>>>>>> before the log agent starts publishing, since the agent only
>>>>>>> publishes and we don't have control over which fields would be sent
>>>>>>> (unless we configure all the agents ourselves). So we would have to
>>>>>>> check, for each event, whether there are new fields apart from those
>>>>>>> already in the schema. This is undesirable.
>>>>>>>
>>>>>>> And as Anjana pointed out, we don't have a way to specify indexing
>>>>>>> all the arbitrary values unless we set the schema accordingly.
>>>>>>>
>>>>>>> Is it possible to specify in the schema to index everything?
>>>>>>>
>>>>>>> On Wed, Dec 2, 2015 at 9:38 AM, Anjana Fernando <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Malith,
>>>>>>>>
>>>>>>>> The functionality you're requesting is very specific, and from the
>>>>>>>> DAS side it doesn't make sense to implement it in a generic way that
>>>>>>>> would not usually be used. It is anyway not how the log analyzer
>>>>>>>> should use it. The different log sources will know their fields
>>>>>>>> before they send out data; the fields don't have to be checked every
>>>>>>>> time an event is published. A log source would instruct the log
>>>>>>>> analyzer backend API about the new fields that this specific log
>>>>>>>> source will be sending, and with that earlier message, the backend
>>>>>>>> service will set the global table's schema properly; then the remote
>>>>>>>> log agent will send out log records to be processed by the server.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Anjana.
>>>>>>>>
>>>>>>>> On Tue, Dec 1, 2015 at 6:44 PM, Malith Dhanushka <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Anjana,
>>>>>>>>>
>>>>>>>>> Yes. The requirement is for the internal log-related REST API,
>>>>>>>>> which is being written using OSGi services. From the perspective of
>>>>>>>>> log analysis data, we have one master table to persist all the log
>>>>>>>>> events from different log sources. Log data comes in to the log
>>>>>>>>> REST API as arbitrary fields, so different log sources have
>>>>>>>>> different sets of arbitrary fields, which forces the log REST API
>>>>>>>>> to change the schema of the master table every time it receives
>>>>>>>>> log events from a new/updated log source. That's what I meant by
>>>>>>>>> inaccurate, and it can be solved in a much cleaner way by having
>>>>>>>>> that flag to index or not index arbitrary fields for a particular
>>>>>>>>> stream.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Malith
>>>>>>>>>
>>>>>>>>> On Tue, Dec 1, 2015 at 6:06 PM, Anjana Fernando <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Malith,
>>>>>>>>>>
>>>>>>>>>> No, it cannot be done like that. The way indexing works is that
>>>>>>>>>> it looks up the schema for a table and does the indexing according
>>>>>>>>>> to that, so the table schema must be set beforehand. It is not a
>>>>>>>>>> dynamic thing that can be set when arbitrary fields are sent to
>>>>>>>>>> the receiver, and the receiver cannot load the current schema and
>>>>>>>>>> set it for each event; we could cache that information and do some
>>>>>>>>>> operations on it, but that gets complicated. So the idea is that
>>>>>>>>>> it is the client's responsibility to set the target table's schema
>>>>>>>>>> properly beforehand, which may or may not include arbitrary
>>>>>>>>>> fields, and then send the data.
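The client-side ordering described above (declare the schema first, then publish) could be sketched like this. The `set_table_schema` and `publish_event` helpers are hypothetical stand-ins for the relevant REST calls, not the actual DAS API.

```python
# Record the order of operations a well-behaved client follows: the
# schema declaration must reach the server before any events do.
calls = []

def set_table_schema(table, fields):
    calls.append(("schema", table, tuple(sorted(fields))))

def publish_event(table, event):
    calls.append(("event", table, event))

# 1. Declare the fields this log source will send, up front.
set_table_schema("LOG_EVENTS", ["timestamp", "level", "host"])
# 2. Only then start streaming log records.
publish_event("LOG_EVENTS", {"timestamp": 1, "level": "INFO", "host": "n1"})
```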
>>>>>>>>>>
>>>>>>>>>> Also, if this requirement is for the log analytics solution work,
>>>>>>>>>> as we've discussed before, there should be a whole new remote API
>>>>>>>>>> for that, and that API can do these operations inside the server
>>>>>>>>>> using the OSGi services, not the original DAS REST API. So those
>>>>>>>>>> operations will happen automatically, while keeping the remote
>>>>>>>>>> log-related API clean.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Anjana.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 1, 2015 at 5:13 PM, Malith Dhanushka <[email protected]
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> Currently, indexing arbitrary fields is achieved by dynamically
>>>>>>>>>>> updating the analytics table schema through the analytics REST
>>>>>>>>>>> API. This is not a suitable solution for a frequently updated
>>>>>>>>>>> schema, so the ideal solution would be to have a flag in the
>>>>>>>>>>> data bridge event sink configuration to enable/disable indexing
>>>>>>>>>>> for all arbitrary fields.
>>>>>>>>>>>
>>>>>>>>>>> WDUT?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Malith
>>>>>>>>>>> --
>>>>>>>>>>> Malith Dhanushka
>>>>>>>>>>> Senior Software Engineer - Data Technologies
>>>>>>>>>>> *WSO2, Inc. : wso2.com <http://wso2.com/>*
>>>>>>>>>>> *Mobile*          : +94 716 506 693
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Anjana Fernando*
>>>>>>>>>> Senior Technical Lead
>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Malith Dhanushka
>>>>>>>>> Senior Software Engineer - Data Technologies
>>>>>>>>> *WSO2, Inc. : wso2.com <http://wso2.com/>*
>>>>>>>>> *Mobile*          : +94 716 506 693
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Anjana Fernando*
>>>>>>>> Senior Technical Lead
>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>> lean . enterprise . middleware
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sachith Withana
>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>> E-mail: sachith AT wso2.com
>>>>>>> M: +94715518127
>>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Anjana Fernando*
>>>>>> Senior Technical Lead
>>>>>> WSO2 Inc. | http://wso2.com
>>>>>> lean . enterprise . middleware
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sachith Withana
>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>> E-mail: sachith AT wso2.com
>>>>> M: +94715518127
>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Anjana Fernando*
>>>> Senior Technical Lead
>>>> WSO2 Inc. | http://wso2.com
>>>> lean . enterprise . middleware
>>>>
>>>
>>>
>>>
>>> --
>>> *Sinthuja Rajendran*
>>> Associate Technical Lead
>>> WSO2, Inc.:http://wso2.com
>>>
>>> Blog: http://sinthu-rajan.blogspot.com/
>>> Mobile: +94774273955
>>>
>>>
>>>
>>
>>
>> --
>> Malith Dhanushka
>> Senior Software Engineer - Data Technologies
>> *WSO2, Inc. : wso2.com <http://wso2.com/>*
>> *Mobile*          : +94 716 506 693
>>
>
>
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.:http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
>


-- 
Malith Dhanushka
Senior Software Engineer - Data Technologies
*WSO2, Inc. : wso2.com <http://wso2.com/>*
*Mobile*          : +94 716 506 693
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
