Hi Gihan,

On Wed, Dec 2, 2015 at 5:00 PM, Gihan Anuruddha <[email protected]> wrote:
> So if I send 3 consecutive events with different arbitrary fields, does
> this schema update 3 times consecutively?

Yes.

> How often does the server get events that have a different arbitrary map?

In most cases, setSchema will happen only at the first log event of a new
log stream, because all the following events will have the same key set.
However, there is a case in logstash where, if a particular value is empty,
the corresponding key is not sent with the event; because of this we have
to check and update the schema for each and every event.

> Can we expect situations where each event has a different arbitrary map?

We haven't come across such a scenario yet.

> Regards,
> Gihan
>
> On Wed, Dec 2, 2015 at 4:53 PM, Malith Dhanushka <[email protected]> wrote:
>
>> On Wed, Dec 2, 2015 at 4:47 PM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi Malith,
>>>
>>> On Wed, Dec 2, 2015 at 4:41 PM, Malith Dhanushka <[email protected]> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> We had an offline chat about this.
>>>>
>>>> Since indexing all the arbitrary fields is not feasible with the
>>>> current architecture, the requirement of indexing arbitrary fields in
>>>> the log analyzer will be handled in the Log Analyzer REST API. The idea
>>>> is to compare the incoming event with the existing schema, which is
>>>> kept in memory, and if there is a change, to update the table schema.
>>>
>>> In this case, are all the fields going to be indexed? Is there any way
>>> with this solution to say that I need specific fields (say x, y, z) to
>>> be indexed in the log event, and not all the fields?
>>
>> No. In this approach the client won't send the table schema beforehand.
>> Upon a change in an event, the REST API will dynamically update the
>> schema. Since this is a log-analyzer-specific scenario, all the events
>> need to be indexed.
>>
>> Thanks
>>
>>> Thanks,
>>> Sinthuja.
>>>
>>>> Overriding the table schema will make the event sink configuration
>>>> inconsistent with the table schema.
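The compare-and-update idea described above could be sketched roughly as follows. This is a minimal illustration in Java; the SchemaStore interface, the STRING default column type, and all names here are assumptions made for the sketch, not the actual DAS/LAS API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: keep the last-known table schema in memory and call
// setSchema only when an incoming event carries keys that are not yet
// part of the schema. SchemaStore is a hypothetical stand-in for the
// backend call that actually persists the schema.
class SchemaUpdater {

    interface SchemaStore {
        void setSchema(String table, Map<String, String> columns);
    }

    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();
    private final SchemaStore store;

    SchemaUpdater(SchemaStore store) {
        this.store = store;
    }

    // Returns true if the table schema had to be updated for this event.
    boolean onEvent(String table, Map<String, String> arbitraryFields) {
        Map<String, String> schema = cache.computeIfAbsent(table, t -> new LinkedHashMap<>());
        synchronized (schema) {
            boolean changed = false;
            for (String key : arbitraryFields.keySet()) {
                if (!schema.containsKey(key)) {      // a genuinely new arbitrary field
                    schema.put(key, "STRING");       // assume log fields default to STRING
                    changed = true;
                }
            }
            if (changed) {
                store.setSchema(table, new LinkedHashMap<>(schema));
            }
            return changed;
        }
    }
}
```

Caching the last-known key set this way means setSchema fires only when a genuinely new field appears, which matches the "only at the first event of a new stream" behaviour described above.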
>>>> To avoid that, the event sink feature needs to be improved in order to
>>>> support merging table schemas. For that, the event persist feature
>>>> should have a flag to enable/disable merging table schemas.
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Dec 2, 2015 at 1:30 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Wed, Dec 2, 2015 at 11:05 AM, Anjana Fernando <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Dec 2, 2015 at 10:17 AM, Sachith Withana <[email protected]> wrote:
>>>>>>
>>>>>>> Now that we are using logstash out of the box, without the
>>>>>>> DASConnector, it won't do that.
>>>>>>>
>>>>>>> Logstash would just start publishing, and with the current design,
>>>>>>> AFAIK, the schema setting would be handled by the LAS server.
>>>>>>
>>>>>> Oh yeah, I see ..
>>>>>>
>>>>>>> BTW, for that requirement, can we provide a way to allow indexing
>>>>>>> all the columns?
>>>>>>
>>>>>> Well .. we can .. I guess this is the same thing that Malith
>>>>>> requested in the first mail. The only thing is, we have to change the
>>>>>> internals/architecture of how we do indexing currently. The current
>>>>>> logic is: we check the input value against the table schema and do
>>>>>> the required indexing, for example, if facets are defined, data
>>>>>> types, etc. So if we are just saying to index all fields, it will be
>>>>>> a new path there, and we also have to introduce a new special flag
>>>>>> for a table to say "index all". Also, we would need some mechanism
>>>>>> for figuring out the fields of a specific log type in the server,
>>>>>> whereas at least with the table schema, we knew all the fields that
>>>>>> are there for all the log types. Ideally, we need to store some
>>>>>> metadata somewhere saying, for this specific log type, these are the
>>>>>> fields, and so on. Do we get some kind of log category/type
>>>>>> information with the standard logstash HTTP connector? ..
>>>>>> Any other schema setting and storing of metadata can be done on the
>>>>>> server side, and we can cache it in memory to do fast lookups and
>>>>>> modifications of the schema (together with some cluster messaging to
>>>>>> keep it in sync with the other nodes).
>>>>>>
>>>>>> Or else, maybe we are again back to writing our own logstash adapter,
>>>>>> which would make the whole thing much simpler? ..
>>>>>
>>>>> Yeah, +1. Actually, I was also thinking that having our own logstash
>>>>> adapter would be a better and cleaner way, without complicating things
>>>>> much. :) Simply, if we are able to mention on the client side which
>>>>> fields need to be indexed, and then make a call to the LAS REST
>>>>> service before publishing data, then we can set the schema accordingly
>>>>> and things will work without any big effort.
>>>>>
>>>>> Thanks,
>>>>> Sinthuja.
>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>>
>>>>>>> On Wed, Dec 2, 2015 at 10:11 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Sachith,
>>>>>>>>
>>>>>>>> Doesn't the agent have knowledge of the log types/categories and
>>>>>>>> their field information when it is initializing? .. As in, as I
>>>>>>>> understood it, we specify in the configurations which fields need
>>>>>>>> to be sent out; isn't that the case? ..
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Anjana.
>>>>>>>>
>>>>>>>> On Wed, Dec 2, 2015 at 10:01 AM, Sachith Withana <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> There might be a slight issue. We wouldn't know the arbitrary
>>>>>>>>> fields before the log agent starts publishing, since the agent
>>>>>>>>> only publishes and we don't have control over which fields would
>>>>>>>>> be sent (unless we configure all the agents ourselves). So we
>>>>>>>>> would have to check, for each event, whether there are new fields
>>>>>>>>> apart from those already in the schema. This is undesirable.
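The client-side flow suggested above, where the agent declares the fields it will send (and which of them should be indexed) to the LAS REST service before publishing, could look roughly like this. The JSON shape, the one-off schema call, and all names here are illustrative assumptions, not the real LAS REST API:

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch of the agent-side declaration: before publishing, build a payload
// that lists the fields the agent will send and flags which ones should be
// indexed. This payload would then be POSTed once to a schema endpoint.
class SchemaDeclaration {

    // Builds a JSON body listing each field with its index flag.
    static String buildSchemaPayload(String table, List<String> fields,
                                     List<String> indexedFields) {
        StringJoiner cols = new StringJoiner(",", "{", "}");
        for (String f : fields) {
            boolean indexed = indexedFields.contains(f);
            cols.add("\"" + f + "\":{\"type\":\"STRING\",\"index\":" + indexed + "}");
        }
        return "{\"table\":\"" + table + "\",\"columns\":" + cols + "}";
    }
}
```

Declaring the schema once up front is what removes the per-event check discussed in the thread: the server already knows the full field set before the first event arrives.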
>>>>>>>>> And as Anjana pointed out, we don't have a way to specify that
>>>>>>>>> all the arbitrary values should be indexed unless we set the
>>>>>>>>> schema accordingly.
>>>>>>>>>
>>>>>>>>> Is it possible to specify in the schema to index everything?
>>>>>>>>>
>>>>>>>>> On Wed, Dec 2, 2015 at 9:38 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Malith,
>>>>>>>>>>
>>>>>>>>>> The functionality you're requesting is very specific, and from
>>>>>>>>>> the DAS side, it doesn't make sense to implement it in a generic
>>>>>>>>>> way that is not usually used. And it is anyway not the way the
>>>>>>>>>> log analyzer should use it. The different log sources will know
>>>>>>>>>> their fields before they send out data; it doesn't have to be
>>>>>>>>>> checked every time an event is published. A log source would
>>>>>>>>>> instruct the log analyzer backend API about the new fields this
>>>>>>>>>> specific log source will be sending, and with that earlier
>>>>>>>>>> message, the backend service will set the global table's schema
>>>>>>>>>> properly; then the remote log agent will send out log records to
>>>>>>>>>> be processed by the server.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Anjana.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 1, 2015 at 6:44 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Anjana,
>>>>>>>>>>>
>>>>>>>>>>> Yes. The requirement is for the internal log-related REST API,
>>>>>>>>>>> which is being written using OSGi services. From the perspective
>>>>>>>>>>> of log analysis data, we have one master table to persist all
>>>>>>>>>>> the log events from different log sources. The way log data
>>>>>>>>>>> comes in to the log REST API is as arbitrary fields.
>>>>>>>>>>> So different log sources have different sets of arbitrary
>>>>>>>>>>> fields, which leads the log REST API to change the schema of the
>>>>>>>>>>> master table every time it receives log events from a
>>>>>>>>>>> new/updated log source. That's what I meant by inaccurate, and
>>>>>>>>>>> it can be solved in a much cleaner way by having that flag to
>>>>>>>>>>> index or not index arbitrary fields for a particular stream.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Malith
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 1, 2015 at 6:06 PM, Anjana Fernando <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Malith,
>>>>>>>>>>>>
>>>>>>>>>>>> No, it cannot be done like that. The way indexing happens is,
>>>>>>>>>>>> it looks up the schema for a table and does the indexing
>>>>>>>>>>>> according to that. So the table schema must be set beforehand.
>>>>>>>>>>>> It is not a dynamic thing that can be set when arbitrary fields
>>>>>>>>>>>> are sent to the receiver, and it cannot load the current schema
>>>>>>>>>>>> and set it for each event; even though we could cache that
>>>>>>>>>>>> information and do some operations, that gets complicated. So
>>>>>>>>>>>> the idea is, it is the responsibility of the client to set the
>>>>>>>>>>>> target table's schema properly beforehand, which may or may not
>>>>>>>>>>>> include arbitrary fields, and then send the data.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if this requirement is for the log analytics solution
>>>>>>>>>>>> work, as we've discussed before, there should be a whole new
>>>>>>>>>>>> remote API for that, and that API can do these operations
>>>>>>>>>>>> inside the server using the OSGi services, and not the original
>>>>>>>>>>>> DAS REST API. So those operations will happen automatically
>>>>>>>>>>>> while keeping the remote log-related API clean.
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Dec 1, 2015 at 5:13 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, indexing arbitrary fields is achieved by
>>>>>>>>>>>>> dynamically updating the analytics table schema through the
>>>>>>>>>>>>> analytics REST API. This is not an accurate solution for a
>>>>>>>>>>>>> frequently updating schema. So the ideal solution would be to
>>>>>>>>>>>>> have a flag in the data bridge event sink configuration to
>>>>>>>>>>>>> enable/disable indexing for all arbitrary fields.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDUT?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Malith
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Malith Dhanushka
>>>>>>>>>>>>> Senior Software Engineer - Data Technologies
>>>>>>>>>>>>> WSO2, Inc. : wso2.com
>>>>>>>>>>>>> Mobile : +94 716 506 693
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Anjana Fernando
>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>> lean . enterprise . middleware
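The schema-merge flag proposed earlier in the thread (a switch in the event persist feature to merge table schemas instead of overriding them) could be sketched as follows; the types and names are illustrative assumptions, not the actual event sink code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposed event sink behaviour: when the merge flag is off,
// the incoming schema overrides the stored one (the current behaviour);
// when it is on, new columns are added and existing column types are kept.
// Column types are plain strings here purely for illustration.
final class SchemaMerger {

    private SchemaMerger() { }

    static Map<String, String> merge(Map<String, String> existing,
                                     Map<String, String> incoming,
                                     boolean mergeEnabled) {
        if (!mergeEnabled) {
            // override: the incoming schema replaces the stored schema
            return new LinkedHashMap<>(incoming);
        }
        Map<String, String> merged = new LinkedHashMap<>(existing);
        // only add genuinely new columns; keep existing types on conflict
        incoming.forEach(merged::putIfAbsent);
        return merged;
    }
}
```

Keeping the existing type on conflict is one possible policy; the thread leaves open how type clashes between streams should actually be resolved.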
>>>>>>>>> --
>>>>>>>>> Sachith Withana
>>>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>>>> E-mail: sachith AT wso2.com
>>>>>>>>> M: +94715518127
>>>>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>
>>>>> --
>>>>> Sinthuja Rajendran
>>>>> Associate Technical Lead
>>>>> WSO2, Inc.: http://wso2.com
>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>> Mobile: +94774273955
>
> --
> W.G. Gihan Anuruddha
> Senior Software Engineer | WSO2, Inc.
> M: +94772272595

--
Anuruddha Premalal
Software Eng. | WSO2 Inc.
Mobile : +94717213122
Web site : www.anuruddha.org
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
