Hi Sinthuja,

> IMHO we could solve this issue by having conventions. Basically we could
> use $payloads:payload1 to reference the elements as a convention. If the
> element starts with '$' then it's a reference, not the actual payload. In
> that case, if there is a new element introduced, let's say foo, and you need
> to access the property property1, then it will have the reference as
> $foo:property1.


Yes, that's possible as well. But again, if the actual value of a property,
say 'foo', starts with the special character (in this case '$'), we may run
into ambiguity. (True, the chances are slim, but it is still possible.)
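
For what it's worth, a common way around that ambiguity is an escaping convention. A minimal sketch (the '$$' escape and the function name are made up for illustration, not part of the proposal):

```python
def resolve(value, payloads):
    """Resolve a '$payloads:name' reference; '$$' escapes a literal leading '$'."""
    if value.startswith("$$"):
        return value[1:]                          # '$$foo' -> literal '$foo'
    if value.startswith("$payloads:"):
        return payloads[value[len("$payloads:"):]]  # dereference
    return value                                  # plain in-line value

payloads = {"payload1": "xml-payload-1"}
print(resolve("$payloads:payload1", payloads))   # -> xml-payload-1
print(resolve("$$payloads:payload1", payloads))  # -> $payloads:payload1 (literal)
```

With a rule like this the publisher only pays the escaping cost for values that happen to start with '$', which should be rare.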


> Also this JSON event format is being sent as event payload in a wso2 event,
> and the wso2 event is being published by the data publisher, right? Correct
> me if I'm wrong.


Yes.

Thanks,
Supun


On Wed, Feb 10, 2016 at 11:35 AM, Sinthuja Ragendran <[email protected]>
wrote:

> Hi Supun,
>
> Also this JSON event format is being sent as event payload in a wso2 event,
> and the wso2 event is being published by the data publisher, right? Correct
> me if I'm wrong.
>
> Thanks,
> Sinthuja.
>
> On Wed, Feb 10, 2016 at 11:26 AM, Sinthuja Ragendran <[email protected]>
> wrote:
>
>> Hi Supun,
>>
>>
>> On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Sinthuja,
>>>
>>> Agree on the possibility of simplifying the JSON. We also discussed the
>>> same matter yesterday, but the complication that came up was: for an event
>>> in the "events" list, the payload could be either referenced or defined
>>> in-line. (It was made this way so that it can be generalized to fields
>>> other than payloads, if needed.)
>>>
>>> In such a case, if we had defined 'payload': '*payload1*', we would not
>>> know if it's the actual payload or a reference to the payload in the
>>> "payloads" section.
>>>
>>> With the suggested format, DAS will only go and map the payload if it's
>>> null.
>>>
>>>
>> IMHO we could solve this issue by having conventions. Basically we could
>> use $payloads:payload1 to reference the elements as a convention. If the
>> element starts with '$' then it's a reference, not the actual payload. In
>> that case, if there is a new element introduced, let's say foo, and you need
>> to access the property property1, then it will have the reference as
>> $foo:property1.
>>
>> Thanks,
>> Sinthuja.
>>
>>
>>
>>> Regards,
>>> Supun
>>>
>>> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <[email protected]>
>>> wrote:
>>>
>>>> Hi Supun,
>>>>
>>>> I think we could simplify the JSON message a bit more. Instead of 'null'
>>>> for the payload attributes in the events section, you could use the actual
>>>> payload name directly if there is a payload for that event. In that case,
>>>> we could eliminate the 'events' section from the 'payloads' section. For
>>>> the given example, it could be altered as below.
>>>>
>>>> {
>>>>   'events': [{
>>>>     'messageId': 'aaa',
>>>>     'componentId': '111',
>>>>     'payload': '*payload1*',
>>>>     'componentName': 'Proxy:TestProxy',
>>>>     'output-payload': null
>>>>   }, {
>>>>     'messageId': 'bbb',
>>>>     'componentId': '222',
>>>>     'componentName': 'Proxy:TestProxy',
>>>>     'payload': '*payload1*',
>>>>     'output-payload': null
>>>>   }, {
>>>>     'messageId': 'ccc',
>>>>     'componentId': '789',
>>>>     'payload': '*payload2*',
>>>>     'componentName': 'Proxy:TestProxy',
>>>>     'output-payload': '*payload2*'
>>>>   }],
>>>>
>>>>   'payloads': {
>>>>     '*payload1*': 'xml-payload-1',
>>>>     '*payload2*': 'xml-payload-2'
>>>>   }
>>>> }
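
To make the contract concrete, a minimal sketch of how a consumer of this simplified format could inline the referenced payloads (illustrative only; the function and variable names are made up, not part of any implementation):

```python
import copy

def inline_payloads(message):
    """Replace payload-name references in each event with the actual payload text."""
    payloads = message["payloads"]
    events = copy.deepcopy(message["events"])  # keep the original message intact
    for event in events:
        for field in ("payload", "output-payload"):
            ref = event.get(field)
            if ref is not None and ref in payloads:
                event[field] = payloads[ref]
    return events

message = {
    "events": [
        {"messageId": "aaa", "componentId": "111",
         "payload": "payload1", "output-payload": None},
        {"messageId": "ccc", "componentId": "789",
         "payload": "payload2", "output-payload": "payload2"},
    ],
    "payloads": {"payload1": "xml-payload-1", "payload2": "xml-payload-2"},
}
events = inline_payloads(message)
print(events[0]["payload"])         # -> xml-payload-1
print(events[1]["output-payload"])  # -> xml-payload-2
```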
>>>>
>>>> Thanks,
>>>> Sinthuja.
>>>>
>>>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Budhdhima/Viraj,
>>>>>
>>>>> As per the discussion we had yesterday, the following is the format of
>>>>> the JSON containing the aggregated event details, to be sent to DAS.
>>>>> (You may change the attribute names of the events.)
>>>>>
>>>>> To explain it further, "events" contains the details about each event
>>>>> sent by each mediator. The payload may or may not be populated. The
>>>>> "payloads" section contains the unique payloads and their mapping to the
>>>>> events and their fields. (E.g. 'xml-payload-2' maps to the 'payload' and
>>>>> 'output-payload' fields of the 3rd event.)
>>>>>
>>>>> {
>>>>>   'events': [{
>>>>>     'messageId': 'aaa',
>>>>>     'componentId': '111',
>>>>>     'payload': null,
>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>     'output-payload': null
>>>>>   }, {
>>>>>     'messageId': 'bbb',
>>>>>     'componentId': '222',
>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>     'payload': null,
>>>>>     'output-payload': null
>>>>>   }, {
>>>>>     'messageId': 'ccc',
>>>>>     'componentId': '789',
>>>>>     'payload': null,
>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>     'output-payload': null
>>>>>   }],
>>>>>
>>>>>   'payloads': [{
>>>>>     'payload': 'xml-payload-1',
>>>>>     'events': [{
>>>>>       'eventIndex': 0,
>>>>>       'attributes': ['payload']
>>>>>     }, {
>>>>>       'eventIndex': 1,
>>>>>       'attributes': ['payload']
>>>>>     }]
>>>>>   }, {
>>>>>     'payload': 'xml-payload-2',
>>>>>     'events': [{
>>>>>       'eventIndex': 2,
>>>>>       'attributes': ['payload', 'output-payload']
>>>>>     }]
>>>>>   }]
>>>>> }
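
For clarity, a minimal sketch of the mapping step a consumer would do under this format, filling the null fields from the "payloads" section (illustrative only; the function name is made up, not the actual DAS implementation):

```python
def apply_payloads(message):
    """Fill each event's null payload fields using the 'payloads' mapping section."""
    events = message["events"]
    for entry in message["payloads"]:
        for target in entry["events"]:
            for attr in target["attributes"]:
                events[target["eventIndex"]][attr] = entry["payload"]
    return events

message = {
    "events": [
        {"messageId": "aaa", "payload": None, "output-payload": None},
        {"messageId": "bbb", "payload": None, "output-payload": None},
        {"messageId": "ccc", "payload": None, "output-payload": None},
    ],
    "payloads": [
        {"payload": "xml-payload-1",
         "events": [{"eventIndex": 0, "attributes": ["payload"]},
                    {"eventIndex": 1, "attributes": ["payload"]}]},
        {"payload": "xml-payload-2",
         "events": [{"eventIndex": 2, "attributes": ["payload", "output-payload"]}]},
    ],
}
events = apply_payloads(message)
print(events[2]["output-payload"])  # -> xml-payload-2
```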
>>>>>
>>>>> Please let us know if any further clarification is needed, or if there's
>>>>> anything to be modified/improved.
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]> wrote:
>>>>>
>>>>>> Hi Kasun,
>>>>>>
>>>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I think for the tracing use case we need to publish events one by one
>>>>>>> from each mediator (we can't aggregate all such events as they also
>>>>>>> contain the message payload).
>>>>>>>
>>>>>> I think we can still do that with some extra effort.
>>>>>> Most of the mediators in a sequence flow do not alter the message
>>>>>> payload. We can store the payload only for the mediators which alter the
>>>>>> message payload, and for the others, we can put a reference to the
>>>>>> previous entry. By doing that we can save memory to a great extent.
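
The deduplication idea above could be sketched roughly like this (illustrative only; not the actual ESB implementation):

```python
def dedupe_payloads(per_mediator_payloads):
    """Keep each distinct payload once; each mediator keeps an index into the unique list."""
    unique, refs, seen = [], [], {}
    for payload in per_mediator_payloads:
        if payload not in seen:          # first time this payload is produced
            seen[payload] = len(unique)
            unique.append(payload)
        refs.append(seen[payload])       # later mediators just reference it
    return unique, refs

# three mediators, only the third one altered the message
unique, refs = dedupe_payloads(["xml-1", "xml-1", "xml-2"])
print(unique)  # -> ['xml-1', 'xml-2']
print(refs)    # -> [0, 0, 1]
```

Only the distinct payloads are held in memory and published; the per-mediator entries carry references.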
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Supun Sethunga <[email protected]>
>>>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>>>> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
>>>>>>> To: Anjana Fernando <[email protected]>
>>>>>>> Cc: "[email protected]" <[email protected]>,
>>>>>>> Srinath Perera <[email protected]>, Sanjiva Weerawarana <
>>>>>>> [email protected]>, Kasun Indrasiri <[email protected]>, Isuru Udana <
>>>>>>> [email protected]>
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Ran some simple performance tests against the new relation provider,
>>>>>>> in comparison with the existing one. Following are the results:
>>>>>>>
>>>>>>> *Records in Backend DB Table*: *1,054,057*
>>>>>>>
>>>>>>> *Conversion:*
>>>>>>>
>>>>>>> Backend DB Table:
>>>>>>> id  data
>>>>>>> 1   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>> 2   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>
>>>>>>> -- To -->
>>>>>>>
>>>>>>> Spark Table:
>>>>>>> id  a    b    c
>>>>>>> 1   aaa  bbb  ccc
>>>>>>> 1   xxx  yyy  zzz
>>>>>>> 1   ppp  qqq  rrr
>>>>>>> 2   aaa  bbb  ccc
>>>>>>> 2   xxx  yyy  zzz
>>>>>>> 2   ppp  qqq  rrr
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Avg Time for Query Execution:*
>>>>>>>
>>>>>>> Query                                       Existing Analytics   New (ESB) Analytics
>>>>>>>                                             Relation Provider    Relation Provider*
>>>>>>> SELECT COUNT(*) FROM <Table>;               ~13 sec              ~16 sec
>>>>>>> SELECT * FROM <Table> ORDER BY id ASC;      ~13 sec              ~16 sec
>>>>>>> SELECT * FROM <Table> WHERE id=98435;       ~13 sec              ~16 sec
>>>>>>> SELECT id,a,first(b),first(c) FROM <Table>
>>>>>>>   GROUP BY id,a ORDER BY id ASC;            ~18 sec              ~26 sec
>>>>>>>
>>>>>>> * The new relation provider splits a single row into multiple rows.
>>>>>>> Hence the number of rows in the table is 3 times that of the original
>>>>>>> table (as each row is split into 3 rows).
>>>>>>>
>>>>>>> Regards,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have started working on implementing a new "relation" / "relation
>>>>>>>> provider", to serve the above requirement. This basically is a modified
>>>>>>>> version of the existing "Carbon Analytics" relation provider.
>>>>>>>>
>>>>>>>> Here I have assumed that the encapsulated data for a single execution
>>>>>>>> flow is stored in a single row, and that the data about the mediators
>>>>>>>> invoked during the flow is stored in a known column of each row (say
>>>>>>>> "data"), as an array (say a JSON array). When each row is read into
>>>>>>>> Spark, this relation provider creates a separate row for each of the
>>>>>>>> elements in the array stored in the "data" column. I have tested this
>>>>>>>> with some mocked data, and it works as expected.
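
The row-splitting behaviour could be sketched, outside of Spark, roughly as below (the actual relation provider in carbon-analytics is Java; this Python sketch and its names are purely illustrative):

```python
import json

def split_rows(rows, data_column="data"):
    """Expand each physical row into one logical row per element of its JSON-array column."""
    logical = []
    for row in rows:
        for element in json.loads(row[data_column]):
            # copy the shared columns, drop the array column itself
            new_row = {k: v for k, v in row.items() if k != data_column}
            new_row.update(element)  # merge in the per-mediator attributes
            logical.append(new_row)
    return logical

rows = [{"id": 1, "data": json.dumps([{"a": "aaa"}, {"a": "xxx"}])}]
print(split_rows(rows))  # -> [{'id': 1, 'a': 'aaa'}, {'id': 1, 'a': 'xxx'}]
```

Spark would only ever see the logical rows, as described in Anjana's mail below.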
>>>>>>>>
>>>>>>>> Need to test with the real data/data-formats, and modify the
>>>>>>>> mapping accordingly. Will update the thread with the details.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> In a meeting I had with Kasun and the ESB team, I got to know that,
>>>>>>>>> for their tracing mechanism, they were instructed to publish one
>>>>>>>>> event for each mediator invocation, whereas earlier they had an
>>>>>>>>> approach where they published one event which encapsulated the data
>>>>>>>>> of a whole execution flow. I would actually like to support the
>>>>>>>>> latter approach, mainly due to performance / resource requirements,
>>>>>>>>> and also considering the fact that this is a feature that could be
>>>>>>>>> enabled in production. Simply put, if we do one event per mediator,
>>>>>>>>> this does not scale that well. For example, if the ESB is doing 1k
>>>>>>>>> TPS, for a sequence that has 20 mediators, that is 20k TPS of
>>>>>>>>> analytics traffic. Combine that with a possible ESB cluster hitting
>>>>>>>>> a DAS cluster with a single backend database, and this may be too
>>>>>>>>> many rows per second written to the database. The main problem here
>>>>>>>>> is that one event is a single row/record in the backend database in
>>>>>>>>> DAS, so it may come to a state where the frequency of row creations
>>>>>>>>> by events coming from ESBs cannot be sustained.
>>>>>>>>>
>>>>>>>>> If we create a single event from the 20 mediators, then it is just
>>>>>>>>> 1k TPS for the DAS event receivers and the database too, even though
>>>>>>>>> the message size is bigger. Publishing lots of small events does not
>>>>>>>>> necessarily perform the same as publishing bigger events: throughput
>>>>>>>>> wise, comparatively bigger events will win (even if we consider that
>>>>>>>>> small operations will be batched at the transport level etc., it is
>>>>>>>>> still one event = one database row). So I would suggest we try out a
>>>>>>>>> "single sequence flow = single event" approach, and from the Spark
>>>>>>>>> processing side, we consider one of these big rows as multiple rows
>>>>>>>>> in Spark. I was first thinking UDFs could help in splitting a single
>>>>>>>>> column to multiple rows, but that is not possible, and it is also a
>>>>>>>>> bit troublesome, considering we would have to delete the original
>>>>>>>>> data table after we converted it using a script, not forgetting that
>>>>>>>>> we would actually have to schedule and run a separate script to do
>>>>>>>>> this post-processing. So a much cleaner way to do this would be to
>>>>>>>>> create a new "relation provider" in Spark (which is like a data
>>>>>>>>> adapter for its DataFrames), and in our relation provider, when we
>>>>>>>>> are reading rows, convert a single row's column to multiple rows and
>>>>>>>>> return those for processing. Spark will not know that physically it
>>>>>>>>> was a single row from the data layer, and it can summarize the data
>>>>>>>>> as usual and write to the target summary tables. [1] is our existing
>>>>>>>>> implementation of a Spark relation provider, which directly maps to
>>>>>>>>> our DAS analytics tables; we can create the new one extending /
>>>>>>>>> based on it. So I suggest we try out this approach and see if
>>>>>>>>> everyone is okay with it.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Anjana.
>>>>>>>>> --
>>>>>>>>> *Anjana Fernando*
>>>>>>>>> Senior Technical Lead
>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "WSO2 Engineering Group" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> For more options, visit
>>>>>>>>> https://groups.google.com/a/wso2.com/d/optout.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Supun Sethunga*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> http://wso2.com/
>>>>>>>> lean | enterprise | middleware
>>>>>>>> Mobile : +94 716546324
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Supun Sethunga*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> http://wso2.com/
>>>>>>> lean | enterprise | middleware
>>>>>>> Mobile : +94 716546324
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Kasun Indrasiri
>>>>>>> Software Architect
>>>>>>> WSO2, Inc.; http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> cell: +94 77 556 5206
>>>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Isuru Udana*
>>>>>> Associate Technical Lead
>>>>>> WSO2 Inc.; http://wso2.com
>>>>>> email: [email protected] cell: +94 77 3791887
>>>>>> blog: http://mytecheye.blogspot.com/
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Supun Sethunga*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> http://wso2.com/
>>>>> lean | enterprise | middleware
>>>>> Mobile : +94 716546324
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Sinthuja Rajendran*
>>>> Associate Technical Lead
>>>> WSO2, Inc.:http://wso2.com
>>>>
>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>> Mobile: +94774273955
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>
>>
>>
>> --
>> *Sinthuja Rajendran*
>> Associate Technical Lead
>> WSO2, Inc.:http://wso2.com
>>
>> Blog: http://sinthu-rajan.blogspot.com/
>> Mobile: +94774273955
>>
>>
>>
>
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.:http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
>


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
