Hi Budhdhima/Viraj,

As per the discussion we had yesterday, the following is the format of the
JSON containing the aggregated event details, to be sent to DAS. (You may
change the attribute names of the events.)

To explain it further, "events" contains the details of each event sent by
each mediator; the payload fields may or may not be populated. The
"payloads" section contains the unique payloads, together with a mapping to
the events and the fields they belong to (e.g., 'xml-payload-2' maps to the
'payload' and 'output-payload' fields of the 3rd event). A sketch of how a
consumer could resolve this mapping is given after the sample.

{
  'events': [{
    'messageId': 'aaa',
    'componentId': '111',
    'componentName': 'Proxy:TestProxy',
    'payload': null,
    'output-payload': null
  }, {
    'messageId': 'bbb',
    'componentId': '222',
    'componentName': 'Proxy:TestProxy',
    'payload': null,
    'output-payload': null
  }, {
    'messageId': 'ccc',
    'componentId': '789',
    'componentName': 'Proxy:TestProxy',
    'payload': null,
    'output-payload': null
  }],

  'payloads': [{
    'payload': 'xml-payload-1',
    'events': [{
      'eventIndex': 0,
      'attributes': ['payload']
    }, {
      'eventIndex': 1,
      'attributes': ['payload']
    }]
  }, {
    'payload': 'xml-payload-2',
    'events': [{
      'eventIndex': 2,
      'attributes': ['payload', 'output-payload']
    }]
  }]
}
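
For clarity, the following is a rough sketch (in Java, using org.json purely
for illustration; 'PayloadResolver' is a hypothetical name, not part of the
actual publisher/receiver code) of how a consumer such as DAS could resolve
the deduplicated payloads back onto the event fields:

import org.json.JSONArray;
import org.json.JSONObject;

// Hypothetical sketch: re-attach each unique payload to the event fields
// that reference it, using the 'eventIndex' / 'attributes' mapping above.
public class PayloadResolver {

    public static JSONArray resolve(JSONObject message) {
        JSONArray events = message.getJSONArray("events");
        JSONArray payloads = message.getJSONArray("payloads");
        for (int i = 0; i < payloads.length(); i++) {
            JSONObject entry = payloads.getJSONObject(i);
            String payload = entry.getString("payload");
            JSONArray refs = entry.getJSONArray("events");
            for (int j = 0; j < refs.length(); j++) {
                JSONObject ref = refs.getJSONObject(j);
                JSONObject event = events.getJSONObject(ref.getInt("eventIndex"));
                JSONArray attrs = ref.getJSONArray("attributes");
                for (int k = 0; k < attrs.length(); k++) {
                    // e.g. fills 'payload' and 'output-payload' of the 3rd event
                    event.put(attrs.getString(k), payload);
                }
            }
        }
        return events;
    }
}

With the sample message above, resolve() would leave 'xml-payload-1' in the
'payload' field of the first two events, and 'xml-payload-2' in both the
'payload' and 'output-payload' fields of the third.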

Please let us know if any further clarification is needed, or if there's
anything to be modified/improved.

Thanks,
Supun

On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]> wrote:

> Hi Kasun,
>
> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <[email protected]> wrote:
>
>> I think for the tracing use case we need to publish events one by one from
>> each mediator (we can't aggregate all such events, as they also contain
>> the message payload).
>>
> I think we can still do that with some extra effort.
> Most of the mediators in a sequence flow do not alter the message
> payload. We can store the payload only for the mediators which alter it,
> and for the others, we can put a reference to the previous entry. By doing
> that we can save memory to a great extent.
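>
> A minimal sketch of such payload deduplication (the 'PayloadStore' name and
> API are hypothetical, not the actual mediation-statistics code):
>
> import java.util.ArrayList;
> import java.util.LinkedHashMap;
> import java.util.List;
> import java.util.Map;
>
> // Hypothetical sketch: keep each distinct payload once; every mediator
> // event stores only an index into the payload list. Mediators that do not
> // alter the message simply reuse the index of the previous entry.
> public class PayloadStore {
>
>     private final Map<String, Integer> indexByPayload = new LinkedHashMap<>();
>     private final List<String> payloads = new ArrayList<>();
>
>     // Returns the index of the payload, adding it only if it was not seen
>     // before in this execution flow.
>     public int intern(String payload) {
>         Integer index = indexByPayload.get(payload);
>         if (index == null) {
>             index = payloads.size();
>             payloads.add(payload);
>             indexByPayload.put(payload, index);
>         }
>         return index;
>     }
>
>     public List<String> getPayloads() {
>         return payloads;
>     }
> }
>
> This is essentially the mapping that the 'payloads' section of the
> aggregated JSON format above captures.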
>
> Thanks.
>
>
>>
>> ---------- Forwarded message ----------
>> From: Supun Sethunga <[email protected]>
>> Date: Mon, Feb 8, 2016 at 2:54 PM
>> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
>> To: Anjana Fernando <[email protected]>
>> Cc: "[email protected]" <[email protected]>, Srinath
>> Perera <[email protected]>, Sanjiva Weerawarana <[email protected]>, Kasun
>> Indrasiri <[email protected]>, Isuru Udana <[email protected]>
>>
>>
>> Hi all,
>>
>> Ran some simple performance tests against the new relational provider, in
>> comparison with the existing one. Following are the results:
>>
>> *Records in Backend DB Table*: *1,054,057*
>>
>> *Conversion:*
>>
>> Backend DB Table:
>> id | data
>> 1  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>> 2  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>
>> --To-->
>>
>> Spark Table:
>> id | a   | b   | c
>> 1  | aaa | bbb | ccc
>> 1  | xxx | yyy | zzz
>> 1  | ppp | qqq | rrr
>> 2  | aaa | bbb | ccc
>> 2  | xxx | yyy | zzz
>> 2  | ppp | qqq | rrr
>>
>>
>>
>> *Avg Time for Query Execution:*
>>
>>                                              Execution time (~ sec)
>> Query                                        | Existing Analytics | New (ESB) Analytics
>>                                              | Relation Provider  | Relation Provider *
>> SELECT COUNT(*) FROM <Table>;                | 13                 | 16
>> SELECT * FROM <Table> ORDER BY id ASC;       | 13                 | 16
>> SELECT * FROM <Table> WHERE id=98435;        | 13                 | 16
>> SELECT id,a,first(b),first(c) FROM <Table>   | 18                 | 26
>>   GROUP BY id,a ORDER BY id ASC;             |                    |
>>
>> * The new relational provider splits a single row into multiple rows.
>> Hence the number of rows in the table is 3 times that of the original
>> table (as each row is split into 3 rows).
>>
>> Regards,
>> Supun
>>
>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I have started working on implementing a new "relation" / "relation
>>> provider", to serve the above requirement. This is basically a modified
>>> version of the existing "Carbon Analytics" relation provider.
>>>
>>> Here I have assumed that the encapsulated data for a single execution
>>> flow are stored in a single row, and that the data about the mediators
>>> invoked during the flow are stored in a known column of each row (say
>>> "data"), as an array (say a JSON array). When each row is read into Spark,
>>> this relation provider creates separate rows for each element in the
>>> array stored in the "data" column. I have tested this with some mocked
>>> data, and it works as expected.
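>>>
>>> A minimal sketch of that expansion (org.json is used only for brevity;
>>> 'RowSplitter' and the column names are illustrative, not the actual
>>> provider code, which works on the DAS record format):
>>>
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> import org.json.JSONArray;
>>> import org.json.JSONObject;
>>>
>>> // Hypothetical sketch: expand one backend row, whose "data" column holds
>>> // a JSON array, into one flat row per array element.
>>> public class RowSplitter {
>>>
>>>     public static List<Object[]> split(int id, String dataJson) {
>>>         List<Object[]> rows = new ArrayList<>();
>>>         JSONArray entries = new JSONArray(dataJson);
>>>         for (int i = 0; i < entries.length(); i++) {
>>>             JSONObject e = entries.getJSONObject(i);
>>>             // one physical row -> one logical row per array element
>>>             rows.add(new Object[] { id, e.optString("a"),
>>>                     e.optString("b"), e.optString("c") });
>>>         }
>>>         return rows;
>>>     }
>>> }
>>>
>>> For the sample backend row with id 1 above, split() yields [1, aaa, bbb,
>>> ccc], [1, xxx, yyy, zzz], and so on, which is exactly the conversion shown
>>> in the performance test mail.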
>>>
>>> Need to test with the real data/data-formats, and modify the mapping
>>> accordingly. Will update the thread with the details.
>>>
>>> Regards,
>>> Supun
>>>
>>>
>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> In a meeting I had with Kasun and the ESB team, I got to know that, for
>>>> their tracing mechanism, they were instructed to publish one event for
>>>> each mediator invocation, whereas earlier they had an approach where they
>>>> publish one event which encapsulates the data of a whole execution flow.
>>>> I would actually like to support the latter approach, mainly due to
>>>> performance / resource requirements, and also considering the fact that
>>>> this is a feature that could be enabled in production. Simply put, if we
>>>> do one event per mediator, this does not scale well. For example, if the
>>>> ESB is doing 1k TPS, then for a sequence that has 20 mediators, that is
>>>> 20k TPS of analytics traffic. Combine that with a possible ESB cluster
>>>> hitting a DAS cluster with a single backend database, and this may be too
>>>> many rows per second written to the database. The main problem here is
>>>> that one event is a single row/record in the backend database in DAS, so
>>>> it may come to a state where the frequency of row creations by events
>>>> coming from ESBs cannot be sustained.
>>>>
>>>> If we create a single event from the 20 mediators, then it is just 1k
>>>> TPS for the DAS event receivers and the database too, even though the
>>>> message size is bigger. Publishing lots of small events does not
>>>> necessarily perform the same as publishing bigger events; throughput
>>>> wise, comparatively bigger events will win (even if we consider that
>>>> small operations will be batched at the transport level etc., still one
>>>> event = one database row). So I would suggest we try out a "single
>>>> sequence flow = single event" approach, and on the Spark processing side,
>>>> we consider one of these big rows as multiple rows in Spark. I first
>>>> considered whether UDFs could help in splitting a single column into
>>>> multiple rows, but that is not possible, and it would also be a bit
>>>> troublesome, considering we would have to delete the original data table
>>>> after we converted it using a script, not forgetting that we would
>>>> actually have to schedule and run a separate script to do this
>>>> post-processing. A much cleaner way to do this is to create a new
>>>> "relation provider" in Spark (which is like a data adapter for its
>>>> DataFrames); in our relation provider, when we are reading rows, we
>>>> convert a single row's column to multiple rows and return those for
>>>> processing. So Spark will not know that physically it was a single row in
>>>> the data layer, and it can summarize the data as usual and write to the
>>>> target summary tables. [1] is our existing implementation of a Spark
>>>> relation provider, which directly maps to our DAS analytics tables; we
>>>> can create the new one extending / based on it. A rough skeleton of what
>>>> such a provider could look like is given after [1] below. So I suggest we
>>>> try out this approach and see if everyone is okay with it.
>>>>
>>>> [1]
>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
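>>>>
>>>> A minimal skeleton, assuming the Spark 1.x DataSource API that [1] is
>>>> built on; the class name, schema, and backend read are all hypothetical
>>>> placeholders (the RelationProvider wiring is omitted), not the actual
>>>> implementation:
>>>>
>>>> import java.io.Serializable;
>>>> import java.util.Arrays;
>>>> import java.util.List;
>>>>
>>>> import org.apache.spark.api.java.JavaSparkContext;
>>>> import org.apache.spark.api.java.function.FlatMapFunction;
>>>> import org.apache.spark.rdd.RDD;
>>>> import org.apache.spark.sql.Row;
>>>> import org.apache.spark.sql.RowFactory;
>>>> import org.apache.spark.sql.SQLContext;
>>>> import org.apache.spark.sql.sources.BaseRelation;
>>>> import org.apache.spark.sql.sources.TableScan;
>>>> import org.apache.spark.sql.types.DataTypes;
>>>> import org.apache.spark.sql.types.StructField;
>>>> import org.apache.spark.sql.types.StructType;
>>>>
>>>> // Hypothetical sketch: a relation that reads the physical rows and
>>>> // expands the "data" column into multiple logical rows, so Spark only
>>>> // ever sees the already-split rows.
>>>> public class SplittingRelation extends BaseRelation implements TableScan,
>>>>         Serializable {
>>>>
>>>>     private final transient SQLContext sqlContext;
>>>>
>>>>     public SplittingRelation(SQLContext sqlContext) {
>>>>         this.sqlContext = sqlContext;
>>>>     }
>>>>
>>>>     @Override
>>>>     public SQLContext sqlContext() {
>>>>         return this.sqlContext;
>>>>     }
>>>>
>>>>     @Override
>>>>     public StructType schema() {
>>>>         // Schema of the *logical* (already split) rows.
>>>>         return new StructType(new StructField[] {
>>>>                 DataTypes.createStructField("id", DataTypes.IntegerType, false),
>>>>                 DataTypes.createStructField("a", DataTypes.StringType, true),
>>>>                 DataTypes.createStructField("b", DataTypes.StringType, true),
>>>>                 DataTypes.createStructField("c", DataTypes.StringType, true) });
>>>>     }
>>>>
>>>>     @Override
>>>>     public RDD<Row> buildScan() {
>>>>         JavaSparkContext jsc =
>>>>                 new JavaSparkContext(sqlContext.sparkContext());
>>>>         // One physical row -> many logical rows; Spark never sees the
>>>>         // original single-row layout.
>>>>         return jsc.parallelize(readBackendRows())
>>>>                 .flatMap(new FlatMapFunction<Row, Row>() {
>>>>                     @Override
>>>>                     public Iterable<Row> call(Row row) {
>>>>                         return splitRow(row);
>>>>                     }
>>>>                 }).rdd();
>>>>     }
>>>>
>>>>     private static List<Row> readBackendRows() {
>>>>         // Stubbed: the real provider reads from the DAS data layer.
>>>>         return Arrays.asList(RowFactory.create(1,
>>>>                 "[{\"a\":\"aaa\",\"b\":\"bbb\",\"c\":\"ccc\"}]"));
>>>>     }
>>>>
>>>>     private static List<Row> splitRow(Row row) {
>>>>         // Same expansion as the earlier RowSplitter sketch, emitting
>>>>         // Spark Rows instead of Object arrays.
>>>>         List<Row> out = new java.util.ArrayList<Row>();
>>>>         org.json.JSONArray entries = new org.json.JSONArray(row.getString(1));
>>>>         for (int i = 0; i < entries.length(); i++) {
>>>>             org.json.JSONObject e = entries.getJSONObject(i);
>>>>             out.add(RowFactory.create(row.getInt(0), e.optString("a"),
>>>>                     e.optString("b"), e.optString("c")));
>>>>         }
>>>>         return out;
>>>>     }
>>>> }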
>>>>
>>>> Cheers,
>>>> Anjana.
>>>> --
>>>> *Anjana Fernando*
>>>> Senior Technical Lead
>>>> WSO2 Inc. | http://wso2.com
>>>> lean . enterprise . middleware
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>
>>
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>>
>>
>>
>> --
>> Kasun Indrasiri
>> Software Architect
>> WSO2, Inc.; http://wso2.com
>> lean.enterprise.middleware
>>
>> cell: +94 77 556 5206
>> Blog : http://kasunpanorama.blogspot.com/
>>
>
>
>
> --
> *Isuru Udana*
> Associate Technical Lead
> WSO2 Inc.; http://wso2.com
> email: [email protected] cell: +94 77 3791887
> blog: http://mytecheye.blogspot.com/
>



-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324