Hi Supun,

On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]> wrote:

> Hi Sinthuja,
>
> Agreed on the possibility of simplifying the json. We discussed the same
> matter yesterday, but the complication that came up was that, for an event
> in the "events" list, the payload could be either referenced or defined
> in-line. (It was made this way so that it can be generalized to other
> fields besides payloads, if needed.)
>
> In such a case, if we had defined 'payload': 'payload1', we would not
> know whether it is the actual payload or a reference to the payload in
> the "payloads" section.
>
> With the suggested format, DAS will only go and map the payload if it is
> null.
>
>
IMHO we could solve this issue by having conventions. Basically, we could
use $payloads:payload1 to reference the elements. If an element's value
starts with '$', then it is a reference, not the actual payload. In that
case, if a new element is introduced, say 'foo', and you need to access its
property 'property1', the reference would be $foo:property1.
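To make the convention concrete, here is a minimal sketch of how a consumer
might resolve such references. All names here are illustrative, not an actual
DAS API; the only rule assumed is the one above: a string value starting with
'$' is a '$section:key' reference into a top-level section of the message.

```python
def resolve(value, message):
    """Return the actual payload for a field value, following '$section:key'
    references; any other value is an in-line payload and is used as-is."""
    if isinstance(value, str) and value.startswith("$"):
        section, key = value[1:].split(":", 1)
        return message[section][key]
    return value  # in-line payload

# Hypothetical message mixing a reference and an in-line payload
message = {
    "events": [{"messageId": "aaa", "payload": "$payloads:payload1"},
               {"messageId": "ccc", "payload": "<direct-xml-payload/>"}],
    "payloads": {"payload1": "xml-payload-1"},
}

resolved = [resolve(e["payload"], message) for e in message["events"]]
# resolved → ["xml-payload-1", "<direct-xml-payload/>"]
```

With this rule there is no ambiguity between a reference and an in-line
payload, so the consumer never needs a null marker to decide whether to map.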

Thanks,
Sinthuja.



> Regards,
> Supun
>
> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <[email protected]>
> wrote:
>
>> Hi Supun,
>>
>> I think we could simplify the json message a bit more. Instead of 'null'
>> for the payload attributes in the events section, you could use the actual
>> payload name directly if there is a payload for that event. In that
>> case, we could eliminate the 'events' section from the 'payloads' section.
>> For the given example, it could be altered as below.
>>
>> {
>>   'events': [{
>>     'messageId': 'aaa',
>>     'componentId': '111',
>>     'payload': 'payload1',
>>     'componentName': 'Proxy:TestProxy',
>>     'output-payload': null
>>   }, {
>>     'messageId': 'bbb',
>>     'componentId': '222',
>>     'componentName': 'Proxy:TestProxy',
>>     'payload': 'payload1',
>>     'output-payload': null
>>   }, {
>>     'messageId': 'ccc',
>>     'componentId': '789',
>>     'payload': 'payload2',
>>     'componentName': 'Proxy:TestProxy',
>>     'output-payload': 'payload2'
>>   }],
>>
>>   'payloads': {
>>     'payload1': 'xml-payload-1',
>>     'payload2': 'xml-payload-2'
>>   }
>> }
>>
>> Thanks,
>> Sinthuja.
>>
>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Budhdhima/Viraj,
>>>
>>> As per the discussion we had yesterday, the following is the format of
>>> the json containing the aggregated event details, to be sent to DAS.
>>> (You may change the attribute names of the events.)
>>>
>>> To explain it further, "events" contains the details about each event
>>> sent by each mediator. The payload may or may not be populated. The
>>> "payloads" section contains the unique payloads and their mappings to the
>>> events' fields. (e.g. 'xml-payload-2' maps to the 'payload' and
>>> 'output-payload' fields of the 3rd event.)
>>>
>>> {
>>>   'events': [{
>>>     'messageId': 'aaa',
>>>     'componentId': '111',
>>>     'payload': null,
>>>     'componentName': 'Proxy:TestProxy',
>>>     'output-payload': null
>>>   }, {
>>>     'messageId': 'bbb',
>>>     'componentId': '222',
>>>     'componentName': 'Proxy:TestProxy',
>>>     'payload': null,
>>>     'output-payload': null
>>>   }, {
>>>     'messageId': 'ccc',
>>>     'componentId': '789',
>>>     'payload': null,
>>>     'componentName': 'Proxy:TestProxy',
>>>     'output-payload': null
>>>   }],
>>>
>>>   'payloads': [{
>>>     'payload': 'xml-payload-1',
>>>     'events': [{
>>>       'eventIndex': 0,
>>>       'attributes': ['payload']
>>>     }, {
>>>       'eventIndex': 1,
>>>       'attributes': ['payload']
>>>     }]
>>>   }, {
>>>     'payload': 'xml-payload-2',
>>>     'events': [{
>>>       'eventIndex': 2,
>>>       'attributes': ['payload', 'output-payload']
>>>     }]
>>>   }]
>>> }
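To illustrate how a consumer of this format would put the pieces back
together, here is a rough sketch (not actual DAS code; names are
illustrative) that expands the 'payloads' section back onto the events,
using the eventIndex/attributes mapping described above.

```python
def expand(message):
    """Copy each payload in 'payloads' into the event fields it maps to."""
    events = [dict(e) for e in message["events"]]  # copy; keep other nulls
    for entry in message["payloads"]:
        for target in entry["events"]:
            for attr in target["attributes"]:
                events[target["eventIndex"]][attr] = entry["payload"]
    return events

# Trimmed version of the example message above
message = {
    "events": [
        {"messageId": "aaa", "payload": None, "output-payload": None},
        {"messageId": "bbb", "payload": None, "output-payload": None},
        {"messageId": "ccc", "payload": None, "output-payload": None},
    ],
    "payloads": [
        {"payload": "xml-payload-1",
         "events": [{"eventIndex": 0, "attributes": ["payload"]},
                    {"eventIndex": 1, "attributes": ["payload"]}]},
        {"payload": "xml-payload-2",
         "events": [{"eventIndex": 2,
                     "attributes": ["payload", "output-payload"]}]},
    ],
}

expanded = expand(message)
# expanded[2] now has both 'payload' and 'output-payload' = "xml-payload-2"
```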
>>>
>>> Please let us know if any further clarification is needed, or if there's
>>> anything to be modified/improved.
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]> wrote:
>>>
>>>> Hi Kasun,
>>>>
>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <[email protected]>
>>>> wrote:
>>>>
>>>>> I think for the tracing use case we need to publish events one by one
>>>>> from each mediator (we can't aggregate all such events, as each also
>>>>> contains the message payload).
>>>>>
>>>> I think we can still do that with some extra effort.
>>>> Most of the mediators in a sequence flow do not alter the message
>>>> payload. We can store the payload only for the mediators which alter
>>>> the message payload, and for the others, we can put a reference to the
>>>> previous entry. By doing that we can save memory to a great extent.
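The dedup idea above can be sketched as follows; this is only an
illustration of the scheme (store a payload only when it changes, otherwise
reference the previous entry), with hypothetical names, not actual ESB code.

```python
def deduplicate(payload_per_mediator):
    """Return (unique_payloads, refs) where refs[i] is the index into
    unique_payloads for mediator i."""
    unique, refs = [], []
    for p in payload_per_mediator:
        if not unique or unique[-1] != p:  # this mediator altered the payload
            unique.append(p)
        refs.append(len(unique) - 1)  # others point at the previous entry
    return unique, refs

# 5 mediators in a flow; only the 4th alters the message payload
unique, refs = deduplicate(["<a/>", "<a/>", "<a/>", "<b/>", "<b/>"])
# unique → ["<a/>", "<b/>"], refs → [0, 0, 0, 1, 1]
```

For a 20-mediator sequence where only a couple of mediators transform the
message, this stores 2-3 payloads instead of 20.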
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Supun Sethunga <[email protected]>
>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
>>>>> To: Anjana Fernando <[email protected]>
>>>>> Cc: "[email protected]" <[email protected]>,
>>>>> Srinath Perera <[email protected]>, Sanjiva Weerawarana <
>>>>> [email protected]>, Kasun Indrasiri <[email protected]>, Isuru Udana <
>>>>> [email protected]>
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Ran some simple performance tests against the new relational provider,
>>>>> in comparison with the existing one. The following are the results:
>>>>>
>>>>> *Records in Backend DB Table*: *1,054,057*
>>>>>
>>>>> *Conversion:*
>>>>>
>>>>> Backend DB Table:
>>>>> id | data
>>>>> 1  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>> 2  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>
>>>>> -- To -->
>>>>>
>>>>> Spark Table:
>>>>> id | a   | b   | c
>>>>> 1  | aaa | bbb | ccc
>>>>> 1  | xxx | yyy | zzz
>>>>> 1  | ppp | qqq | rrr
>>>>> 2  | aaa | bbb | ccc
>>>>> 2  | xxx | yyy | zzz
>>>>> 2  | ppp | qqq | rrr
>>>>>
>>>>>
>>>>> *Avg Time for Query Execution (~ sec):*
>>>>>
>>>>> Query                                                                     | Existing Analytics Relation Provider | New (ESB) Analytics Relation Provider*
>>>>> SELECT COUNT(*) FROM <Table>;                                             | 13 | 16
>>>>> SELECT * FROM <Table> ORDER BY id ASC;                                    | 13 | 16
>>>>> SELECT * FROM <Table> WHERE id=98435;                                     | 13 | 16
>>>>> SELECT id,a,first(b),first(c) FROM <Table> GROUP BY id,a ORDER BY id ASC; | 18 | 26
>>>>>
>>>>> * The new relational provider splits a single row into multiple rows.
>>>>> Hence the number of rows in the table is 3 times that of the original
>>>>> table (as each row is split into 3 rows).
>>>>>
>>>>> Regards,
>>>>> Supun
>>>>>
>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have started working on implementing a new "relation" / "relation
>>>>>> provider" to serve the above requirement. This is basically a modified
>>>>>> version of the existing "Carbon Analytics" relation provider.
>>>>>>
>>>>>> Here I have assumed that the encapsulated data for a single execution
>>>>>> flow is stored in a single row, and that the data about the mediators
>>>>>> invoked during the flow is stored in a known column of each row (say
>>>>>> "data") as an array (say a json array). When each row is read into
>>>>>> Spark, this relational provider creates separate rows for each element
>>>>>> in the array stored in the "data" column. I have tested this with some
>>>>>> mocked data, and it works as expected.
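The row-splitting described above can be sketched roughly as below. This is
not the actual relation-provider code (which is Java on Spark's DataFrame
source API); it only illustrates the transformation, with the "data" column
and the 'a'/'b'/'c' attribute names taken from the conversion example
earlier in the thread.

```python
import json

def split_rows(backend_rows):
    """For each (id, data) backend row, where 'data' is a JSON array,
    yield one Spark-style row per array element."""
    for row_id, data in backend_rows:
        for element in json.loads(data):
            yield (row_id, element["a"], element["b"], element["c"])

# One physical backend row holding two encapsulated mediator entries
backend = [(1, '[{"a":"aaa","b":"bbb","c":"ccc"},'
               '{"a":"xxx","b":"yyy","c":"zzz"}]')]

spark_rows = list(split_rows(backend))
# spark_rows → [(1, "aaa", "bbb", "ccc"), (1, "xxx", "yyy", "zzz")]
```

Spark then sees the expanded rows and can aggregate them as usual, without
knowing they came from a single physical row.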
>>>>>>
>>>>>> Need to test with the real data/data-formats, and modify the mapping
>>>>>> accordingly. Will update the thread with the details.
>>>>>>
>>>>>> Regards,
>>>>>> Supun
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In a meeting I had with Kasun and the ESB team, I got to know that,
>>>>>>> for their tracing mechanism, they were instructed to publish one
>>>>>>> event for each mediator invocation, whereas earlier they had an
>>>>>>> approach of publishing one event which encapsulated the data of a
>>>>>>> whole execution flow. I would actually like to support the latter
>>>>>>> approach, mainly due to performance / resource requirements, and also
>>>>>>> considering the fact that this is a feature that could be enabled in
>>>>>>> production. Simply put, if we do one event per mediator, this does
>>>>>>> not scale that well. For example, if the ESB is doing 1k TPS, for a
>>>>>>> sequence that has 20 mediators, that is 20k TPS of analytics traffic.
>>>>>>> Combine that with a possible ESB cluster hitting a DAS cluster with a
>>>>>>> single backend database, and this may be too many rows per second
>>>>>>> written to the database. The main problem here is that one event is a
>>>>>>> single row/record in the backend database in DAS, so it may come to a
>>>>>>> state where the frequency of row creations by events coming from ESBs
>>>>>>> cannot be sustained.
>>>>>>>
>>>>>>> If we create a single event from the 20 mediators, then it is just
>>>>>>> 1k TPS for the DAS event receivers and the database too, even though
>>>>>>> the message size is bigger. Publishing lots of small events does not
>>>>>>> necessarily give the same performance as publishing bigger events.
>>>>>>> Throughput wise, comparatively bigger events will win (even if we
>>>>>>> consider that small operations will be batched at the transport level
>>>>>>> etc., still one event = one database row). So I would suggest we try
>>>>>>> out a "single sequence flow = single event" approach, and on the
>>>>>>> Spark processing side, we treat one of these big rows as multiple
>>>>>>> rows in Spark. I first thought UDFs might help in splitting a single
>>>>>>> column into multiple rows, but that is not possible, and it is also a
>>>>>>> bit troublesome, considering we would have to delete the original
>>>>>>> data table after converting it using a script, not forgetting that we
>>>>>>> would actually have to schedule and run a separate script to do this
>>>>>>> post-processing. A much cleaner way to do this would be to create a
>>>>>>> new "relation provider" in Spark (which is like a data adapter for
>>>>>>> its DataFrames), and in our relation provider, when we are reading
>>>>>>> rows, we convert a single row's column to multiple rows and return
>>>>>>> them for processing. So Spark will not know that, physically, it was
>>>>>>> a single row from the data layer, and it can summarize the data as
>>>>>>> usual and write to the target summary tables. [1] is our existing
>>>>>>> implementation of a Spark relation provider, which directly maps to
>>>>>>> our DAS analytics tables; we can create the new one extending / based
>>>>>>> on it. So I suggest we try out this approach and see if everyone is
>>>>>>> okay with it.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Anjana.
>>>>>>> --
>>>>>>> *Anjana Fernando*
>>>>>>> Senior Technical Lead
>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>> lean . enterprise . middleware
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "WSO2 Engineering Group" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> For more options, visit
>>>>>>> https://groups.google.com/a/wso2.com/d/optout.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Supun Sethunga*
>>>>>> Software Engineer
>>>>>> WSO2, Inc.
>>>>>> http://wso2.com/
>>>>>> lean | enterprise | middleware
>>>>>> Mobile : +94 716546324
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Supun Sethunga*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> http://wso2.com/
>>>>> lean | enterprise | middleware
>>>>> Mobile : +94 716546324
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Kasun Indrasiri
>>>>> Software Architect
>>>>> WSO2, Inc.; http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> cell: +94 77 556 5206
>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Isuru Udana*
>>>> Associate Technical Lead
>>>> WSO2 Inc.; http://wso2.com
>>>> email: [email protected] cell: +94 77 3791887
>>>> blog: http://mytecheye.blogspot.com/
>>>>
>>>
>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> *Sinthuja Rajendran*
>> Associate Technical Lead
>> WSO2, Inc.:http://wso2.com
>>
>> Blog: http://sinthu-rajan.blogspot.com/
>> Mobile: +94774273955
>>
>>
>>
>>
>
>
> --
> *Supun Sethunga*
> Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324
>



-- 
*Sinthuja Rajendran*
Associate Technical Lead
WSO2, Inc.:http://wso2.com

Blog: http://sinthu-rajan.blogspot.com/
Mobile: +94774273955
