Hi Sanjiva,

Yes, we are indeed using a stream definition and publishing the events using
Thrift.

But in doing so, there were two approaches we considered:

   1. ESB publishing a single event per mediator in a message flow.
   2. ESB publishing a single event per message flow (rather than for each
   mediator in the message flow).

With the perf tests we ran, approach #2 proved to be better than #1 in
terms of performance. The format we have discussed in this thread is
the structure of the payload of a wso2event, which contains the
aggregated information of the mediators (i.e., how to put the
information of all mediators in a message flow into a single event).
This way we can also avoid duplicating information (e.g., mediators
like 'log' and 'property' do not change the XML payload within a
single message flow).
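
A minimal sketch of how per-mediator events could be folded into one aggregated event, with each distinct payload stored only once. The field names ("events", "payloads", "eventIndex", "attributes", "beforePayload", "afterPayload") follow the sample payload further down the thread; the function name, the "componentId" spelling and the tiny input flow are assumptions made for illustration, not the actual ESB code.

```python
import json


def aggregate_flow(mediator_events):
    """Fold per-mediator events into one event, keeping each distinct payload once."""
    events, payloads, index_of = [], [], {}
    for i, event in enumerate(mediator_events):
        slim = dict(event)
        for attr in ("beforePayload", "afterPayload"):
            body = slim.get(attr)
            if body is None:
                continue
            if body not in index_of:
                # First time this payload is seen: store it once.
                index_of[body] = len(payloads)
                payloads.append({"payload": body, "events": []})
            payloads[index_of[body]]["events"].append(
                {"eventIndex": i, "attributes": attr})
            slim[attr] = None  # the value now lives in the "payloads" section
        events.append(slim)
    return {"events": events, "payloads": payloads}


# Hypothetical flow: only the last mediator changes the payload.
flow = [
    {"componentId": "Test Proxy", "beforePayload": "<a/>", "afterPayload": "<a/>"},
    {"componentId": "log", "beforePayload": "<a/>", "afterPayload": "<a/>"},
    {"componentId": "enrich", "beforePayload": "<a/>", "afterPayload": "<b/>"},
]
aggregated = aggregate_flow(flow)
print(json.dumps(aggregated, indent=2))
```

Here the six payload fields of the three mediators collapse into two stored payloads, which is the de-duplication described above.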

Regards,
Supun

On Wed, Feb 24, 2016 at 11:25 AM, Sanjiva Weerawarana <[email protected]>
wrote:

> Why are we inventing a new event format for this? Why not use a stream
> definition and publish using Thrift?
>
> Sorry if I'm missing something here.
>
> On Fri, Feb 19, 2016 at 11:44 AM, Supun Sethunga <[email protected]> wrote:
>
>> Hi,
>>
>> Ran some more performance tests to contrast publishing aggregated
>> events vs. multiple single events, and the following are the results:
>>
>> *Results:*
>>
>> No of concurrent publishers (to DAS): 10
>> Back-end DB: MySQL
>>
>>                          | Single Events | Aggregated Events* | Single Events | Aggregated Events*
>> No of events:            | 160,000       | 10,000             | 1,600,000     | 100,000
>> Event payload size:      | 1.9 KB        | 21.6 KB            | 1.9 KB        | 21.6 KB
>> Time consumed** (mm:ss): | 1:55          | 0:30               | 19:46         | 4:31
>>
>> *An aggregated event contains payloads of 16 single events.
>> **Time consumed = time to complete all DB transactions.
>>
>> Please note that these times were monitored while DB trace logs were on,
>> so that too has some effect on the overall performance.
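
As a rough worked reading of the figures above, the following sketch converts the reported times into rates of mediator-level records handled per second. It uses only the numbers in the table; the variable names are illustrative.

```python
def seconds(mmss):
    """Convert an 'mm:ss' string into seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


SINGLE_EVENTS_PER_AGGREGATE = 16  # from the footnote in the results

runs = [
    {"single": (160_000, "1:55"), "aggregated": (10_000, "0:30")},
    {"single": (1_600_000, "19:46"), "aggregated": (100_000, "4:31")},
]

results = []
for run in runs:
    n_single, t_single = run["single"]
    n_agg, t_agg = run["aggregated"]
    # Both columns carry the same amount of mediator-level information.
    assert n_agg * SINGLE_EVENTS_PER_AGGREGATE == n_single
    result = {
        "single_rate": n_single / seconds(t_single),
        "aggregated_rate": n_single / seconds(t_agg),
        "speedup": seconds(t_single) / seconds(t_agg),
    }
    results.append(result)
    print("single: {single_rate:.0f}/s, aggregated: {aggregated_rate:.0f}/s, "
          "speedup: {speedup:.2f}x".format(**result))
```

On these numbers the aggregated approach finishes roughly 3.8 to 4.4 times faster for the same mediator data.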
>>
>> Regards,
>> Supun
>>
>> On Wed, Feb 17, 2016 at 5:37 PM, Viraj Senevirathne <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> We got a simple sample payload for an actual message flow (attached).
>>>
>>> This has about 16 mediators. The payload file size is ~27.4 kB. With
>>> different payload sizes and a large number of mediators in the flow, a
>>> single payload can get even bigger. So if the ESB is serving 1000 requests
>>> per second, it will transfer payloads to DAS at a data rate of ~27 MB/s.
>>> With large payload sizes and a large number of mediators in the flow, this
>>> data rate can go up very high.
>>>
>>> As the strings are highly repetitive, compression works well with them.
>>> After compressing the above payload, its size is ~2 kB (a 93% reduction
>>> from the original size).
>>>
>>> A large JSON file of 1.3 MB was reduced to 14.3 kB after compression.
>>>
>>> Therefore, will it be possible to send a compressed JSON string to DAS
>>> instead of the uncompressed one? DAS can then decompress it and use the
>>> actual JSON payload.
>>>
>>> I think this will reduce the data rate drastically and ease data
>>> communication.
>>>
>>> Will it be possible to define a new type like "compressedJSON" to achieve
>>> this? WDYT about this idea?
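
A minimal sketch of the compress-then-decompress round trip proposed above. The choice of gzip plus Base64 (so the result can still travel as a string attribute) is an assumption for illustration; the thread does not say which codec or encoding a "compressedJSON" type would use, and the sample data is made up.

```python
import base64
import gzip
import json


def compress_json(obj):
    """Serialize to JSON, gzip it, and Base64-encode so it stays a plain string."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_json(text):
    """Inverse of compress_json, as the receiving side would apply it."""
    return json.loads(gzip.decompress(base64.b64decode(text)).decode("utf-8"))


# Hypothetical, highly repetitive payload resembling 16 mediator events.
sample = {
    "events": [
        {
            "componentType": "Mediator",
            "componentId": "mediator_%d" % i,
            "beforePayload": None,
            "afterPayload": None,
            "entryPoint": "Test Proxy",
        }
        for i in range(16)
    ]
}

packed = compress_json(sample)
original_size = len(json.dumps(sample))
print(original_size, "->", len(packed), "characters")
```

The actual ratio depends on the payload; the 93% figure quoted above was measured on the attached sample, not on this toy input.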
>>>
>>> Thank You,
>>>
>>> On Wed, Feb 17, 2016 at 9:38 AM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Dushan,
>>>>
>>>>> Supun, according to the stream definition ""children": 1," what it
>>>>> represents ?
>>>>
>>>>
>>>> Here, each event basically represents a mediator/proxy. So "children"
>>>> represents the child mediator(s) in the message flow. This info is used to
>>>> draw the message flow diagram.
>>>>
>>>> For example, if we consider the first event in the array, "children":1
>>>> means the event at index 1 is the first mediator after Test Proxy, and so
>>>> on. Sorry, the values I have put for "children" in the second and third
>>>> events are misleading. They should be "children":2 and "children":null,
>>>> respectively. So, null means it's the end of the message flow.
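
A small sketch of how the "children" indices described above could be followed to recover the order of the message flow. It assumes a single child per event, that the flow starts at index 0, and uses the corrected values (1, 2, null); the function name and the "componentId" key spelling are illustrative.

```python
def flow_order(events, start=0):
    """Follow the 'children' index from event to event until it is None."""
    order, seen, index = [], set(), start
    while index is not None:
        if index in seen:
            raise ValueError("cycle detected at event index %d" % index)
        seen.add(index)
        order.append(events[index]["componentId"])
        index = events[index]["children"]
    return order


events = [
    {"componentId": "Test Proxy", "children": 1},
    {"componentId": "mediator_1", "children": 2},
    {"componentId": "mediator_2", "children": None},  # end of the flow
]
print(" -> ".join(flow_order(events)))
```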
>>>>
>>>> Regards,
>>>> Supun
>>>>
>>>> On Wed, Feb 17, 2016 at 2:34 AM, Dushan Abeyruwan <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>>    - If we publish events from each mediator, then we can certainly
>>>>>    group the events by a unique parentID, can't we? (I mean this would
>>>>>    allow us to prepare an aggregated view per incoming message and
>>>>>    visualize the different stages of each message representation and
>>>>>    other meta information; think of complex mediation.)
>>>>>    - Can't we record the payload according to its Content-Type, and
>>>>>    therefore get rid of the SOAP way of representing it?
>>>>>    - If we have a non-content-aware mediation flow with
>>>>>    "application/json", can we find a way to get the JSON string rather
>>>>>    than explicitly building it, i.e.
>>>>>    "org.apache.synapse.commons.json.Constants.JSON_STRING"?
>>>>>    - Supun, according to the stream definition ""children": 1," what
>>>>>    it represents ?
>>>>>
>>>>>
>>>>> On Mon, Feb 15, 2016 at 9:15 PM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Dunith, Gihan,
>>>>>>
>>>>>> As per the offline chat with Buddhima and Viraj, the following is a
>>>>>> sample payload to be published from ESB to DAS. Do we need any other
>>>>>> information for the plots/tables in the dashboard?
>>>>>>
>>>>>> Here we added a new field, "entryPoint", to indicate inside which
>>>>>> Proxy/API the mediator was executed, so that it is easy to drill
>>>>>> down from the proxy view to the mediator view. Please add any other
>>>>>> similar fields that would be needed for drill-downs, if we have
>>>>>> missed any.
>>>>>>
>>>>>> {
>>>>>> "events": [{
>>>>>> "componentType": "ProxyService",
>>>>>> "componentId": "Test Proxy",
>>>>>> "startTime": 1455531027,
>>>>>> "endTime": 1455531041,
>>>>>> "duration": 3.321,
>>>>>> "beforePayload": null,
>>>>>> "afterPayload": null,
>>>>>> "contextPropertyMap": "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>>> "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\"urn:renewLicense\"\",\"Host\":\"localhost\"}",
>>>>>> "children": 1,
>>>>>> "entryPoint": "Test Proxy"
>>>>>> }, {
>>>>>> "componentType": "Mediator",
>>>>>> "componentId": "mediator_1",
>>>>>> "startTime": 1455531041,
>>>>>> "endTime": 1455531052,
>>>>>> "duration": 3.321,
>>>>>> "beforePayload": null,
>>>>>> "afterPayload": null,
>>>>>> "contextPropertyMap": "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>>> "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\"urn:renewLicense\"\",\"Host\":\"localhost\"}",
>>>>>> "children": 0,
>>>>>> "entryPoint": "Test Proxy"
>>>>>> }, {
>>>>>> "componentType": "Mediator",
>>>>>> "componentId": "mediator_2",
>>>>>> "startTime": 1455531052,
>>>>>> "endTime": 1455531074,
>>>>>> "duration": 3.321,
>>>>>> "beforePayload": null,
>>>>>> "afterPayload": null,
>>>>>> "contextPropertyMap": null,
>>>>>> "transportPropertyMap": null,
>>>>>> "children": 0,
>>>>>> "entryPoint": "Test Proxy"
>>>>>> }],
>>>>>>
>>>>>> "payloads": [{
>>>>>> "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123456</sam:vehicleNumber></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>>> "events": [{
>>>>>> "eventIndex": 0,
>>>>>> "attributes": "beforePayload"
>>>>>> }, {
>>>>>> "eventIndex": 0,
>>>>>> "attributes": "afterPayload"
>>>>>> }, {
>>>>>> "eventIndex": 1,
>>>>>> "attributes": "beforePayload"
>>>>>> }]
>>>>>> }, {
>>>>>> "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123123</sam:vehicleNumber><sam:vehicleType>car</sam:vehicleType></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>>> "events": [{
>>>>>> "eventIndex": 1,
>>>>>> "attributes": "afterPayload"
>>>>>> }, {
>>>>>> "eventIndex": 2,
>>>>>> "attributes": "beforePayload"
>>>>>> }, {
>>>>>> "eventIndex": 2,
>>>>>> "attributes": "afterPayload"
>>>>>> }]
>>>>>> }]
>>>>>> }
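
A minimal sketch of how the receiving side might map the shared payloads back onto the events of a structure like the one above. Only attributes that are null are filled, in line with the "map the payload only if it is null" rule discussed earlier in the thread. The function name and the shortened sample are illustrative; "attributes" is accepted both as a string and as a list, since both forms appear in the thread.

```python
import copy


def resolve_payloads(aggregated):
    """Return the events with null payload attributes filled from 'payloads'."""
    events = copy.deepcopy(aggregated["events"])
    for entry in aggregated["payloads"]:
        for ref in entry["events"]:
            attrs = ref["attributes"]
            if isinstance(attrs, str):
                attrs = [attrs]
            target = events[ref["eventIndex"]]
            for attr in attrs:
                if target.get(attr) is None:  # never overwrite an in-line value
                    target[attr] = entry["payload"]
    return events


# Shortened stand-in for the sample payload; P1 and P2 replace the XML strings.
aggregated = {
    "events": [
        {"beforePayload": None, "afterPayload": None},
        {"beforePayload": None, "afterPayload": None},
        {"beforePayload": None, "afterPayload": None},
    ],
    "payloads": [
        {"payload": "P1", "events": [
            {"eventIndex": 0, "attributes": "beforePayload"},
            {"eventIndex": 0, "attributes": "afterPayload"},
            {"eventIndex": 1, "attributes": "beforePayload"}]},
        {"payload": "P2", "events": [
            {"eventIndex": 1, "attributes": "afterPayload"},
            {"eventIndex": 2, "attributes": "beforePayload"},
            {"eventIndex": 2, "attributes": "afterPayload"}]},
    ],
}
resolved = resolve_payloads(aggregated)
for index, event in enumerate(resolved):
    print(index, event["beforePayload"], event["afterPayload"])
```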
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Wed, Feb 10, 2016 at 11:57 AM, Supun Sethunga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sinthuja,
>>>>>>>
>>>>>>>
>>>>>>>> IMHO we could solve this issue by having conventions. Basically we
>>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>>> convention. If the element starts with '$' then it's a reference,
>>>>>>>> not the actual payload. In that case, if a new element is
>>>>>>>> introduced, let's say foo, and you need to access the property
>>>>>>>> property1, then it will have the reference $foo:property1.
>>>>>>>
>>>>>>>
>>>>>>> Yes, that's possible as well. But again, if the property, say
>>>>>>> 'foo', has an actual value starting with the special character
>>>>>>> (in this case '$'), we may run into ambiguity. (True, the chances
>>>>>>> are pretty low, but it is still possible.)
>>>>>>>
>>>>>>>
>>>>>>>> Also this json event format is being sent as event payload in wso2
>>>>>>>> event, and wso2 event is being published by the data publisher right?
>>>>>>>> Correct me if i'm wrong.
>>>>>>>
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 10, 2016 at 11:35 AM, Sinthuja Ragendran <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>> Also this json event format is being sent as event payload in wso2
>>>>>>>> event, and wso2 event is being published by the data publisher right?
>>>>>>>> Correct me if i'm wrong.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sinthuja.
>>>>>>>>
>>>>>>>> On Wed, Feb 10, 2016 at 11:26 AM, Sinthuja Ragendran <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Supun,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sinthuja,
>>>>>>>>>>
>>>>>>>>>> Agreed on the possibility of simplifying the JSON. We also
>>>>>>>>>> discussed the same matter yesterday, but the complication that
>>>>>>>>>> came up was that, for an event in the "events" list, the payload
>>>>>>>>>> could be either referenced or defined in-line (it was made that
>>>>>>>>>> way so that it can be generalized to fields other than payloads
>>>>>>>>>> as well, if needed).
>>>>>>>>>>
>>>>>>>>>> In such a case, if we had defined it as 'payload': 'payload1', we
>>>>>>>>>> would not know whether it is the actual payload or a reference to
>>>>>>>>>> the payload in the "payloads" section.
>>>>>>>>>>
>>>>>>>>>> With the suggested format, DAS will only go and map the payload
>>>>>>>>>> if it is null.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> IMHO we could solve this issue by having conventions. Basically we
>>>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>>>> convention. If the element starts with '$' then it's a reference,
>>>>>>>>> not the actual payload. In that case, if a new element is
>>>>>>>>> introduced, let's say foo, and you need to access the property
>>>>>>>>> property1, then it will have the reference $foo:property1.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sinthuja.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>
>>>>>>>>>>> I think we could simplify the JSON message a bit more. Instead of
>>>>>>>>>>> 'null' for the payload attributes in the events section, you
>>>>>>>>>>> could use the actual payload name directly if there is a payload
>>>>>>>>>>> for that event. In that case, we could eliminate the 'events'
>>>>>>>>>>> section from the 'payloads' section. The given example could be
>>>>>>>>>>> altered as below.
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> 'events': [{
>>>>>>>>>>> 'messageId': 'aaa',
>>>>>>>>>>> 'componentId': '111',
>>>>>>>>>>> 'payload': 'payload1',
>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>> }, {
>>>>>>>>>>> 'messageId': 'bbb',
>>>>>>>>>>> 'componentId': '222',
>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>> 'payload': 'payload1',
>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>> }, {
>>>>>>>>>>> 'messageId': 'ccc',
>>>>>>>>>>> 'componentId': '789',
>>>>>>>>>>> 'payload': 'payload2',
>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>> 'output-payload':'payload2'
>>>>>>>>>>> }],
>>>>>>>>>>>
>>>>>>>>>>> 'payloads': {
>>>>>>>>>>> 'payload1': 'xml-payload-1',
>>>>>>>>>>> 'payload2': 'xml-payload-2'
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sinthuja.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Budhdhima/Viraj,
>>>>>>>>>>>>
>>>>>>>>>>>> As per the discussion we had yesterday, the following is the format
>>>>>>>>>>>> of the JSON containing the aggregated event details, to be sent to
>>>>>>>>>>>> DAS (you may change the attribute names of the events).
>>>>>>>>>>>>
>>>>>>>>>>>> To explain it further, "events" contains the details about each
>>>>>>>>>>>> event sent by each mediator. The payload may or may not be populated.
>>>>>>>>>>>> The "payloads" section contains the unique payloads and their
>>>>>>>>>>>> mapping to the events and their fields (e.g., 'xml-payload-2' maps
>>>>>>>>>>>> to the 'payload' and 'output-payload' fields of the 3rd event).
>>>>>>>>>>>>
>>>>>>>>>>>> {
>>>>>>>>>>>> 'events': [{
>>>>>>>>>>>> 'messageId': 'aaa',
>>>>>>>>>>>> 'componentId': '111',
>>>>>>>>>>>> 'payload': null,
>>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>>> }, {
>>>>>>>>>>>> 'messageId': 'bbb',
>>>>>>>>>>>> 'componentId': '222',
>>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>> 'payload': null,
>>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>>> }, {
>>>>>>>>>>>> 'messageId': 'ccc',
>>>>>>>>>>>> 'componentId': '789',
>>>>>>>>>>>> 'payload': null,
>>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>>> }],
>>>>>>>>>>>>
>>>>>>>>>>>> 'payloads': [{
>>>>>>>>>>>> 'payload': 'xml-payload-1',
>>>>>>>>>>>> 'events': [{
>>>>>>>>>>>> 'eventIndex': 0,
>>>>>>>>>>>> 'attributes':['payload']
>>>>>>>>>>>> }, {
>>>>>>>>>>>> 'eventIndex': 1,
>>>>>>>>>>>> 'attributes':['payload']
>>>>>>>>>>>> }]
>>>>>>>>>>>> }, {
>>>>>>>>>>>> 'payload': 'xml-payload-2',
>>>>>>>>>>>> 'events': [{
>>>>>>>>>>>> 'eventIndex': 2,
>>>>>>>>>>>> 'attributes':['payload','output-payload']
>>>>>>>>>>>> }]
>>>>>>>>>>>> }]
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Please let us know if any further clarification is needed, or if
>>>>>>>>>>>> there's anything to be modified/improved.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Supun
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Kasun,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think for the tracing use case we need to publish events one
>>>>>>>>>>>>>> by one from each mediator (we can't aggregate all such events, as
>>>>>>>>>>>>>> each also contains the message payload).
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we can still do that with some extra effort.
>>>>>>>>>>>>> Most of the mediators in a sequence flow do not alter the
>>>>>>>>>>>>> message payload. We can store the payload only for the mediators
>>>>>>>>>>>>> which alter the message payload, and for the others, we can put a
>>>>>>>>>>>>> reference to the previous entry. By doing that we can save memory
>>>>>>>>>>>>> to a great extent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>> From: Supun Sethunga <[email protected]>
>>>>>>>>>>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>>>>>>>>>>> Subject: Re: ESB Analytics Mediation Event Publishing
>>>>>>>>>>>>>> Mechanism
>>>>>>>>>>>>>> To: Anjana Fernando <[email protected]>
>>>>>>>>>>>>>> Cc: "[email protected]" <[email protected]>,
>>>>>>>>>>>>>> Srinath Perera <[email protected]>, Sanjiva Weerawarana <
>>>>>>>>>>>>>> [email protected]>, Kasun Indrasiri <[email protected]>, Isuru
>>>>>>>>>>>>>> Udana <[email protected]>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ran some simple performance tests against the new relational
>>>>>>>>>>>>>> provider, in comparison with the existing one. The following are
>>>>>>>>>>>>>> the results:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Records in Backend DB Table*: *1,054,057*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Conversion:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Backend DB Table:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> id | data
>>>>>>>>>>>>>> 1  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>>> 2  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To --> Spark Table:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> id | a   | b   | c
>>>>>>>>>>>>>> 1  | aaa | bbb | ccc
>>>>>>>>>>>>>> 1  | xxx | yyy | zzz
>>>>>>>>>>>>>> 1  | ppp | qqq | rrr
>>>>>>>>>>>>>> 2  | aaa | bbb | ccc
>>>>>>>>>>>>>> 2  | xxx | yyy | zzz
>>>>>>>>>>>>>> 2  | ppp | qqq | rrr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Avg Time for Query Execution:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Execution time (~ sec) per query:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Query | Existing Analytics Relation Provider | New (ESB) Analytics Relation Provider*
>>>>>>>>>>>>>> SELECT COUNT(*) FROM <Table>; | 13 | 16
>>>>>>>>>>>>>> SELECT * FROM <Table> ORDER BY id ASC; | 13 | 16
>>>>>>>>>>>>>> SELECT * FROM <Table> WHERE id=98435; | 13 | 16
>>>>>>>>>>>>>> SELECT id,a,first(b),first(c) FROM <Table> GROUP BY id,a ORDER BY id ASC; | 18 | 26
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * The new relational provider splits a single row into multiple
>>>>>>>>>>>>>> rows. Hence the number of rows in the table is 3 times that of the
>>>>>>>>>>>>>> original table (as each row is split into 3 rows).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have started working on implementing a new "relation" /
>>>>>>>>>>>>>>> "relation provider" to serve the above requirement. This is
>>>>>>>>>>>>>>> basically a modified version of the existing "Carbon Analytics"
>>>>>>>>>>>>>>> relation provider.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here I have assumed that the encapsulated data for a single
>>>>>>>>>>>>>>> execution flow is stored in a single row, and that the data about
>>>>>>>>>>>>>>> the mediators invoked during the flow is stored in a known column
>>>>>>>>>>>>>>> of each row (say "data") as an array (say a JSON array). When each
>>>>>>>>>>>>>>> row is read into Spark, this relation provider creates a separate
>>>>>>>>>>>>>>> row for each element in the array stored in the "data" column. I
>>>>>>>>>>>>>>> have tested this with some mocked data, and it works as expected.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Need to test with the real data/data formats, and modify the
>>>>>>>>>>>>>>> mapping accordingly. Will update the thread with the details.
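
The row-splitting behaviour described above can be sketched in plain Python as follows. This only illustrates the mapping (one stored row with a JSON array in "data" becoming one output row per array element); it is not the Spark relation provider API, and the function name and sample rows are assumptions.

```python
import json


def split_rows(backend_rows, column="data"):
    """Yield one flat row per element of the JSON array stored in `column`."""
    for row in backend_rows:
        for element in json.loads(row[column]):
            # Keep the original id so the rows can still be grouped per flow.
            yield {"id": row["id"], **element}


backend_rows = [
    {"id": 1, "data": '[{"a":"aaa","b":"bbb","c":"ccc"},'
                      '{"a":"xxx","b":"yyy","c":"zzz"}]'},
    {"id": 2, "data": '[{"a":"ppp","b":"qqq","c":"rrr"}]'},
]
spark_rows = list(split_rows(backend_rows))
for row in spark_rows:
    print(row)
```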
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In a meeting I had with Kasun and the ESB team, I got to know
>>>>>>>>>>>>>>>> that, for their tracing mechanism, they were instructed to publish
>>>>>>>>>>>>>>>> one event for each mediator invocation, whereas earlier they had an
>>>>>>>>>>>>>>>> approach where they published one event that encapsulated the data
>>>>>>>>>>>>>>>> of a whole execution flow. I would actually like to support the
>>>>>>>>>>>>>>>> latter approach, mainly due to performance / resource requirements,
>>>>>>>>>>>>>>>> and also considering the fact that this is a feature that could be
>>>>>>>>>>>>>>>> enabled in production. Simply put, if we do one event per mediator,
>>>>>>>>>>>>>>>> this does not scale that well. For example, if the ESB is doing 1k
>>>>>>>>>>>>>>>> TPS for a sequence that has 20 mediators, that is 20k TPS of
>>>>>>>>>>>>>>>> analytics traffic. Combine that with a possible ESB cluster hitting
>>>>>>>>>>>>>>>> a DAS cluster with a single backend database, and this may be too
>>>>>>>>>>>>>>>> many rows per second written to the database. The main problem here
>>>>>>>>>>>>>>>> is that one event is a single row/record in the backend database in
>>>>>>>>>>>>>>>> DAS, so it may come to a state where the frequency of row creations
>>>>>>>>>>>>>>>> by events coming from the ESBs cannot be sustained.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we create a single event from the 20 mediators, then it
>>>>>>>>>>>>>>>> is just 1k TPS for the DAS event receivers and the database too,
>>>>>>>>>>>>>>>> even though the message size is bigger. Publishing lots of small
>>>>>>>>>>>>>>>> events does not necessarily perform the same as publishing bigger
>>>>>>>>>>>>>>>> events. Throughput-wise, comparatively bigger events will win (even
>>>>>>>>>>>>>>>> if we consider that small operations will be batched at the
>>>>>>>>>>>>>>>> transport level etc., it is still one event = one database row). So
>>>>>>>>>>>>>>>> I would suggest we try out a "single sequence flow = single event"
>>>>>>>>>>>>>>>> approach, and on the Spark processing side, we consider one of
>>>>>>>>>>>>>>>> these big rows as multiple rows in Spark.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was first thinking about whether UDFs can help in splitting a
>>>>>>>>>>>>>>>> single column into multiple rows, but that is not possible, and it
>>>>>>>>>>>>>>>> is also a bit troublesome, considering we have to delete the
>>>>>>>>>>>>>>>> original data table after we have converted it using a script, and
>>>>>>>>>>>>>>>> not forgetting that we actually have to schedule and run a separate
>>>>>>>>>>>>>>>> script to do this post-processing. So a much cleaner way to do this
>>>>>>>>>>>>>>>> would be to create a new "relation provider" in Spark (which is
>>>>>>>>>>>>>>>> like a data adapter for their DataFrames), and in our relation
>>>>>>>>>>>>>>>> provider, when we are reading rows, we convert a single row's
>>>>>>>>>>>>>>>> column into multiple rows and return those for processing. So Spark
>>>>>>>>>>>>>>>> will not know that physically it was a single row in the data
>>>>>>>>>>>>>>>> layer, and it can summarize the data as usual and write to the
>>>>>>>>>>>>>>>> target summary tables. [1] is our existing implementation of the
>>>>>>>>>>>>>>>> Spark relation provider, which directly maps to our DAS analytics
>>>>>>>>>>>>>>>> tables; we can create the new one by extending it or basing it on
>>>>>>>>>>>>>>>> it. So I suggest we try out this approach and see if everyone is
>>>>>>>>>>>>>>>> okay with it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> *Anjana Fernando*
>>>>>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>>>>> Google Groups "WSO2 Engineering Group" group.
>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>>>>>>>>>>>> from it, send an email to
>>>>>>>>>>>>>>>> [email protected].
>>>>>>>>>>>>>>>> For more options, visit
>>>>>>>>>>>>>>>> https://groups.google.com/a/wso2.com/d/optout.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Kasun Indrasiri
>>>>>>>>>>>>>> Software Architect
>>>>>>>>>>>>>> WSO2, Inc.; http://wso2.com
>>>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cell: +94 77 556 5206
>>>>>>>>>>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Isuru Udana*
>>>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>>>> WSO2 Inc.; http://wso2.com
>>>>>>>>>>>>> email: [email protected] cell: +94 77 3791887
>>>>>>>>>>>>> blog: http://mytecheye.blogspot.com/
>>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Sinthuja Rajendran*
>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>> WSO2, Inc.:http://wso2.com
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>>>>>> Mobile: +94774273955
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dushan Abeyruwan | Technical Lead
>>>>>
>>>>> PMC Member Apache Synpase
>>>>> WSO2 Inc. http://wso2.com/
>>>>> Blog:*http://www.dushantech.com/ <http://www.dushantech.com/>*
>>>>> Mobile:(001)408-791-9312
>>>>>
>>>>
>>>
>>>
>>> --
>>> Viraj Senevirathne
>>> Software Engineer; WSO2, Inc.
>>>
>>> Mobile : +94 71 958 0269
>>> Email : [email protected]
>>>
>>
>
>
> --
> Sanjiva Weerawarana, Ph.D.
> Founder, CEO & Chief Architect; WSO2, Inc.;  http://wso2.com/
> email: [email protected]; office: (+1 650 745 4499 | +94  11 214 5345)
> x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
> Lean . Enterprise . Middleware


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
