Why are we inventing a new event format for this? Why not use a stream
definition and publish using Thrift?

Sorry if I'm missing something here.

On Fri, Feb 19, 2016 at 11:44 AM, Supun Sethunga <[email protected]> wrote:

> Hi,
>
> Ran some more performance tests comparing publishing aggregated events
> vs. multiple single events. The results are as follows:
>
> *Results:*
>
> No. of concurrent publishers (to DAS): 10
> Back-end DB: MySQL
>
>                           Single Events  Aggregated Events*  Single Events  Aggregated Events*
> No. of events:            160,000        10,000              1,600,000      100,000
> Event payload size:       1.9 KB         21.6 KB             1.9 KB         21.6 KB
> Time consumed** (mm:ss):  1:55           0:30                19:46          4:31
>
> *An aggregated event contains payloads of 16 single events.
> **Time consumed = time to complete all DB transactions.
>
> Please note that these times were measured with DB trace logs enabled,
> which also affects overall performance.
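To compare the runs on the same footing, the times in the table can be turned into rates. This is a minimal sketch; it assumes, per the footnote, that every aggregated event carries 16 single-event records, and it does not correct for the trace-logging overhead noted above.

```python
# Convert the results table above into rates.
# Assumption (from the footnote): one aggregated event = 16 single-event records.

def seconds(mmss: str) -> int:
    """Parse an 'mm:ss' duration into seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


def rates(events: int, records_per_event: int, mmss: str) -> tuple[float, float]:
    """Return (events/s, single-event records/s) for one test run."""
    t = seconds(mmss)
    return events / t, events * records_per_event / t


runs = [
    ("single", 160_000, 1, "1:55"),
    ("aggregated", 10_000, 16, "0:30"),
    ("single", 1_600_000, 1, "19:46"),
    ("aggregated", 100_000, 16, "4:31"),
]

for kind, events, per_event, mmss in runs:
    ev_rate, rec_rate = rates(events, per_event, mmss)
    print(f"{kind:>10} {events:>9,} events: {ev_rate:7.0f} events/s, {rec_rate:7.0f} records/s")
```

With these inputs, the aggregated runs deliver roughly four times as many single-event records per second as the corresponding single-event runs.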
>
> Regards,
> Supun
>
> On Wed, Feb 17, 2016 at 5:37 PM, Viraj Senevirathne <[email protected]>
> wrote:
>
>> Hi All,
>>
>> We got a simple sample payload for an actual message flow (attached).
>>
>> This flow has about 16 mediators, and the payload file size is ~27.4 kB.
>> With different payload sizes and a larger number of mediators in the
>> flow, a single payload can get even bigger. So if the ESB is serving 1000
>> requests per second, it will transfer payloads to DAS at a data rate of
>> ~27 MB/s (about 220 Mbit/s). With large payloads and many mediators in
>> the flow, this data rate can go much higher.
>>
>> As these strings are highly repetitive, compression works well on them.
>> After compression, the above payload is ~2 kB (a 93% reduction from the
>> original size).
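A quick check of the data-rate arithmetic, using the ~27.4 kB and ~2 kB payload sizes from this mail and assuming one payload is published per request at 1000 requests/s. Megabytes and megabits are easy to mix up here, so both are printed.

```python
# Data rate for publishing one payload per request.
# Units: 1 MB = 1000 kB, 1 byte = 8 bits.
def data_rate(payload_kb: float, requests_per_sec: int) -> tuple[float, float]:
    """Return (megabytes per second, megabits per second)."""
    megabytes = payload_kb * requests_per_sec / 1000
    return megabytes, megabytes * 8


for label, size_kb in [("uncompressed", 27.4), ("compressed", 2.0)]:
    mbytes, mbits = data_rate(size_kb, 1000)
    print(f"{label:>12}: {mbytes:5.1f} MB/s = {mbits:6.1f} Mbit/s")
```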
>>
>> A large 1.3 MB JSON file was reduced to 14.3 kB after compression.
>>
>> Therefore, would it be possible to send a compressed JSON string to DAS
>> instead of the uncompressed one? DAS could then decompress it and use the
>> actual JSON payload.
>>
>> I think this will reduce the data rate drastically and ease data
>> communication.
>>
>> Would it be possible to define a new type like "compressedJSON" to achieve
>> this? WDYT about this idea?
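One way the proposed "compressedJSON" idea could work, sketched in Python: gzip the JSON text on the ESB side and Base64-encode it so it still travels as an ordinary string attribute, then reverse both steps in DAS. "compressedJSON" is only a proposed name, the helper functions are hypothetical, and the events are dummy data.

```python
import base64
import gzip
import json


def compress_json(obj) -> str:
    """Serialize obj to compact JSON, gzip it, and Base64-encode the result."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_json(encoded: str):
    """Reverse compress_json: Base64-decode, gunzip, and parse the JSON."""
    raw = gzip.decompress(base64.b64decode(encoded))
    return json.loads(raw.decode("utf-8"))


# Dummy events: 16 near-identical mediator records, like a real flow.
envelope = "<soapenv:Envelope><soapenv:Body/></soapenv:Envelope>"
events = [
    {"componentType": "Mediator", "componentId": f"mediator_{i}",
     "entryPoint": "Test Proxy", "beforePayload": envelope, "afterPayload": envelope}
    for i in range(16)
]
original = json.dumps(events, separators=(",", ":"))
encoded = compress_json(events)
print(len(original), "chars ->", len(encoded), "Base64 chars")
assert decompress_json(encoded) == events
```

Base64 inflates the compressed bytes by about a third, so the saving is somewhat smaller than the raw gzip ratio; a binary-safe attribute would avoid that overhead.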
>>
>> Thank You,
>>
>> On Wed, Feb 17, 2016 at 9:38 AM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Dushan,
>>>
>>>> Supun, according to the stream definition, what does "children": 1
>>>> represent?
>>>
>>>
>>> Here, each event basically represents a mediator/proxy, so "children"
>>> represents the child mediator(s) in the message flow. This info is used to
>>> draw the message flow diagram.
>>>
>>> For example, in the first event of the array, "children": 1 means the
>>> event at index 1 is the first mediator after Test Proxy, and so on.
>>> Sorry, the values I put for "children" in the second and third events
>>> are misleading. They should be "children": 2 and "children": null,
>>> respectively; null means it is the end of the message flow.
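A minimal sketch of how the flow diagram could be derived from these indices, using the corrected values (1, 2, null). It assumes event 0 is the entry point and that "children" holds a single index or null; field names are simplified to componentId, and `flow_order` is a hypothetical helper.

```python
# Rebuild the linear message flow by following "children" indices.
# Assumptions: event 0 is the entry point (the proxy), "children" is a single
# index or None (end of flow), and field names are simplified to componentId.
def flow_order(events: list[dict]) -> list[str]:
    order = []
    seen = set()
    index = 0
    while index is not None:
        if index in seen:  # guard against a malformed, cyclic definition
            raise ValueError(f"cycle detected at event {index}")
        seen.add(index)
        event = events[index]
        order.append(event["componentId"])
        index = event["children"]
    return order


# The corrected "children" values from this mail: 1, 2, null.
events = [
    {"componentId": "Test Proxy", "children": 1},
    {"componentId": "mediator_1", "children": 2},
    {"componentId": "mediator_2", "children": None},
]
print(" -> ".join(flow_order(events)))  # Test Proxy -> mediator_1 -> mediator_2
```

If a mediator can have several children (for example, branching mediators), "children" would need to hold a list and the walk would become a tree traversal.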
>>>
>>> Regards,
>>> Supun
>>>
>>> On Wed, Feb 17, 2016 at 2:34 AM, Dushan Abeyruwan <[email protected]>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>>    - If we publish events from each mediator, we can certainly group the
>>>>    events by a unique parentID, can't we? (I mean this would allow us to
>>>>    prepare an aggregated view per incoming message and visualize the
>>>>    different stages of each message representation and other meta
>>>>    information; think of complex mediation.)
>>>>    - Can't we record the payload according to its Content-Type, and so
>>>>    get rid of the SOAP way of representing it?
>>>>    - If we have a non-content-aware mediation flow with
>>>>    "application/json", can we find a way to get the JSON string rather
>>>>    than explicitly building the message, i.e.
>>>>    "org.apache.synapse.commons.json.Constants.JSON_STRING"?
>>>>    - Supun, according to the stream definition, what does "children": 1
>>>>    represent?
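The first point above, regrouping per-mediator events into one view per incoming message, can be sketched as a simple group-by. This assumes each event carries a flow identifier; MESSAGE_FLOW_ID (from contextPropertyMap in the sample payload) is used as the key and flattened out for brevity, and the IDs below are dummy values.

```python
# Regroup per-mediator events into one view per incoming message.
# Assumption: each event carries a flow identifier; MESSAGE_FLOW_ID is used
# here (flattened out of contextPropertyMap for brevity).
from collections import defaultdict


def group_by_flow(events: list[dict], key: str = "MESSAGE_FLOW_ID") -> dict[str, list[dict]]:
    flows = defaultdict(list)
    for event in events:
        flows[event[key]].append(event)
    for flow_events in flows.values():
        flow_events.sort(key=lambda e: e["startTime"])  # stages in time order
    return dict(flows)


events = [
    {"MESSAGE_FLOW_ID": "urn_uuid_a", "componentId": "mediator_1", "startTime": 1455531041},
    {"MESSAGE_FLOW_ID": "urn_uuid_b", "componentId": "Test Proxy", "startTime": 1455531030},
    {"MESSAGE_FLOW_ID": "urn_uuid_a", "componentId": "Test Proxy", "startTime": 1455531027},
]
for flow_id, flow in group_by_flow(events).items():
    print(flow_id, [e["componentId"] for e in flow])
```

This regrouping only helps the analysis side; it does not reduce the number of events or rows written, which is the concern behind aggregating on the ESB side.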
>>>>
>>>>
>>>> On Mon, Feb 15, 2016 at 9:15 PM, Supun Sethunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Dunith, Gihan,
>>>>>
>>>>> As per the offline chat with Buddhima and Viraj, the following is a
>>>>> sample payload to be published from ESB to DAS. Do we need any other
>>>>> information for the plots/tables in the dashboard?
>>>>>
>>>>> Here we added a new field, "entryPoint", to indicate in which Proxy/API
>>>>> the mediator was executed, so that it is easy to drill down from the
>>>>> proxy view to the mediator view. If we have missed any other similar
>>>>> field that would be needed for drill-downs, please add it.
>>>>>
>>>>> {
>>>>> "events": [{
>>>>> "compotentType": "ProxyService",
>>>>> "compotentId": "Test Proxy",
>>>>> "startTime": 1455531027,
>>>>> "endTime": 1455531041,
>>>>> "duration": 3.321,
>>>>> "beforePayload": null,
>>>>> "afterPayload": null,
>>>>> "contextPropertyMap":
>>>>> "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>> "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\"urn:renewLicense\"\",\"Host\":\"localhost\"}",
>>>>> "children": 1,
>>>>> "entryPoint": "Test Proxy"
>>>>> }, {
>>>>> "compotentType": "Mediator",
>>>>> "compotentId": "mediator_1",
>>>>> "startTime": 1455531041,
>>>>> "endTime": 1455531052,
>>>>> "duration": 3.321,
>>>>> "beforePayload": null,
>>>>> "afterPayload": null,
>>>>> "contextPropertyMap":
>>>>> "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>> "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\"urn:renewLicense\"\",\"Host\":\"localhost\"}",
>>>>> "children": 0,
>>>>> "entryPoint": "Test Proxy"
>>>>> }, {
>>>>> "compotentType": "Mediator",
>>>>> "compotentId": "mediator_2",
>>>>> "startTime": 1455531052,
>>>>> "endTime": 1455531074,
>>>>> "duration": 3.321,
>>>>> "beforePayload": null,
>>>>> "afterPayload": null,
>>>>> "contextPropertyMap": null,
>>>>> "transportPropertyMap": null,
>>>>> "children": 0,
>>>>> "entryPoint": "Test Proxy"
>>>>> }],
>>>>>
>>>>> "payloads": [{
>>>>> "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123456</sam:vehicleNumber></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>> "events": [{
>>>>> "eventIndex": 0,
>>>>> "attributes": "beforePayload"
>>>>> }, {
>>>>> "eventIndex": 0,
>>>>> "attributes": "afterPayload"
>>>>> }, {
>>>>> "eventIndex": 1,
>>>>> "attributes": "beforePayload"
>>>>> }]
>>>>> }, {
>>>>> "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123123</sam:vehicleNumber><sam:vehicleType>car</sam:vehicleType></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>> "events": [{
>>>>> "eventIndex": 1,
>>>>> "attributes": "afterPayload"
>>>>> }, {
>>>>> "eventIndex": 2,
>>>>> "attributes": "beforePayload"
>>>>> }, {
>>>>> "eventIndex": 2,
>>>>> "attributes": "afterPayload"
>>>>> }]
>>>>> }]
>>>>> }
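The sample above leaves every payload field null in "events" and lists each distinct payload once under "payloads". A minimal sketch of how a publisher might build that shape, assuming the payload bodies are available in memory per event; `build_aggregated_event` and the input layout are hypothetical, and field names are simplified to componentId.

```python
# Build the aggregated shape shown above: payload fields are nulled in
# "events", and each distinct payload appears once in "payloads" with the
# (eventIndex, attribute) pairs it belongs to. Hypothetical helper.
PAYLOAD_FIELDS = ("beforePayload", "afterPayload")


def build_aggregated_event(events: list[dict]) -> dict:
    out_events = []
    payloads = []
    position = {}  # payload text -> index in payloads
    for i, event in enumerate(events):
        stripped = dict(event)  # shallow copy; the input is left untouched
        for field in PAYLOAD_FIELDS:
            body = event.get(field)
            stripped[field] = None
            if body is None:
                continue
            if body not in position:  # first time this exact payload is seen
                position[body] = len(payloads)
                payloads.append({"payload": body, "events": []})
            payloads[position[body]]["events"].append(
                {"eventIndex": i, "attributes": field}
            )
        out_events.append(stripped)
    return {"events": out_events, "payloads": payloads}


# Dummy payloads reproducing the mapping in the sample above.
p1, p2 = "<xml-payload-1/>", "<xml-payload-2/>"
sample = [
    {"componentId": "Test Proxy", "beforePayload": p1, "afterPayload": p1},
    {"componentId": "mediator_1", "beforePayload": p1, "afterPayload": p2},
    {"componentId": "mediator_2", "beforePayload": p2, "afterPayload": p2},
]
result = build_aggregated_event(sample)
for entry in result["payloads"]:
    print(entry["payload"], [(e["eventIndex"], e["attributes"]) for e in entry["events"]])
```

Deduplication here is by exact string equality, so two payloads that differ only in whitespace are stored twice.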
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Wed, Feb 10, 2016 at 11:57 AM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Sinthuja,
>>>>>>
>>>>>>
>>>>>>> IMHO we could solve this issue by having conventions. Basically, we
>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>> convention. If the element starts with '$', then it's a reference, not
>>>>>>> the actual payload. In that case, if a new element is introduced, let's
>>>>>>> say foo, and you need to access the property property1, then it will
>>>>>>> have the reference $foo:property1.
>>>>>>
>>>>>>
>>>>>> Yes, that's possible as well. But again, if the property, say 'foo',
>>>>>> has an actual value starting with the special character (in this case
>>>>>> '$'), we may run into ambiguity. (True, the chances are pretty low, but
>>>>>> it is still possible.)
>>>>>>
>>>>>>
>>>>>>> Also, this JSON event format is sent as the event payload in a WSO2
>>>>>>> event, and the WSO2 event is published by the data publisher, right?
>>>>>>> Correct me if I'm wrong.
>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 10, 2016 at 11:35 AM, Sinthuja Ragendran <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Supun,
>>>>>>>
>>>>>>> Also, this JSON event format is sent as the event payload in a WSO2
>>>>>>> event, and the WSO2 event is published by the data publisher, right?
>>>>>>> Correct me if I'm wrong.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sinthuja.
>>>>>>>
>>>>>>> On Wed, Feb 10, 2016 at 11:26 AM, Sinthuja Ragendran <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Sinthuja,
>>>>>>>>>
>>>>>>>>> I agree on the possibility of simplifying the JSON. We also
>>>>>>>>> discussed the same matter yesterday, but the complication that came
>>>>>>>>> up was that an event in the "events" list could either reference its
>>>>>>>>> payload or define it in-line (we kept it that way so that it can be
>>>>>>>>> generalized to fields other than payloads as well, if needed).
>>>>>>>>>
>>>>>>>>> In such a case, if we had defined it as 'payload': 'payload1', we
>>>>>>>>> would not know whether it's the actual payload or a reference to the
>>>>>>>>> payload in the "payloads" section.
>>>>>>>>>
>>>>>>>>> With the suggested format, DAS only maps a payload from the
>>>>>>>>> "payloads" section if the event's payload field is null.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> IMHO we could solve this issue by having conventions. Basically, we
>>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>>> convention. If the element starts with '$', then it's a reference, not
>>>>>>>> the actual payload. In that case, if a new element is introduced, let's
>>>>>>>> say foo, and you need to access the property property1, then it will
>>>>>>>> have the reference $foo:property1.
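A minimal sketch of the '$section:key' convention proposed above, assuming payloads are kept in a name-keyed map as in the simplified JSON example in this thread. `resolve` is a hypothetical helper; any value not starting with '$' is returned unchanged.

```python
# Resolve values that follow the '$section:key' convention proposed above.
# Anything not starting with '$' is returned unchanged. Hypothetical helper.
def resolve(value, message: dict):
    if not (isinstance(value, str) and value.startswith("$")):
        return value
    section, sep, key = value[1:].partition(":")
    if not sep or not section or not key:
        raise ValueError(f"malformed reference: {value!r}")
    return message[section][key]


message = {
    "events": [{"componentId": "111", "payload": "$payloads:payload1"}],
    "payloads": {"payload1": "xml-payload-1"},
}
print(resolve(message["events"][0]["payload"], message))  # xml-payload-1
```

A literal value such as "$5" would be treated as a reference and raise ValueError here; that is the ambiguity Supun raises in this thread, and it would need an escaping rule (not part of the proposal) to resolve.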
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sinthuja.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Supun,
>>>>>>>>>>
>>>>>>>>>> I think we could simplify the JSON message a bit more. Instead of
>>>>>>>>>> 'null' for the payload attributes in the events section, you could
>>>>>>>>>> use the actual payload name directly if there is a payload for that
>>>>>>>>>> event. In that case, we could eliminate the 'events' section from the
>>>>>>>>>> 'payloads' section. For the given example, it could be altered as
>>>>>>>>>> below.
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>> 'events': [{
>>>>>>>>>> 'messageId': 'aaa',
>>>>>>>>>> 'componentId': '111',
>>>>>>>>>> 'payload': 'payload1',
>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>> 'output-payload':null
>>>>>>>>>> }, {
>>>>>>>>>> 'messageId': 'bbb',
>>>>>>>>>> 'componentId': '222',
>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>> 'payload': 'payload1',
>>>>>>>>>> 'output-payload':null
>>>>>>>>>> }, {
>>>>>>>>>> 'messageId': 'ccc',
>>>>>>>>>> 'componentId': '789',
>>>>>>>>>> 'payload': 'payload2',
>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>> 'output-payload':'payload2'
>>>>>>>>>> }],
>>>>>>>>>>
>>>>>>>>>> 'payloads': {
>>>>>>>>>> 'payload1': 'xml-payload-1',
>>>>>>>>>> 'payload2': 'xml-payload-2'
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sinthuja.
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Budhdhima/Viraj,
>>>>>>>>>>>
>>>>>>>>>>> As per the discussion we had yesterday, the following is the format
>>>>>>>>>>> of the JSON containing aggregated event details, to be sent to DAS
>>>>>>>>>>> (you may change the attribute names of the events).
>>>>>>>>>>>
>>>>>>>>>>> To explain further, "events" contains the details of each event
>>>>>>>>>>> sent by each mediator; the payload may or may not be populated. The
>>>>>>>>>>> "payloads" section contains the unique payloads and their mapping to
>>>>>>>>>>> the events and their fields (e.g. 'xml-payload-2' maps to the
>>>>>>>>>>> 'payload' and 'output-payload' fields of the 3rd event).
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> 'events': [{
>>>>>>>>>>> 'messageId': 'aaa',
>>>>>>>>>>> 'componentId': '111',
>>>>>>>>>>> 'payload': null,
>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>> }, {
>>>>>>>>>>> 'messageId': 'bbb',
>>>>>>>>>>> 'componentId': '222',
>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>> 'payload': null,
>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>> }, {
>>>>>>>>>>> 'messageId': 'ccc',
>>>>>>>>>>> 'componentId': '789',
>>>>>>>>>>> 'payload': null,
>>>>>>>>>>> 'componentName': 'Proxy:TestProxy',
>>>>>>>>>>> 'output-payload':null
>>>>>>>>>>> }],
>>>>>>>>>>>
>>>>>>>>>>> 'payloads': [{
>>>>>>>>>>> 'payload': 'xml-payload-1',
>>>>>>>>>>> 'events': [{
>>>>>>>>>>> 'eventIndex': 0,
>>>>>>>>>>> 'attributes':['payload']
>>>>>>>>>>> }, {
>>>>>>>>>>> 'eventIndex': 1,
>>>>>>>>>>> 'attributes':['payload']
>>>>>>>>>>> }]
>>>>>>>>>>> }, {
>>>>>>>>>>> 'payload': 'xml-payload-2',
>>>>>>>>>>> 'events': [{
>>>>>>>>>>> 'eventIndex': 2,
>>>>>>>>>>> 'attributes':['payload','output-payload']
>>>>>>>>>>> }]
>>>>>>>>>>> }]
>>>>>>>>>>> }
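A minimal sketch of the DAS-side mapping for this format, assuming the rule stated in this thread that DAS only fills a field from "payloads" when the event's own value is null. `resolve_payloads` is a hypothetical name; it accepts "attributes" either as a list (as here) or as a single string (as in the Feb 15 sample).

```python
# Copy each payload in "payloads" into the listed attributes of the listed
# events, but only where the event's own value is null (in-line values win).
import copy


def resolve_payloads(message: dict) -> list[dict]:
    events = copy.deepcopy(message["events"])  # leave the input untouched
    for entry in message.get("payloads", []):
        for ref in entry["events"]:
            attrs = ref["attributes"]
            if isinstance(attrs, str):  # tolerate the single-string form
                attrs = [attrs]
            event = events[ref["eventIndex"]]
            for attr in attrs:
                if event.get(attr) is None:
                    event[attr] = entry["payload"]
    return events


# The example message above, with placeholder payload strings.
message = {
    "events": [
        {"messageId": "aaa", "componentId": "111", "payload": None, "output-payload": None},
        {"messageId": "bbb", "componentId": "222", "payload": None, "output-payload": None},
        {"messageId": "ccc", "componentId": "789", "payload": None, "output-payload": None},
    ],
    "payloads": [
        {"payload": "xml-payload-1", "events": [
            {"eventIndex": 0, "attributes": ["payload"]},
            {"eventIndex": 1, "attributes": ["payload"]}]},
        {"payload": "xml-payload-2", "events": [
            {"eventIndex": 2, "attributes": ["payload", "output-payload"]}]},
    ],
}
for e in resolve_payloads(message):
    print(e["messageId"], e["payload"], e["output-payload"])
```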
>>>>>>>>>>>
>>>>>>>>>>> Please let us know if any further clarification is needed, or if
>>>>>>>>>>> anything should be modified/improved.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Supun
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Kasun,
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I think for the tracing use case we need to publish events one by
>>>>>>>>>>>>> one from each mediator (we can't aggregate all such events, as
>>>>>>>>>>>>> they also contain the message payload).
>>>>>>>>>>>>>
>>>>>>>>>>>> I think we can still do that with some extra effort.
>>>>>>>>>>>> Most of the mediators in a sequence flow do not alter the message
>>>>>>>>>>>> payload. We can store the payload only for the mediators which
>>>>>>>>>>>> alter the message payload, and for the others, we can put a
>>>>>>>>>>>> reference to the previous entry. By doing that we can save memory
>>>>>>>>>>>> to a great extent.
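Isuru's idea, sketched as a pair of helpers: store a payload only when it differs from the last stored one, otherwise store a reference to that earlier entry. `compact`, `expand`, and the `payloadRef` field are hypothetical names, and payloads are compared by exact string equality.

```python
# Store a payload only when a mediator changed it; otherwise point back to the
# most recent entry that holds the full payload. Hypothetical helpers.
def compact(payload_per_mediator: list[str]) -> list[dict]:
    entries = []
    last_stored = None  # index of the latest entry holding a full payload
    for i, payload in enumerate(payload_per_mediator):
        if last_stored is not None and payload == payload_per_mediator[last_stored]:
            entries.append({"payloadRef": last_stored})
        else:
            entries.append({"payload": payload})
            last_stored = i
    return entries


def expand(entries: list[dict]) -> list[str]:
    out = []
    for entry in entries:
        # References always point at an earlier entry, so out[...] exists.
        out.append(entry["payload"] if "payload" in entry else out[entry["payloadRef"]])
    return out


flow = ["<a/>", "<a/>", "<a/>", "<b/>", "<b/>"]
print(compact(flow))
```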
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>> From: Supun Sethunga <[email protected]>
>>>>>>>>>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>>>>>>>>>> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
>>>>>>>>>>>>> To: Anjana Fernando <[email protected]>
>>>>>>>>>>>>> Cc: "[email protected]" <[email protected]>,
>>>>>>>>>>>>> Srinath Perera <[email protected]>, Sanjiva Weerawarana <
>>>>>>>>>>>>> [email protected]>, Kasun Indrasiri <[email protected]>, Isuru
>>>>>>>>>>>>> Udana <[email protected]>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ran some simple performance tests against the new relation
>>>>>>>>>>>>> provider, in comparison with the existing one. The results are as
>>>>>>>>>>>>> follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Records in Backend DB Table*: *1,054,057*
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Conversion:*
>>>>>>>>>>>>>
>>>>>>>>>>>>> Backend DB Table
>>>>>>>>>>>>> id  data
>>>>>>>>>>>>> 1   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>> 2   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>>
>>>>>>>>>>>>> --> To Spark Table
>>>>>>>>>>>>> id  a    b    c
>>>>>>>>>>>>> 1   aaa  bbb  ccc
>>>>>>>>>>>>> 1   xxx  yyy  zzz
>>>>>>>>>>>>> 1   ppp  qqq  rrr
>>>>>>>>>>>>> 2   aaa  bbb  ccc
>>>>>>>>>>>>> 2   xxx  yyy  zzz
>>>>>>>>>>>>> 2   ppp  qqq  rrr
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Avg Time for Query Execution:*
>>>>>>>>>>>>>
>>>>>>>>>>>>> Query                                       Execution time (~ sec)
>>>>>>>>>>>>>                                             Existing Analytics   New (ESB) Analytics
>>>>>>>>>>>>>                                             Relation Provider    Relation Provider*
>>>>>>>>>>>>> SELECT COUNT(*) FROM <Table>;               13                   16
>>>>>>>>>>>>> SELECT * FROM <Table> ORDER BY id ASC;       13                   16
>>>>>>>>>>>>> SELECT * FROM <Table> WHERE id=98435;        13                   16
>>>>>>>>>>>>> SELECT id,a,first(b),first(c) FROM <Table>
>>>>>>>>>>>>>   GROUP BY id,a ORDER BY id ASC;            18                   26
>>>>>>>>>>>>>
>>>>>>>>>>>>> * The new relation provider splits a single row into multiple rows,
>>>>>>>>>>>>> so its table has 3 times as many rows as the original table (each
>>>>>>>>>>>>> row is split into 3 rows).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have started implementing a new "relation" / "relation
>>>>>>>>>>>>>> provider" to serve the above requirement. This is basically a
>>>>>>>>>>>>>> modified version of the existing "Carbon Analytics" relation
>>>>>>>>>>>>>> provider.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here I have assumed that the encapsulated data for a single
>>>>>>>>>>>>>> execution flow is stored in a single row, and the data about the
>>>>>>>>>>>>>> mediators invoked during the flow is stored in a known column of
>>>>>>>>>>>>>> each row (say "data"), as an array (say a JSON array). When each
>>>>>>>>>>>>>> row is read into Spark, this relation provider creates a separate
>>>>>>>>>>>>>> row for each element in the array stored in the "data" column. I
>>>>>>>>>>>>>> have tested this with some mocked data, and it works as expected.
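The row-splitting behaviour described above, sketched in plain Python rather than against Spark's data source API. It assumes the "data" column holds a JSON array (the mocked data in this thread uses single quotes, so real JSON would need double quotes) and copies every other column onto each generated row. In a Spark relation provider this logic would sit in the relation's scan (for example a `TableScan.buildScan()` implementation); that wiring is not shown.

```python
import json
from typing import Iterable, Iterator


def split_rows(rows: Iterable[dict], column: str = "data") -> Iterator[dict]:
    """Yield one output row per element of the JSON array in `column`."""
    for row in rows:
        base = {k: v for k, v in row.items() if k != column}  # e.g. the row id
        for element in json.loads(row[column]):
            out = dict(base)
            out.update(element)  # element keys (a, b, c) become columns
            yield out


# Mocked rows mirroring the conversion example in this thread.
data = json.dumps([
    {"a": "aaa", "b": "bbb", "c": "ccc"},
    {"a": "xxx", "b": "yyy", "c": "zzz"},
    {"a": "ppp", "b": "qqq", "c": "rrr"},
])
stored = [{"id": 1, "data": data}, {"id": 2, "data": data}]
for r in split_rows(stored):
    print(r["id"], r["a"], r["b"], r["c"])
```

Because it is a generator, rows are expanded lazily as they are read, which mirrors how a relation provider hands rows to Spark without materialising a converted table.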
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Need to test with the real data/data-formats, and modify the
>>>>>>>>>>>>>> mapping accordingly. Will update the thread with the details.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In a meeting I had with Kasun and the ESB team, I learned that,
>>>>>>>>>>>>>>> for their tracing mechanism, they were instructed to publish one
>>>>>>>>>>>>>>> event for each mediator invocation, whereas earlier they had an
>>>>>>>>>>>>>>> approach where they published one event encapsulating the data of
>>>>>>>>>>>>>>> a whole execution flow. I would actually like to support the
>>>>>>>>>>>>>>> latter approach, mainly due to performance / resource
>>>>>>>>>>>>>>> requirements, and also considering the fact that this is a
>>>>>>>>>>>>>>> feature that could be enabled in production. Simply put, one
>>>>>>>>>>>>>>> event per mediator does not scale that well. For example, if the
>>>>>>>>>>>>>>> ESB is doing 1k TPS for a sequence that has 20 mediators, that is
>>>>>>>>>>>>>>> 20k TPS of analytics traffic. Combine that with a possible ESB
>>>>>>>>>>>>>>> cluster hitting a DAS cluster with a single backend database, and
>>>>>>>>>>>>>>> this may be too many rows per second written to the database. The
>>>>>>>>>>>>>>> main problem here is that one event is a single row/record in the
>>>>>>>>>>>>>>> DAS backend database, so we may reach a state where the rate of
>>>>>>>>>>>>>>> row creation by events coming from the ESBs cannot be sustained.
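The scaling argument above is simple arithmetic; a small sketch makes the two cases explicit, assuming one database row per published event as stated above. `db_rows_per_second` is a hypothetical helper.

```python
def db_rows_per_second(requests_per_sec: int, mediators: int, per_mediator_events: bool) -> int:
    """Rows written per second, assuming one published event = one DB row."""
    events_per_request = mediators if per_mediator_events else 1
    return requests_per_sec * events_per_request


print("one event per mediator:", db_rows_per_second(1_000, 20, True))   # 20000
print("one event per flow:    ", db_rows_per_second(1_000, 20, False))  # 1000
```

The per-flow rows are larger, so the saving is in row count rather than bytes written.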
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we create a single event from the 20 mediators, then it is
>>>>>>>>>>>>>>> just 1k TPS for the DAS event receivers and the database too,
>>>>>>>>>>>>>>> even though the message size is bigger. Publishing lots of small
>>>>>>>>>>>>>>> events does not necessarily give the same performance as
>>>>>>>>>>>>>>> publishing bigger events. Throughput-wise, comparatively bigger
>>>>>>>>>>>>>>> events will win (even if small operations are batched at the
>>>>>>>>>>>>>>> transport level etc., one event is still one database row). So I
>>>>>>>>>>>>>>> would suggest we try out a "single sequence flow = single event"
>>>>>>>>>>>>>>> approach, and on the Spark processing side, treat one of these
>>>>>>>>>>>>>>> big rows as multiple rows in Spark. I first thought about whether
>>>>>>>>>>>>>>> UDFs could help split a single column into multiple rows, but
>>>>>>>>>>>>>>> that is not possible, and it is also a bit troublesome,
>>>>>>>>>>>>>>> considering we would have to delete the original data table after
>>>>>>>>>>>>>>> converting it with a script, not to mention that we would have to
>>>>>>>>>>>>>>> schedule and run a separate script to do this post-processing. So
>>>>>>>>>>>>>>> a much cleaner way to do this would be to create a new "relation
>>>>>>>>>>>>>>> provider" in Spark (which is like a data adapter for their
>>>>>>>>>>>>>>> DataFrames), and in our relation provider, when we are reading
>>>>>>>>>>>>>>> rows, convert a single row's column into multiple rows and return
>>>>>>>>>>>>>>> those for processing. Spark will then not know that physically it
>>>>>>>>>>>>>>> was a single row in the data layer, and it can summarize the data
>>>>>>>>>>>>>>> as usual and write to the target summary tables. [1] is our
>>>>>>>>>>>>>>> existing implementation of the Spark relation provider, which
>>>>>>>>>>>>>>> directly maps to our DAS analytics tables; we can create the new
>>>>>>>>>>>>>>> one by extending it or basing it on it. So I suggest we try out
>>>>>>>>>>>>>>> this approach and see if everyone is okay with it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> *Anjana Fernando*
>>>>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>>>> Google Groups "WSO2 Engineering Group" group.
>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>>>>>>>>>>> from it, send an email to
>>>>>>>>>>>>>>> [email protected].
>>>>>>>>>>>>>>> For more options, visit
>>>>>>>>>>>>>>> https://groups.google.com/a/wso2.com/d/optout.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Kasun Indrasiri
>>>>>>>>>>>>> Software Architect
>>>>>>>>>>>>> WSO2, Inc.; http://wso2.com
>>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>>
>>>>>>>>>>>>> cell: +94 77 556 5206
>>>>>>>>>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> *Isuru Udana*
>>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>>> WSO2 Inc.; http://wso2.com
>>>>>>>>>>>> email: [email protected] cell: +94 77 3791887
>>>>>>>>>>>> blog: http://mytecheye.blogspot.com/
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Sinthuja Rajendran*
>>>>>>>>>> Associate Technical Lead
>>>>>>>>>> WSO2, Inc.:http://wso2.com
>>>>>>>>>>
>>>>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>>>>> Mobile: +94774273955
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Dushan Abeyruwan | Technical Lead
>>>>
>>>> PMC Member Apache Synapse
>>>> WSO2 Inc. http://wso2.com/
>>>> Blog: http://www.dushantech.com/
>>>> Mobile:(001)408-791-9312
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Viraj Senevirathne
>> Software Engineer; WSO2, Inc.
>>
>> Mobile : +94 71 958 0269
>> Email : [email protected]
>>
>
>
>
>
>
>


-- 
Sanjiva Weerawarana, Ph.D.
Founder, CEO & Chief Architect; WSO2, Inc.;  http://wso2.com/
email: [email protected]; office: (+1 650 745 4499 | +94  11 214 5345)
x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
Lean . Enterprise . Middleware
