Hi Sanjiva,

In the current approach we do have mediator-level information as well. Although
the ESB publishes aggregated info to DAS, a relation provider implemented in
DAS splits the aggregated info into mediator-level info when it is retrieved
for analysis. (Basically, when we load the published data into Spark inside
DAS, Spark sees the data as events published per mediator, not as aggregated
events.) So in the UI, a user can view the overall message trace for a flow,
and can also drill down to the mediator level.
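A minimal Python sketch of the splitting step described above, for illustration only. It assumes each published row holds a flow id plus a JSON array of per-mediator entries in a column called "data". The function name `explode_row` and the entry fields are hypothetical; the real relation provider is a Spark component, not this code.

```python
import json
from typing import Any, Dict, List


def explode_row(row: Dict[str, Any], column: str = "data") -> List[Dict[str, Any]]:
    """Split one aggregated row into one row per mediator entry.

    `row[column]` is expected to hold a JSON array of objects; every other
    column (for example the flow id) is copied onto each produced row.
    """
    shared = {key: value for key, value in row.items() if key != column}
    entries = json.loads(row[column])
    # Each array element becomes its own row, merged with the shared columns.
    return [{**shared, **entry} for entry in entries]


aggregated_row = {
    "id": 1,
    "data": json.dumps([
        {"mediator": "log_1", "duration": 0.4},
        {"mediator": "property_1", "duration": 0.1},
    ]),
}

for mediator_row in explode_row(aggregated_row):
    print(mediator_row)
```

Spark queries then run over the exploded rows, so both per-flow and per-mediator views come from the same stored data.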

But yes, the drawback is that to trace a mediator, one has to turn on tracing
for the whole proxy rather than for the mediator itself.

Regards,
Supun

On Wed, Feb 24, 2016 at 6:52 PM, Sanjiva Weerawarana <[email protected]>
wrote:

> IMO we're worrying about the wrong thing.
>
> No one will ever WANT to trace 1000 TPS unless it's tracing for audit
> purposes. If it's for tuning/debugging, then what we need is APIs that will
> let people dynamically turn on tracing VERY selectively.
>
> I'd prefer that we log per mediator, because then we can do whatever kinds
> of aggregation / analysis we want. For example, maybe what we have is an
> issue with one mediator; then you need to instruct the ESB to trace all
> calls coming to that one mediator so that we can analyze its aggregate
> behavior. The approach that has been suggested takes a one-dimensional view
> (a sequence of mediators) of what one might want to analyze.
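To illustrate the per-mediator analysis described above, here is a small Python sketch. It takes per-mediator trace events from many flows and summarizes the behavior of each mediator. The field names `flow`, `mediator` and `duration` are assumptions for illustration, not the actual stream definition.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def per_mediator_stats(events: Iterable[dict]) -> Dict[str, dict]:
    """Group per-mediator trace events by mediator and summarize durations."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for event in events:
        durations[event["mediator"]].append(event["duration"])
    return {
        name: {"calls": len(values), "mean": mean(values), "max": max(values)}
        for name, values in durations.items()
    }


events = [
    {"flow": "f1", "mediator": "xslt_1", "duration": 12.0},
    {"flow": "f2", "mediator": "xslt_1", "duration": 48.0},
    {"flow": "f1", "mediator": "log_1", "duration": 0.5},
]
stats = per_mediator_stats(events)
print(stats["xslt_1"])
```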
>
> Sanjiva.
>
> On Wed, Feb 24, 2016 at 2:48 PM, Supun Sethunga <[email protected]> wrote:
>
>> Hi Sanjiva,
>>
>> Yes, we are indeed using a stream definition and publishing the events
>> using Thrift.
>>
>> But in doing so, there were two approaches we considered:
>>
>>    1. ESB publishing a single event per mediator in a message flow.
>>    2. ESB publishing a single event per message flow (rather than for
>>    each mediator in the message flow).
>>
>> In the perf tests we ran, approach #2 performed better than #1. The
>> format discussed above in this thread is the structure of the payload of a
>> wso2event that carries the aggregated information of the mediators (i.e.,
>> how to put the information of all mediators in a message flow into a
>> single event). This way we can also avoid duplicating information (e.g.,
>> mediators like 'log' and 'property' do not change the XML payload within a
>> message flow).
>>
>> Regards,
>> Supun
>>
>> On Wed, Feb 24, 2016 at 11:25 AM, Sanjiva Weerawarana <[email protected]>
>> wrote:
>>
>>> Why are we inventing a new event format for this? Why not use a stream
>>> definition and publish using Thrift?
>>>
>>> Sorry if I'm missing something here.
>>>
>>> On Fri, Feb 19, 2016 at 11:44 AM, Supun Sethunga <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I ran some more performance tests comparing publishing aggregated events
>>>> vs. multiple single events. The results are as follows:
>>>>
>>>> *Results:*
>>>>
>>>> No of concurrent publishers (to DAS): 10
>>>> Back-end DB: MySQL
>>>>
>>>>                           Single      Aggregated  Single      Aggregated
>>>>                           Events      Events*     Events      Events*
>>>> No. of events:            160,000     10,000      1,600,000   100,000
>>>> Event payload size:       1.9 KB      21.6 KB     1.9 KB      21.6 KB
>>>> Time consumed** (mm:ss):  1:55        0:30        19:46       4:31
>>>>
>>>> *An aggregated event contains payloads of 16 single events.
>>>> **Time consumed = time to complete all DB transactions.
>>>>
>>>> Please note that these times were measured with DB trace logs turned on,
>>>> which also affects overall performance to some extent.
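As a rough reading of the table above, this Python sketch turns the reported times into events per second and mediator-level records per second. It uses the footnote's figure of 16 single events per aggregated event. It only re-derives numbers from the table and is not an extra measurement.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an 'mm:ss' string from the table into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


MEDIATORS_PER_AGGREGATE = 16  # from the table footnote

runs = [
    # (label, number of events, time consumed, mediator entries per event)
    ("single, 160k", 160_000, "1:55", 1),
    ("aggregated, 10k", 10_000, "0:30", MEDIATORS_PER_AGGREGATE),
    ("single, 1.6M", 1_600_000, "19:46", 1),
    ("aggregated, 100k", 100_000, "4:31", MEDIATORS_PER_AGGREGATE),
]

for label, events, elapsed, per_event in runs:
    secs = to_seconds(elapsed)
    print(f"{label}: {events / secs:.0f} events/s, "
          f"{events * per_event / secs:.0f} mediator records/s")
```

By this arithmetic, the aggregated runs move roughly 3.8x to 4.4x more mediator-level records per second than the single-event runs.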
>>>>
>>>> Regards,
>>>> Supun
>>>>
>>>> On Wed, Feb 17, 2016 at 5:37 PM, Viraj Senevirathne <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We have a simple sample payload for an actual message flow (attached).
>>>>>
>>>>> This flow has about 16 mediators, and the payload file size is ~27.4 kB.
>>>>> With larger message payloads and more mediators in the flow, a single
>>>>> payload can get even bigger. So if the ESB is serving 1000 requests per
>>>>> second, it will transfer payloads to DAS at ~27 MB/s (roughly 220 Mb/s).
>>>>> With large payloads and many mediators in the flow, this data rate can go
>>>>> very high.
>>>>>
>>>>> As these strings are highly repetitive, compression works well on them.
>>>>> After compression, the above payload is ~2 kB (a 93% reduction from the
>>>>> original size).
>>>>>
>>>>> A large 1.3 MB JSON file was reduced to 14.3 kB after compression.
>>>>>
>>>>> Would it therefore be possible to send a compressed JSON string to DAS
>>>>> instead of the uncompressed one? DAS could then decompress it and use the
>>>>> actual JSON payload.
>>>>>
>>>>> I think this will reduce the data rate drastically and ease data
>>>>> communication.
>>>>>
>>>>> Would it be possible to define a new type like "compressedJSON" to
>>>>> achieve this? WDYT about this idea?
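Here is a minimal Python sketch of the proposal, using the standard `zlib` and `base64` modules. The ESB side would compress the JSON string and Base64-encode it so it still fits a string attribute; DAS would reverse both steps. The "compressedJSON" type is only a proposal in this thread, the helper names are hypothetical, and the actual ratio will depend on the real payloads.

```python
import base64
import json
import zlib


def compress_json(doc: dict) -> str:
    """Serialize to JSON, deflate with zlib, and Base64-encode the result."""
    raw = json.dumps(doc).encode("utf-8")
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")


def decompress_json(blob: str) -> dict:
    """Reverse compress_json on the receiving side."""
    raw = zlib.decompress(base64.b64decode(blob))
    return json.loads(raw.decode("utf-8"))


# A repetitive document: the same envelope recorded for 16 mediators.
envelope = ("<soapenv:Envelope><soapenv:Body><sam:vehicleNumber>123456"
            "</sam:vehicleNumber></soapenv:Body></soapenv:Envelope>")
doc = {"events": [{"componentId": f"mediator_{i}", "beforePayload": envelope}
                  for i in range(16)]}

blob = compress_json(doc)
plain_size = len(json.dumps(doc).encode("utf-8"))
print(f"plain: {plain_size} bytes, compressed+Base64: {len(blob)} bytes")
print("round trip ok:", decompress_json(blob) == doc)
```

Base64 adds about a third to the compressed size. Even so, repetitive payloads like these still end up much smaller than the plain JSON.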
>>>>>
>>>>> Thank You,
>>>>>
>>>>> On Wed, Feb 17, 2016 at 9:38 AM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Dushan,
>>>>>>
>>>>>>> Supun, according to the stream definition, what does "children": 1
>>>>>>> represent?
>>>>>>
>>>>>>
>>>>>> Here, each event basically represents a mediator/proxy, so "children"
>>>>>> represents the child mediator(s) in the message flow. This info is used
>>>>>> to draw the message flow diagram.
>>>>>>
>>>>>> For example, if we consider the first event in the array, "children": 1
>>>>>> means the event at index 1 is the first mediator after Test Proxy, and so
>>>>>> on. Sorry, the values I have put for "children" in the second and third
>>>>>> events are misleading. They should be "children": 2 and "children": null,
>>>>>> respectively; null means it's the end of the message flow.
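Here is a Python sketch of how a consumer could walk the "children" links to recover the flow order. It uses the corrected values from this reply: 1, then 2, then null. It assumes "children" holds a single index, as in this sample; a mediator with several children, such as a branching mediator, would need a list instead. The event field names are simplified for illustration.

```python
from typing import List, Optional


def flow_order(events: List[dict]) -> List[str]:
    """Follow "children" indices from the first event until null (None)."""
    order: List[str] = []
    index: Optional[int] = 0
    seen = set()
    while index is not None:
        if index in seen:  # guard against malformed, cyclic links
            raise ValueError(f"cycle at event {index}")
        seen.add(index)
        event = events[index]
        order.append(event["componentId"])
        index = event["children"]
    return order


events = [
    {"componentId": "Test Proxy", "children": 1},
    {"componentId": "mediator_1", "children": 2},
    {"componentId": "mediator_2", "children": None},
]
print(flow_order(events))
```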
>>>>>>
>>>>>> Regards,
>>>>>> Supun
>>>>>>
>>>>>> On Wed, Feb 17, 2016 at 2:34 AM, Dushan Abeyruwan <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>>    - If we publish events from each mediator, then we can certainly
>>>>>>>    group the events by their unique parentID, can't we? (I mean this
>>>>>>>    would allow us to prepare an aggregated view per incoming message and
>>>>>>>    visualize the different stages of each message representation and
>>>>>>>    other meta information; think of complex mediation.)
>>>>>>>    - Can't we record the payload according to its Content-Type, and
>>>>>>>    thereby get rid of the SOAP way of representing it?
>>>>>>>    - If we have a non-content-aware mediation flow with
>>>>>>>    "application/json", can we find a way to get the JSON string rather
>>>>>>>    than explicitly building it, i.e.
>>>>>>>    "org.apache.synapse.commons.json.Constants.JSON_STRING"?
>>>>>>>    - Supun, according to the stream definition, what does
>>>>>>>    "children": 1 represent?
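For the first point, here is a Python sketch of grouping individually published mediator events into one view per incoming message. It assumes each event carries a flow identifier and a start time for ordering. The field name `flowId` is hypothetical; it stands in for the parentID / MESSAGE_FLOW_ID.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_flow(events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Collect per-mediator events under their flow id, ordered by startTime."""
    flows: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        flows[event["flowId"]].append(event)
    for flow_events in flows.values():
        flow_events.sort(key=lambda e: e["startTime"])
    return dict(flows)


events = [
    {"flowId": "urn_uuid_a", "componentId": "mediator_2", "startTime": 1455531052},
    {"flowId": "urn_uuid_a", "componentId": "mediator_1", "startTime": 1455531041},
    {"flowId": "urn_uuid_b", "componentId": "mediator_1", "startTime": 1455531060},
]
flows = group_by_flow(events)
print([e["componentId"] for e in flows["urn_uuid_a"]])
```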
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 15, 2016 at 9:15 PM, Supun Sethunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Dunith, Gihan,
>>>>>>>>
>>>>>>>> As per the offline chat with Buddhima and Viraj, the following is a
>>>>>>>> sample payload to be published from the ESB to DAS. Do we need any other
>>>>>>>> information for the plots/tables in the dashboard?
>>>>>>>>
>>>>>>>> Here we added a new field, "entryPoint", to indicate the Proxy/API in
>>>>>>>> which the mediator was executed, so that it is easy to drill down from
>>>>>>>> the proxy view to the mediator view. Please add any other similar fields
>>>>>>>> needed for drill-downs, if we have missed any.
>>>>>>>>
>>>>>>>> {
>>>>>>>>   "events": [{
>>>>>>>>     "componentType": "ProxyService",
>>>>>>>>     "componentId": "Test Proxy",
>>>>>>>>     "startTime": 1455531027,
>>>>>>>>     "endTime": 1455531041,
>>>>>>>>     "duration": 3.321,
>>>>>>>>     "beforePayload": null,
>>>>>>>>     "afterPayload": null,
>>>>>>>>     "contextPropertyMap": "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>>>>>     "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\\\"urn:renewLicense\\\"\",\"Host\":\"localhost\"}",
>>>>>>>>     "children": 1,
>>>>>>>>     "entryPoint": "Test Proxy"
>>>>>>>>   }, {
>>>>>>>>     "componentType": "Mediator",
>>>>>>>>     "componentId": "mediator_1",
>>>>>>>>     "startTime": 1455531041,
>>>>>>>>     "endTime": 1455531052,
>>>>>>>>     "duration": 3.321,
>>>>>>>>     "beforePayload": null,
>>>>>>>>     "afterPayload": null,
>>>>>>>>     "contextPropertyMap": "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>>>>>     "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\\\"urn:renewLicense\\\"\",\"Host\":\"localhost\"}",
>>>>>>>>     "children": 0,
>>>>>>>>     "entryPoint": "Test Proxy"
>>>>>>>>   }, {
>>>>>>>>     "componentType": "Mediator",
>>>>>>>>     "componentId": "mediator_2",
>>>>>>>>     "startTime": 1455531052,
>>>>>>>>     "endTime": 1455531074,
>>>>>>>>     "duration": 3.321,
>>>>>>>>     "beforePayload": null,
>>>>>>>>     "afterPayload": null,
>>>>>>>>     "contextPropertyMap": null,
>>>>>>>>     "transportPropertyMap": null,
>>>>>>>>     "children": 0,
>>>>>>>>     "entryPoint": "Test Proxy"
>>>>>>>>   }],
>>>>>>>>
>>>>>>>>   "payloads": [{
>>>>>>>>     "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123456</sam:vehicleNumber></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>>>>>     "events": [{
>>>>>>>>       "eventIndex": 0,
>>>>>>>>       "attributes": "beforePayload"
>>>>>>>>     }, {
>>>>>>>>       "eventIndex": 0,
>>>>>>>>       "attributes": "afterPayload"
>>>>>>>>     }, {
>>>>>>>>       "eventIndex": 1,
>>>>>>>>       "attributes": "beforePayload"
>>>>>>>>     }]
>>>>>>>>   }, {
>>>>>>>>     "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123123</sam:vehicleNumber><sam:vehicleType>car</sam:vehicleType></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>>>>>     "events": [{
>>>>>>>>       "eventIndex": 1,
>>>>>>>>       "attributes": "afterPayload"
>>>>>>>>     }, {
>>>>>>>>       "eventIndex": 2,
>>>>>>>>       "attributes": "beforePayload"
>>>>>>>>     }, {
>>>>>>>>       "eventIndex": 2,
>>>>>>>>       "attributes": "afterPayload"
>>>>>>>>     }]
>>>>>>>>   }]
>>>>>>>> }
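Here is a Python sketch of how the receiving side could expand this format back into self-contained events. Each entry in "payloads" is copied into the named attribute of every event it references. It assumes "attributes" is a single attribute name, as in this sample. The function name is hypothetical, and the event fields are trimmed for brevity.

```python
import copy
from typing import List


def resolve_payloads(message: dict) -> List[dict]:
    """Return copies of message["events"] with payload fields filled in."""
    events = copy.deepcopy(message["events"])  # leave the input untouched
    for entry in message.get("payloads", []):
        for ref in entry["events"]:
            events[ref["eventIndex"]][ref["attributes"]] = entry["payload"]
    return events


message = {
    "events": [
        {"componentId": "Test Proxy", "beforePayload": None, "afterPayload": None},
        {"componentId": "mediator_1", "beforePayload": None, "afterPayload": None},
    ],
    "payloads": [
        {"payload": "<a/>", "events": [
            {"eventIndex": 0, "attributes": "beforePayload"},
            {"eventIndex": 0, "attributes": "afterPayload"},
            {"eventIndex": 1, "attributes": "beforePayload"},
        ]},
        {"payload": "<b/>", "events": [
            {"eventIndex": 1, "attributes": "afterPayload"},
        ]},
    ],
}
resolved = resolve_payloads(message)
print(resolved[1])
```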
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>> On Wed, Feb 10, 2016 at 11:57 AM, Supun Sethunga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Sinthuja,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> IMHO we could solve this issue by having conventions. Basically, we
>>>>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>>>>> convention. If the element starts with '$', then it's a reference, not
>>>>>>>>>> the actual payload. In that case, if a new element is introduced, let's
>>>>>>>>>> say foo, and you need to access the property property1, then it will
>>>>>>>>>> have the reference $foo:property1.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, that's possible as well. But again, if a property, say 'foo', has
>>>>>>>>> an actual value that starts with the special character (in this case
>>>>>>>>> '$'), we may run into ambiguity. (True, the chances are pretty low, but
>>>>>>>>> it's still possible.)
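To make the ambiguity concrete, here is a Python sketch of a resolver for the proposed `$element:key` convention. The exact parsing rule (a leading '$' plus a ':' separator) is an assumption based on the description above. A literal value that happens to start with '$' is misread as a reference.

```python
from typing import Dict


def resolve_value(value: str, sections: Dict[str, Dict[str, str]]) -> str:
    """Treat '$section:key' as a reference; return anything else unchanged."""
    if value.startswith("$") and ":" in value:
        section, key = value[1:].split(":", 1)
        return sections[section][key]
    return value


sections = {"payloads": {"payload1": "<xml-payload-1/>"}}
print(resolve_value("$payloads:payload1", sections))  # a real reference
print(resolve_value("<plain/>", sections))            # a literal payload

try:
    # A literal (e.g. non-XML text) value that merely looks like a reference.
    resolve_value("$price:10", sections)
except KeyError as err:
    print("misread as a reference to section", err)
```

Here the misread value at least fails loudly. A literal that happened to match an existing section and key would instead be replaced silently.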
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also, this JSON event format is being sent as the event payload in a
>>>>>>>>>> wso2 event, and the wso2 event is being published by the data
>>>>>>>>>> publisher, right? Correct me if I'm wrong.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 10, 2016 at 11:35 AM, Sinthuja Ragendran <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Supun,
>>>>>>>>>>
>>>>>>>>>> Also, this JSON event format is being sent as the event payload in a
>>>>>>>>>> wso2 event, and the wso2 event is being published by the data
>>>>>>>>>> publisher, right? Correct me if I'm wrong.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sinthuja.
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 10, 2016 at 11:26 AM, Sinthuja Ragendran <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Sinthuja,
>>>>>>>>>>>>
>>>>>>>>>>>> I agree that the JSON could be simplified. We discussed the same matter
>>>>>>>>>>>> yesterday, but the complication that came up was that, for an event in
>>>>>>>>>>>> the "events" list, the payload could be either referenced or defined
>>>>>>>>>>>> inline. (We kept it this way so that it can be generalized to fields
>>>>>>>>>>>> other than payloads as well, if needed.)
>>>>>>>>>>>>
>>>>>>>>>>>> In such a case, if we had defined it as 'payload': 'payload1', we
>>>>>>>>>>>> would not know whether it's the actual payload or a reference to a
>>>>>>>>>>>> payload in the "payloads" section.
>>>>>>>>>>>>
>>>>>>>>>>>> With the suggested format, DAS only looks up and maps the payload if
>>>>>>>>>>>> the field is null.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> IMHO we could solve this issue by having conventions. Basically, we
>>>>>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>>>>>> convention. If the element starts with '$', then it's a reference, not
>>>>>>>>>>> the actual payload. In that case, if a new element is introduced, let's
>>>>>>>>>>> say foo, and you need to access the property property1, then it will
>>>>>>>>>>> have the reference $foo:property1.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sinthuja.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Supun
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we could simplify the JSON message a bit more. Instead of
>>>>>>>>>>>>> 'null' for the payload attributes in the events section, you could use
>>>>>>>>>>>>> the actual payload name directly if there is a payload for that event.
>>>>>>>>>>>>> In that case, we could eliminate the 'events' section from the
>>>>>>>>>>>>> 'payloads' section. For the given example, it could be altered as
>>>>>>>>>>>>> below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   "events": [{
>>>>>>>>>>>>>     "messageId": "aaa",
>>>>>>>>>>>>>     "componentId": "111",
>>>>>>>>>>>>>     "payload": "payload1",
>>>>>>>>>>>>>     "componentName": "Proxy:TestProxy",
>>>>>>>>>>>>>     "output-payload": null
>>>>>>>>>>>>>   }, {
>>>>>>>>>>>>>     "messageId": "bbb",
>>>>>>>>>>>>>     "componentId": "222",
>>>>>>>>>>>>>     "componentName": "Proxy:TestProxy",
>>>>>>>>>>>>>     "payload": "payload1",
>>>>>>>>>>>>>     "output-payload": null
>>>>>>>>>>>>>   }, {
>>>>>>>>>>>>>     "messageId": "ccc",
>>>>>>>>>>>>>     "componentId": "789",
>>>>>>>>>>>>>     "payload": "payload2",
>>>>>>>>>>>>>     "componentName": "Proxy:TestProxy",
>>>>>>>>>>>>>     "output-payload": "payload2"
>>>>>>>>>>>>>   }],
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "payloads": {
>>>>>>>>>>>>>     "payload1": "xml-payload-1",
>>>>>>>>>>>>>     "payload2": "xml-payload-2"
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }
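With this name-keyed format, resolution becomes a dictionary lookup. Here is a Python sketch, assuming a field value counts as a reference only when it matches a key in "payloads". The function name and the default field list are hypothetical.

```python
from typing import Dict, List


def resolve_named(message: dict,
                  fields=("payload", "output-payload")) -> List[dict]:
    """Replace payload names with the payload text from message["payloads"]."""
    payloads: Dict[str, str] = message["payloads"]
    resolved = []
    for event in message["events"]:
        event = dict(event)  # shallow copy; the input is left untouched
        for field in fields:
            value = event.get(field)
            if value is not None and value in payloads:
                event[field] = payloads[value]
        resolved.append(event)
    return resolved


message = {
    "events": [
        {"messageId": "aaa", "payload": "payload1", "output-payload": None},
        {"messageId": "ccc", "payload": "payload2", "output-payload": "payload2"},
    ],
    "payloads": {"payload1": "xml-payload-1", "payload2": "xml-payload-2"},
}
print(resolve_named(message))
```

The trade-off is that an inline literal payload equal to one of the keys would be replaced silently. That is the ambiguity discussed in this thread.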
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Sinthuja.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Buddhima/Viraj,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As per the discussion we had yesterday, the following is the format of
>>>>>>>>>>>>>> the JSON containing the aggregated event details, to be sent to DAS.
>>>>>>>>>>>>>> (You may change the attribute names of the events.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To explain it further, "events" contains the details of each event
>>>>>>>>>>>>>> sent by each mediator. The payload may or may not be populated. The
>>>>>>>>>>>>>> "payloads" section contains the unique payloads and their mapping to
>>>>>>>>>>>>>> the events and event fields (e.g., 'xml-payload-2' maps to the
>>>>>>>>>>>>>> 'payload' and 'output-payload' fields of the 3rd event).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   "events": [{
>>>>>>>>>>>>>>     "messageId": "aaa",
>>>>>>>>>>>>>>     "componentId": "111",
>>>>>>>>>>>>>>     "payload": null,
>>>>>>>>>>>>>>     "componentName": "Proxy:TestProxy",
>>>>>>>>>>>>>>     "output-payload": null
>>>>>>>>>>>>>>   }, {
>>>>>>>>>>>>>>     "messageId": "bbb",
>>>>>>>>>>>>>>     "componentId": "222",
>>>>>>>>>>>>>>     "componentName": "Proxy:TestProxy",
>>>>>>>>>>>>>>     "payload": null,
>>>>>>>>>>>>>>     "output-payload": null
>>>>>>>>>>>>>>   }, {
>>>>>>>>>>>>>>     "messageId": "ccc",
>>>>>>>>>>>>>>     "componentId": "789",
>>>>>>>>>>>>>>     "payload": null,
>>>>>>>>>>>>>>     "componentName": "Proxy:TestProxy",
>>>>>>>>>>>>>>     "output-payload": null
>>>>>>>>>>>>>>   }],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "payloads": [{
>>>>>>>>>>>>>>     "payload": "xml-payload-1",
>>>>>>>>>>>>>>     "events": [{
>>>>>>>>>>>>>>       "eventIndex": 0,
>>>>>>>>>>>>>>       "attributes": ["payload"]
>>>>>>>>>>>>>>     }, {
>>>>>>>>>>>>>>       "eventIndex": 1,
>>>>>>>>>>>>>>       "attributes": ["payload"]
>>>>>>>>>>>>>>     }]
>>>>>>>>>>>>>>   }, {
>>>>>>>>>>>>>>     "payload": "xml-payload-2",
>>>>>>>>>>>>>>     "events": [{
>>>>>>>>>>>>>>       "eventIndex": 2,
>>>>>>>>>>>>>>       "attributes": ["payload", "output-payload"]
>>>>>>>>>>>>>>     }]
>>>>>>>>>>>>>>   }]
>>>>>>>>>>>>>> }
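Here is a Python sketch of how the ESB side could produce this structure. It collects the unique payload strings, records for each one which events and attributes it belongs to, and leaves the event fields null. The input shape (a list of per-mediator dicts with inline payloads) and the function name are assumptions.

```python
from typing import Dict, List

PAYLOAD_FIELDS = ("payload", "output-payload")


def build_aggregated(events: List[dict]) -> dict:
    """Move payload strings out of events into a deduplicated 'payloads' list."""
    out_events: List[dict] = []
    # payload text -> {event index -> [attribute names]}, in first-seen order
    by_payload: Dict[str, Dict[int, List[str]]] = {}
    for index, event in enumerate(events):
        slim = dict(event)
        for field in PAYLOAD_FIELDS:
            text = slim.get(field)
            if text is not None:
                by_payload.setdefault(text, {}).setdefault(index, []).append(field)
                slim[field] = None  # the receiver fills this back in
        out_events.append(slim)
    payloads = [
        {"payload": text,
         "events": [{"eventIndex": i, "attributes": attrs}
                    for i, attrs in refs.items()]}
        for text, refs in by_payload.items()
    ]
    return {"events": out_events, "payloads": payloads}


events = [
    {"messageId": "aaa", "payload": "xml-payload-1", "output-payload": None},
    {"messageId": "bbb", "payload": "xml-payload-1", "output-payload": None},
    {"messageId": "ccc", "payload": "xml-payload-2", "output-payload": "xml-payload-2"},
]
aggregated = build_aggregated(events)
print(aggregated["payloads"])
```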
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please let us know if any further clarification is needed, or if
>>>>>>>>>>>>>> there's anything to be modified/improved.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Kasun,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think for the tracing use case we need to publish events one by one
>>>>>>>>>>>>>>>> from each mediator (we can't aggregate all such events, as they also
>>>>>>>>>>>>>>>> contain the message payload).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think we can still do that with some extra effort.
>>>>>>>>>>>>>>> Most of the mediators in a sequence flow do not alter the message
>>>>>>>>>>>>>>> payload. We can store the payload only for the mediators that alter
>>>>>>>>>>>>>>> it, and for the others we can put a reference to the previous entry.
>>>>>>>>>>>>>>> By doing that we can save memory to a great extent.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>>>> From: Supun Sethunga <[email protected]>
>>>>>>>>>>>>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>>>>>>>>>>>>> Subject: Re: ESB Analytics Mediation Event Publishing
>>>>>>>>>>>>>>>> Mechanism
>>>>>>>>>>>>>>>> To: Anjana Fernando <[email protected]>
>>>>>>>>>>>>>>>> Cc: "[email protected]" <
>>>>>>>>>>>>>>>> [email protected]>, Srinath Perera <
>>>>>>>>>>>>>>>> [email protected]>, Sanjiva Weerawarana <[email protected]>,
>>>>>>>>>>>>>>>> Kasun Indrasiri <[email protected]>, Isuru Udana <
>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I ran some simple performance tests on the new relation provider,
>>>>>>>>>>>>>>>> comparing it with the existing one. The results are as follows:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Records in Backend DB Table*: *1,054,057*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Conversion:*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Backend DB Table:
>>>>>>>>>>>>>>>> id  data
>>>>>>>>>>>>>>>> 1   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>>>>> 2   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To --> Spark Table:
>>>>>>>>>>>>>>>> id  a    b    c
>>>>>>>>>>>>>>>> 1   aaa  bbb  ccc
>>>>>>>>>>>>>>>> 1   xxx  yyy  zzz
>>>>>>>>>>>>>>>> 1   ppp  qqq  rrr
>>>>>>>>>>>>>>>> 2   aaa  bbb  ccc
>>>>>>>>>>>>>>>> 2   xxx  yyy  zzz
>>>>>>>>>>>>>>>> 2   ppp  qqq  rrr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Avg Time for Query Execution:*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Query                                      Execution time (~sec)
>>>>>>>>>>>>>>>>                                            Existing    New (ESB)*
>>>>>>>>>>>>>>>> SELECT COUNT(*) FROM <Table>;              13          16
>>>>>>>>>>>>>>>> SELECT * FROM <Table> ORDER BY id ASC;     13          16
>>>>>>>>>>>>>>>> SELECT * FROM <Table> WHERE id=98435;      13          16
>>>>>>>>>>>>>>>> SELECT id,a,first(b),first(c) FROM <Table>
>>>>>>>>>>>>>>>>   GROUP BY id,a ORDER BY id ASC;           18          26
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Existing = existing Analytics Relation Provider; New (ESB) = new (ESB)
>>>>>>>>>>>>>>>> Analytics Relation Provider.
>>>>>>>>>>>>>>>> * The new relation provider splits a single row into multiple rows, so
>>>>>>>>>>>>>>>> its table has 3 times as many rows as the original table (each row is
>>>>>>>>>>>>>>>> split into 3 rows).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have started working on implementing a new "relation" / "relation
>>>>>>>>>>>>>>>>> provider" to serve the above requirement. This is basically a modified
>>>>>>>>>>>>>>>>> version of the existing "Carbon Analytics" relation provider.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here I have assumed that the encapsulated data for a single execution
>>>>>>>>>>>>>>>>> flow is stored in a single row, and the data about the mediators
>>>>>>>>>>>>>>>>> invoked during the flow is stored in a known column of each row (say
>>>>>>>>>>>>>>>>> "data") as an array (say a JSON array). When each row is read into
>>>>>>>>>>>>>>>>> Spark, this relation provider creates a separate row for each element
>>>>>>>>>>>>>>>>> of the array stored in the "data" column. I have tested this with some
>>>>>>>>>>>>>>>>> mocked data, and it works as expected.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I still need to test with the real data/data formats and modify the
>>>>>>>>>>>>>>>>> mapping accordingly. I will update the thread with the details.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In a meeting I had with Kasun and the ESB team, I learned that, for
>>>>>>>>>>>>>>>>>> their tracing mechanism, they were instructed to publish one event for
>>>>>>>>>>>>>>>>>> each mediator invocation, whereas earlier they had an approach where
>>>>>>>>>>>>>>>>>> they published one event that encapsulated the data of a whole
>>>>>>>>>>>>>>>>>> execution flow. I would actually like to support the latter approach,
>>>>>>>>>>>>>>>>>> mainly due to performance / resource requirements, and also
>>>>>>>>>>>>>>>>>> considering that this is a feature that could be enabled in
>>>>>>>>>>>>>>>>>> production. Simply put, one event per mediator does not scale that
>>>>>>>>>>>>>>>>>> well. For example, if the ESB is doing 1k TPS for a sequence that has
>>>>>>>>>>>>>>>>>> 20 mediators, that is 20k TPS of analytics traffic. Combine that with
>>>>>>>>>>>>>>>>>> a possible ESB cluster hitting a DAS cluster with a single backend
>>>>>>>>>>>>>>>>>> database, and this may be too many rows per second written to the
>>>>>>>>>>>>>>>>>> database. The main problem here is that one event is a single
>>>>>>>>>>>>>>>>>> row/record in the backend database in DAS, so we may reach a state
>>>>>>>>>>>>>>>>>> where the rate of row creation by events coming from the ESBs cannot
>>>>>>>>>>>>>>>>>> be sustained.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If we create a single event from the 20 mediators, then it is just 1k
>>>>>>>>>>>>>>>>>> TPS for the DAS event receivers and the database too, even though the
>>>>>>>>>>>>>>>>>> message size is bigger. Publishing lots of small events does not
>>>>>>>>>>>>>>>>>> necessarily perform the same as publishing fewer, bigger events.
>>>>>>>>>>>>>>>>>> Throughput-wise, comparatively bigger events will win (even if small
>>>>>>>>>>>>>>>>>> operations are batched at the transport level etc., still one event =
>>>>>>>>>>>>>>>>>> one database row). So I would suggest we try out a "single sequence
>>>>>>>>>>>>>>>>>> flow = single event" approach, and on the Spark processing side, treat
>>>>>>>>>>>>>>>>>> one of these big rows as multiple rows in Spark. I first considered
>>>>>>>>>>>>>>>>>> whether UDFs could help split a single column into multiple rows, but
>>>>>>>>>>>>>>>>>> that is not possible. It would also be a bit troublesome, considering
>>>>>>>>>>>>>>>>>> we would have to delete the original data table after converting it
>>>>>>>>>>>>>>>>>> using a script, and not forgetting that we would actually have to
>>>>>>>>>>>>>>>>>> schedule and run a separate script to do this post-processing. So a
>>>>>>>>>>>>>>>>>> much cleaner way to do this would be to create a new "relation
>>>>>>>>>>>>>>>>>> provider" in Spark (which is like a data adapter for its DataFrames),
>>>>>>>>>>>>>>>>>> and in our relation provider, when we are reading rows, convert a
>>>>>>>>>>>>>>>>>> single row's column into multiple rows and return those for
>>>>>>>>>>>>>>>>>> processing. Spark will then not know that it was physically a single
>>>>>>>>>>>>>>>>>> row in the data layer, and it can summarize the data as usual and
>>>>>>>>>>>>>>>>>> write to the target summary tables. [1] is our existing implementation
>>>>>>>>>>>>>>>>>> of a Spark relation provider, which maps directly to our DAS analytics
>>>>>>>>>>>>>>>>>> tables; we can create the new one by extending it or basing it on it.
>>>>>>>>>>>>>>>>>> So I suggest we try out this approach and see if everyone is okay
>>>>>>>>>>>>>>>>>> with it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> *Anjana Fernando*
>>>>>>>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> You received this message because you are subscribed to
>>>>>>>>>>>>>>>>>> the Google Groups "WSO2 Engineering Group" group.
>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>>>>>>>>>>>>>> from it, send an email to
>>>>>>>>>>>>>>>>>> [email protected].
>>>>>>>>>>>>>>>>>> For more options, visit
>>>>>>>>>>>>>>>>>> https://groups.google.com/a/wso2.com/d/optout.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Kasun Indrasiri
>>>>>>>>>>>>>>>> Software Architect
>>>>>>>>>>>>>>>> WSO2, Inc.; http://wso2.com
>>>>>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cell: +94 77 556 5206
>>>>>>>>>>>>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> *Isuru Udana*
>>>>>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>>>>>> WSO2 Inc.; http://wso2.com
>>>>>>>>>>>>>>> email: [email protected] cell: +94 77 3791887
>>>>>>>>>>>>>>> blog: http://mytecheye.blogspot.com/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Sinthuja Rajendran*
>>>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>>>> WSO2, Inc.:http://wso2.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>>>>>>>> Mobile: +94774273955
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Sinthuja Rajendran*
>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>> WSO2, Inc.:http://wso2.com
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>>>>>> Mobile: +94774273955
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Sinthuja Rajendran*
>>>>>>>>>> Associate Technical Lead
>>>>>>>>>> WSO2, Inc.:http://wso2.com
>>>>>>>>>>
>>>>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>>>>> Mobile: +94774273955
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Supun Sethunga*
>>>>>>>>> Software Engineer
>>>>>>>>> WSO2, Inc.
>>>>>>>>> http://wso2.com/
>>>>>>>>> lean | enterprise | middleware
>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Supun Sethunga*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> http://wso2.com/
>>>>>>>> lean | enterprise | middleware
>>>>>>>> Mobile : +94 716546324
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dushan Abeyruwan | Technical Lead
>>>>>>>
>>>>>>> PMC Member Apache Synapse
>>>>>>> WSO2 Inc. http://wso2.com/
>>>>>>> Blog: http://www.dushantech.com/
>>>>>>> Mobile:(001)408-791-9312
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Supun Sethunga*
>>>>>> Software Engineer
>>>>>> WSO2, Inc.
>>>>>> http://wso2.com/
>>>>>> lean | enterprise | middleware
>>>>>> Mobile : +94 716546324
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Viraj Senevirathne
>>>>> Software Engineer; WSO2, Inc.
>>>>>
>>>>> Mobile : +94 71 958 0269
>>>>> Email : [email protected]
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Sanjiva Weerawarana, Ph.D.
>>> Founder, CEO & Chief Architect; WSO2, Inc.;  http://wso2.com/
>>> email: [email protected]; office: (+1 650 745 4499 | +94  11 214 5345)
>>> x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
>>> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
>>> Lean . Enterprise . Middleware
>>>
>>>
>>>
>>
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>>
>
>
>
> --
> Sanjiva Weerawarana, Ph.D.
> Founder, CEO & Chief Architect; WSO2, Inc.;  http://wso2.com/
> email: [email protected]; office: (+1 650 745 4499 | +94  11 214 5345)
> x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
> Lean . Enterprise . Middleware
>



-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324