Hi Sanjiva,

Yes, we are indeed using a stream definition and publishing the events using Thrift.

In doing so, there were two approaches we considered:

1. ESB publishing a single event per mediator in a message flow.
2. ESB publishing a single event per message flow (rather than one for each mediator in the message flow).

In the perf tests we ran, approach #2 performed better than #1. The format discussed above in the thread is the structure of the payload of a wso2event, which contains the aggregated information of the mediators (i.e., how to put the information of all mediators in a message flow into a single event). This way we can also avoid duplicating information (e.g., mediators like 'log' and 'property' do not change the XML payload within a single message flow).

Regards,
Supun

On Wed, Feb 24, 2016 at 11:25 AM, Sanjiva Weerawarana <[email protected]> wrote:

> Why are we inventing a new event format for this? Why not use a stream
> definition and publish using Thrift?
>
> Sorry if I'm missing something here.
>
> On Fri, Feb 19, 2016 at 11:44 AM, Supun Sethunga <[email protected]> wrote:
>
>> Hi,
>>
>> Ran some more performance tests to contrast publishing aggregated
>> events vs. multiple single events, and the following are the results:
>>
>> *Results:*
>>
>> No of concurrent publishers (to DAS): 10
>> Back-end DB: MySQL
>>
>>                          Single Events  Aggregated Events*  Single Events  Aggregated Events*
>> No of events:            160,000        10,000              1,600,000      100,000
>> Event payload size:      1.9 KB         21.6 KB             1.9 KB         21.6 KB
>> Time consumed** (mm:ss): 1:55           0:30                19:46          4:31
>>
>> *An aggregated event contains the payloads of 16 single events.
>> **Time consumed = time to complete all DB transactions.
>>
>> Please note that these times were measured while DB trace logs were on,
>> so that too has some effect on the overall performance.
>>
>> Regards,
>> Supun
>>
>> On Wed, Feb 17, 2016 at 5:37 PM, Viraj Senevirathne <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> We got a simple sample payload for an actual message flow (attached).
>>>
>>> This has about 16 mediators. The payload file size is ~27.4 kB. With
>>> different payload sizes and a larger number of mediators in the flow, a
>>> single payload can get even bigger. So if the ESB is serving 1000 requests
>>> per second, it will transfer payloads to DAS at a data rate of ~27 MB/s
>>> (27.4 kB x 1000 per second). With large payload sizes and a large number
>>> of mediators in the flow, this data rate can go up very high.
>>>
>>> As the strings are highly repetitive, compression works well on them.
>>> After compressing the above payload, its size is ~2 kB (a 93% reduction
>>> from the original size).
>>>
>>> A large JSON file of 1.3 MB was reduced to 14.3 kB after compression.
>>>
>>> Therefore, would it be possible to send a compressed JSON string to DAS
>>> instead of the uncompressed one? DAS can then decompress it and use the
>>> actual JSON payload.
>>>
>>> I think this will reduce the data rate drastically and ease data
>>> communication.
>>>
>>> Would it be possible to define a new type like "compressedJSON" to
>>> achieve this? WDYT about this idea?
>>>
>>> Thank You,
>>>
>>> On Wed, Feb 17, 2016 at 9:38 AM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Dushan,
>>>>
>>>>> Supun, according to the stream definition, what does ""children": 1,"
>>>>> represent?
>>>>
>>>> Here, each event basically represents a mediator/proxy. So "children"
>>>> represents the child mediator(s) in the message flow. This info is used
>>>> to draw the message flow diagram.
>>>>
>>>> For example, if we consider the first event in the array, "children":1
>>>> means the event at index 1 is the first mediator after Test Proxy, and
>>>> so on. Sorry, the values I have put for "children" in the second and
>>>> third events are misleading. They should be "children":2 and
>>>> "children":null, respectively. So, null means it's the end of the
>>>> message flow.
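The "children" convention described here (each event stores the array index of the next mediator, and null ends the flow) can be illustrated with a minimal Python sketch. The trimmed-down sample and the field name "componentId" are assumptions made for illustration, and `flow_order` is a hypothetical helper, not part of ESB or DAS.

```python
import json

# Hypothetical, trimmed-down payload using the corrected "children"
# values from the explanation: 1, 2 and null.
SAMPLE = json.loads("""
{"events": [
  {"componentId": "Test Proxy", "children": 1},
  {"componentId": "mediator_1", "children": 2},
  {"componentId": "mediator_2", "children": null}
]}
""")


def flow_order(events, start=0):
    """Follow the "children" index from the entry point until it is null."""
    order, seen, index = [], set(), start
    while index is not None:
        if index in seen:  # guard against a malformed, cyclic payload
            raise ValueError(f"cycle at event index {index}")
        seen.add(index)
        event = events[index]
        order.append(event["componentId"])
        index = event["children"]  # JSON null becomes None and ends the walk
    return order


print(" -> ".join(flow_order(SAMPLE["events"])))
# prints: Test Proxy -> mediator_1 -> mediator_2
```

A dashboard drawing the message flow diagram would presumably do a walk of this kind; a flow that branches would need "children" to hold a list of indices rather than a single one.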
>>>>
>>>> Regards,
>>>> Supun
>>>>
>>>> On Wed, Feb 17, 2016 at 2:34 AM, Dushan Abeyruwan <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>>    - If we publish events from each mediator, we can certainly group
>>>>>    the events by a unique parentID, can't we? (I mean this would allow
>>>>>    us to prepare an aggregated view per incoming message and visualize
>>>>>    the different stages of each message representation and other meta
>>>>>    information; think of complex mediation.)
>>>>>    - Can't we record the payload according to its Content-Type, and
>>>>>    thereby get rid of the SOAP way of representing it?
>>>>>    - If we have a non-content-aware mediation flow with
>>>>>    "application/json", can we find a way to get the JSON string rather
>>>>>    than explicitly building it, i.e.
>>>>>    "org.apache.synapse.commons.json.Constants.JSON_STRING"?
>>>>>    - Supun, according to the stream definition, what does
>>>>>    ""children": 1," represent?
>>>>>
>>>>>
>>>>> On Mon, Feb 15, 2016 at 9:15 PM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Dunith, Gihan,
>>>>>>
>>>>>> As per the offline chat with Buddhima and Viraj, the following is a
>>>>>> sample payload to be published from ESB to DAS. Do we need any other
>>>>>> information for the plots/tables in the dashboard?
>>>>>>
>>>>>> Here we added a new field, "entryPoint", to indicate inside which
>>>>>> Proxy/API the mediator was executed, so that it is easy to drill
>>>>>> down from the proxy view to the mediator view. Please add any other
>>>>>> similar fields that would be needed for drill-downs, if we have
>>>>>> missed any.
>>>>>>
>>>>>> {
>>>>>>   "events": [{
>>>>>>     "componentType": "ProxyService",
>>>>>>     "componentId": "Test Proxy",
>>>>>>     "startTime": 1455531027,
>>>>>>     "endTime": 1455531041,
>>>>>>     "duration": 3.321,
>>>>>>     "beforePayload": null,
>>>>>>     "afterPayload": null,
>>>>>>     "contextPropertyMap": "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>>>     "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\\\"urn:renewLicense\\\"\",\"Host\":\"localhost\"}",
>>>>>>     "children": 1,
>>>>>>     "entryPoint": "Test Proxy"
>>>>>>   }, {
>>>>>>     "componentType": "Mediator",
>>>>>>     "componentId": "mediator_1",
>>>>>>     "startTime": 1455531041,
>>>>>>     "endTime": 1455531052,
>>>>>>     "duration": 3.321,
>>>>>>     "beforePayload": null,
>>>>>>     "afterPayload": null,
>>>>>>     "contextPropertyMap": "{\"MESSAGE_FLOW_ID\":\"urn_uuid_e4251abb-8ff5-433b-8dcb-24f251c3e30d\"}",
>>>>>>     "transportPropertyMap": "{\"Content-Type\":\"application\/soap+xml; charset=UTF-8; action=\\\"urn:renewLicense\\\"\",\"Host\":\"localhost\"}",
>>>>>>     "children": 0,
>>>>>>     "entryPoint": "Test Proxy"
>>>>>>   }, {
>>>>>>     "componentType": "Mediator",
>>>>>>     "componentId": "mediator_2",
>>>>>>     "startTime": 1455531052,
>>>>>>     "endTime": 1455531074,
>>>>>>     "duration": 3.321,
>>>>>>     "beforePayload": null,
>>>>>>     "afterPayload": null,
>>>>>>     "contextPropertyMap": null,
>>>>>>     "transportPropertyMap": null,
>>>>>>     "children": 0,
>>>>>>     "entryPoint": "Test Proxy"
>>>>>>   }],
>>>>>>
>>>>>>   "payloads": [{
>>>>>>     "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123456</sam:vehicleNumber></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>>>     "events": [{
>>>>>>       "eventIndex": 0,
>>>>>>       "attributes": "beforePayload"
>>>>>>     }, {
>>>>>>       "eventIndex": 0,
>>>>>>       "attributes": "afterPayload"
>>>>>>     }, {
>>>>>>       "eventIndex": 1,
>>>>>>       "attributes": "beforePayload"
>>>>>>     }]
>>>>>>   }, {
>>>>>>     "payload": "<?xml version=\"1.0\" encoding=\"utf-8\"?><soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\"><soapenv:Body><sam:getCertificateID xmlns:sam=\"http://sample.esb.org\"><sam:vehicleNumber>123123</sam:vehicleNumber><sam:vehicleType>car</sam:vehicleType></sam:getCertificateID></soapenv:Body></soapenv:Envelope>",
>>>>>>     "events": [{
>>>>>>       "eventIndex": 1,
>>>>>>       "attributes": "afterPayload"
>>>>>>     }, {
>>>>>>       "eventIndex": 2,
>>>>>>       "attributes": "beforePayload"
>>>>>>     }, {
>>>>>>       "eventIndex": 2,
>>>>>>       "attributes": "afterPayload"
>>>>>>     }]
>>>>>>   }]
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Wed, Feb 10, 2016 at 11:57 AM, Supun Sethunga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sinthuja,
>>>>>>>
>>>>>>>> IMHO we could solve this issue by having conventions. Basically we
>>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>>> convention. If the element starts with '$' then it's a reference,
>>>>>>>> not the actual payload. In that case, if a new element is
>>>>>>>> introduced, let's say foo, and you need to access the property
>>>>>>>> property1, then it will have the reference $foo:property1.
>>>>>>>
>>>>>>> Yes, that's possible as well. But again, if the value of the
>>>>>>> property, say 'foo', actually starts with some special character
>>>>>>> (in this case '$'), we may run into ambiguity. (True, the chances
>>>>>>> are pretty low, but it is still possible.)
>>>>>>>
>>>>>>>> Also, this json event format is being sent as the event payload in
>>>>>>>> a wso2 event, and the wso2 event is being published by the data
>>>>>>>> publisher, right? Correct me if I'm wrong.
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Feb 10, 2016 at 11:35 AM, Sinthuja Ragendran <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>> Also, this json event format is being sent as the event payload in
>>>>>>>> a wso2 event, and the wso2 event is being published by the data
>>>>>>>> publisher, right? Correct me if I'm wrong.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sinthuja.
>>>>>>>>
>>>>>>>> On Wed, Feb 10, 2016 at 11:26 AM, Sinthuja Ragendran <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Supun,
>>>>>>>>>
>>>>>>>>> On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sinthuja,
>>>>>>>>>>
>>>>>>>>>> Agree on the possibility of simplifying the json. We also
>>>>>>>>>> discussed the same matter yesterday, but the complication that
>>>>>>>>>> came up was that, for an event in the "events" list, the payload
>>>>>>>>>> could be either referenced or defined in-line (made that way so
>>>>>>>>>> that it can be generalized to fields other than payloads as
>>>>>>>>>> well, if needed).
>>>>>>>>>>
>>>>>>>>>> In such a case, if we had defined it as 'payload': 'payload1', we
>>>>>>>>>> would not know if it's the actual payload or a reference to the
>>>>>>>>>> payload in the "payloads" section.
>>>>>>>>>>
>>>>>>>>>> With the suggested format, DAS will only go and map the payload
>>>>>>>>>> if it's null.
>>>>>>>>>>
>>>>>>>>> IMHO we could solve this issue by having conventions. Basically we
>>>>>>>>> could use $payloads:payload1 to reference the elements as a
>>>>>>>>> convention. If the element starts with '$' then it's a reference,
>>>>>>>>> not the actual payload. In that case, if a new element is
>>>>>>>>> introduced, let's say foo, and you need to access the property
>>>>>>>>> property1, then it will have the reference $foo:property1.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sinthuja.
>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>
>>>>>>>>>>> I think we could simplify the json message a bit more. Instead
>>>>>>>>>>> of 'null' for the payload attributes in the events section, you
>>>>>>>>>>> could use the actual payload name directly if there is a payload
>>>>>>>>>>> for that event. In that case, we could eliminate the 'events'
>>>>>>>>>>> section from the 'payloads' section. The given example could be
>>>>>>>>>>> altered as below.
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>>   'events': [{
>>>>>>>>>>>     'messageId': 'aaa',
>>>>>>>>>>>     'componentId': '111',
>>>>>>>>>>>     'payload': 'payload1',
>>>>>>>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>     'output-payload': null
>>>>>>>>>>>   }, {
>>>>>>>>>>>     'messageId': 'bbb',
>>>>>>>>>>>     'componentId': '222',
>>>>>>>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>     'payload': 'payload1',
>>>>>>>>>>>     'output-payload': null
>>>>>>>>>>>   }, {
>>>>>>>>>>>     'messageId': 'ccc',
>>>>>>>>>>>     'componentId': '789',
>>>>>>>>>>>     'payload': 'payload2',
>>>>>>>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>     'output-payload': 'payload2'
>>>>>>>>>>>   }],
>>>>>>>>>>>
>>>>>>>>>>>   'payloads': {
>>>>>>>>>>>     'payload1': 'xml-payload-1',
>>>>>>>>>>>     'payload2': 'xml-payload-2'
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sinthuja.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Budhdhima/Viraj,
>>>>>>>>>>>>
>>>>>>>>>>>> As per the discussion we had yesterday, the following is the
>>>>>>>>>>>> format of the json containing the aggregated event details, to
>>>>>>>>>>>> be sent to DAS (you may change the attribute names of events).
>>>>>>>>>>>>
>>>>>>>>>>>> To explain it further, "events" contains the details of each
>>>>>>>>>>>> event sent by each mediator. The payload may or may not be
>>>>>>>>>>>> populated. The "payloads" section contains the unique payloads
>>>>>>>>>>>> and their mapping to the events and their fields (e.g.,
>>>>>>>>>>>> 'xml-payload-2' maps to the 'payload' and 'output-payload'
>>>>>>>>>>>> fields of the 3rd event).
>>>>>>>>>>>>
>>>>>>>>>>>> {
>>>>>>>>>>>>   'events': [{
>>>>>>>>>>>>     'messageId': 'aaa',
>>>>>>>>>>>>     'componentId': '111',
>>>>>>>>>>>>     'payload': null,
>>>>>>>>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>>     'output-payload': null
>>>>>>>>>>>>   }, {
>>>>>>>>>>>>     'messageId': 'bbb',
>>>>>>>>>>>>     'componentId': '222',
>>>>>>>>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>>     'payload': null,
>>>>>>>>>>>>     'output-payload': null
>>>>>>>>>>>>   }, {
>>>>>>>>>>>>     'messageId': 'ccc',
>>>>>>>>>>>>     'componentId': '789',
>>>>>>>>>>>>     'payload': null,
>>>>>>>>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>>>>>>>>     'output-payload': null
>>>>>>>>>>>>   }],
>>>>>>>>>>>>
>>>>>>>>>>>>   'payloads': [{
>>>>>>>>>>>>     'payload': 'xml-payload-1',
>>>>>>>>>>>>     'events': [{
>>>>>>>>>>>>       'eventIndex': 0,
>>>>>>>>>>>>       'attributes': ['payload']
>>>>>>>>>>>>     }, {
>>>>>>>>>>>>       'eventIndex': 1,
>>>>>>>>>>>>       'attributes': ['payload']
>>>>>>>>>>>>     }]
>>>>>>>>>>>>   }, {
>>>>>>>>>>>>     'payload': 'xml-payload-2',
>>>>>>>>>>>>     'events': [{
>>>>>>>>>>>>       'eventIndex': 2,
>>>>>>>>>>>>       'attributes': ['payload', 'output-payload']
>>>>>>>>>>>>     }]
>>>>>>>>>>>>   }]
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Please let us know if any further clarification is needed, or
>>>>>>>>>>>> if there's anything to be modified/improved.
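How a receiver might expand this format back into fully populated events can be sketched in Python as below. This is only an illustration under stated assumptions: `resolve_payloads` is a hypothetical helper (not DAS code), the message is written as a Python dict mirroring the example above, and the "only fill a field that is null" rule follows the behaviour mentioned elsewhere in this thread.

```python
import copy


def resolve_payloads(message):
    """Return a copy of the events with null payload fields filled in
    from the shared "payloads" section."""
    events = copy.deepcopy(message["events"])
    for entry in message["payloads"]:
        for ref in entry["events"]:
            target = events[ref["eventIndex"]]
            for attribute in ref["attributes"]:
                # Only map a shared payload onto a field that is still null.
                if target.get(attribute) is None:
                    target[attribute] = entry["payload"]
    return events


# Python mirror of the example format (None stands for JSON null).
message = {
    "events": [
        {"messageId": "aaa", "payload": None, "output-payload": None},
        {"messageId": "bbb", "payload": None, "output-payload": None},
        {"messageId": "ccc", "payload": None, "output-payload": None},
    ],
    "payloads": [
        {"payload": "xml-payload-1",
         "events": [{"eventIndex": 0, "attributes": ["payload"]},
                    {"eventIndex": 1, "attributes": ["payload"]}]},
        {"payload": "xml-payload-2",
         "events": [{"eventIndex": 2,
                     "attributes": ["payload", "output-payload"]}]},
    ],
}

resolved = resolve_payloads(message)
for event in resolved:
    print(event["messageId"], event["payload"], event["output-payload"])
```

Each unique payload is stored once and referenced by index, so the size saving grows with the number of mediators that leave the payload unchanged.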
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Supun
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Kasun,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think for the tracing use case we need to publish events
>>>>>>>>>>>>>> one by one from each mediator (we can't aggregate all such
>>>>>>>>>>>>>> events, as they also contain the message payload).
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we can still do that with some extra effort.
>>>>>>>>>>>>> Most of the mediators in a sequence flow do not alter the
>>>>>>>>>>>>> message payload. We can store the payload only for the
>>>>>>>>>>>>> mediators which alter the message payload, and for the others
>>>>>>>>>>>>> we can put a reference to the previous entry. By doing that we
>>>>>>>>>>>>> can save memory to a great extent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>> From: Supun Sethunga <[email protected]>
>>>>>>>>>>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>>>>>>>>>>> Subject: Re: ESB Analytics Mediation Event Publishing
>>>>>>>>>>>>>> Mechanism
>>>>>>>>>>>>>> To: Anjana Fernando <[email protected]>
>>>>>>>>>>>>>> Cc: "[email protected]" <[email protected]>,
>>>>>>>>>>>>>> Srinath Perera <[email protected]>, Sanjiva Weerawarana <
>>>>>>>>>>>>>> [email protected]>, Kasun Indrasiri <[email protected]>, Isuru
>>>>>>>>>>>>>> Udana <[email protected]>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ran some simple performance tests against the new relation
>>>>>>>>>>>>>> provider, in comparison with the existing one. The following
>>>>>>>>>>>>>> are the results:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Records in Backend DB Table*: *1,054,057*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Conversion:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Backend DB Table
>>>>>>>>>>>>>> id  data
>>>>>>>>>>>>>> 1   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>>> 2   [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- To -->
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Spark Table
>>>>>>>>>>>>>> id  a    b    c
>>>>>>>>>>>>>> 1   xxx  yyy  zzz
>>>>>>>>>>>>>> 1   ppp  qqq  rrr
>>>>>>>>>>>>>> 1   aaa  bbb  ccc
>>>>>>>>>>>>>> 2   xxx  yyy  zzz
>>>>>>>>>>>>>> 2   aaa  bbb  ccc
>>>>>>>>>>>>>> 2   ppp  qqq  rrr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Avg Time for Query Execution (~ sec):*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Query                                      Existing Analytics   New (ESB) Analytics
>>>>>>>>>>>>>>                                            Relation Provider    Relation Provider*
>>>>>>>>>>>>>> SELECT COUNT(*) FROM <Table>;              13                   16
>>>>>>>>>>>>>> SELECT * FROM <Table> ORDER BY id ASC;     13                   16
>>>>>>>>>>>>>> SELECT * FROM <Table> WHERE id=98435;      13                   16
>>>>>>>>>>>>>> SELECT id,a,first(b),first(c) FROM <Table>
>>>>>>>>>>>>>>   GROUP BY id,a ORDER BY id ASC;           18                   26
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * The new relation provider splits a single row into multiple
>>>>>>>>>>>>>> rows. Hence the number of rows in its table is 3 times that
>>>>>>>>>>>>>> of the original table (as each row is split into 3 rows).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have started working on implementing a new "relation" /
>>>>>>>>>>>>>>> "relation provider" to serve the above requirement. This is
>>>>>>>>>>>>>>> basically a modified version of the existing "Carbon
>>>>>>>>>>>>>>> Analytics" relation provider.
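The row-splitting conversion from the performance test (one backend row whose "data" column holds an array, read as several flat rows) can be sketched in plain Python as below. This is only an illustration of the idea, not the Spark relation provider API or the actual implementation; `split_rows` and the sample rows are hypothetical, and the array is assumed to be stored as valid JSON text.

```python
import json


def split_rows(rows, column="data"):
    """Yield one flat row per element of the JSON array stored in `column`.

    The other columns of the original row (here just "id") are copied
    onto every row produced from it.
    """
    for row in rows:
        shared = {key: value for key, value in row.items() if key != column}
        for element in json.loads(row[column]):
            flat = dict(shared)
            flat.update(element)
            yield flat


# Hypothetical backend rows: one row per message flow, with the
# per-mediator details packed into "data" as a JSON array.
backend_rows = [
    {"id": 1, "data": '[{"a": "aaa", "b": "bbb", "c": "ccc"},'
                      ' {"a": "xxx", "b": "yyy", "c": "zzz"}]'},
    {"id": 2, "data": '[{"a": "ppp", "b": "qqq", "c": "rrr"}]'},
]

for flat_row in split_rows(backend_rows):
    print(flat_row)
```

Splitting at read time keeps one stored row per flow on the write path while the query side still sees one row per mediator.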
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here I have assumed that the encapsulated data for a single
>>>>>>>>>>>>>>> execution flow is stored in a single row, and the data about
>>>>>>>>>>>>>>> the mediators invoked during the flow is stored in a known
>>>>>>>>>>>>>>> column of each row (say "data") as an array (say a json
>>>>>>>>>>>>>>> array). When each row is read into Spark, this relation
>>>>>>>>>>>>>>> provider creates a separate row for each element in the
>>>>>>>>>>>>>>> array stored in the "data" column. I have tested this with
>>>>>>>>>>>>>>> some mocked data, and it works as expected.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Need to test with the real data/data formats and modify the
>>>>>>>>>>>>>>> mapping accordingly. Will update the thread with the details.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In a meeting I had with Kasun and the ESB team, I got to
>>>>>>>>>>>>>>>> know that, for their tracing mechanism, they were
>>>>>>>>>>>>>>>> instructed to publish one event for each mediator
>>>>>>>>>>>>>>>> invocation, whereas earlier they had an approach where they
>>>>>>>>>>>>>>>> published one event that encapsulated the data of a whole
>>>>>>>>>>>>>>>> execution flow. I would actually like to support the latter
>>>>>>>>>>>>>>>> approach, mainly due to performance / resource
>>>>>>>>>>>>>>>> requirements, and also considering the fact that this is a
>>>>>>>>>>>>>>>> feature that could be enabled in production. Simply put, if
>>>>>>>>>>>>>>>> we do one event per mediator, this does not scale that
>>>>>>>>>>>>>>>> well. For example, if the ESB is doing 1k TPS for a
>>>>>>>>>>>>>>>> sequence that has 20 mediators, that is 20k TPS of
>>>>>>>>>>>>>>>> analytics traffic. Combine that with a possible ESB cluster
>>>>>>>>>>>>>>>> hitting a DAS cluster with a single backend database, and
>>>>>>>>>>>>>>>> this may be too many rows per second written to the
>>>>>>>>>>>>>>>> database. The main problem here is that one event is a
>>>>>>>>>>>>>>>> single row/record in the backend database in DAS, so it may
>>>>>>>>>>>>>>>> come to a state where the frequency of row creations by
>>>>>>>>>>>>>>>> events coming from ESBs cannot be sustained.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we create a single event from the 20 mediators, then it
>>>>>>>>>>>>>>>> is just 1k TPS for the DAS event receivers and the database
>>>>>>>>>>>>>>>> too, even though the message size is bigger. Publishing
>>>>>>>>>>>>>>>> lots of small events does not necessarily perform the same
>>>>>>>>>>>>>>>> as publishing fewer, bigger events. Throughput-wise,
>>>>>>>>>>>>>>>> comparatively bigger events will win (even if we consider
>>>>>>>>>>>>>>>> that small operations will be batched at the transport
>>>>>>>>>>>>>>>> level etc., it is still one event = one database row). So I
>>>>>>>>>>>>>>>> would suggest we try out a "single sequence flow = single
>>>>>>>>>>>>>>>> event" approach, and on the Spark processing side we treat
>>>>>>>>>>>>>>>> one of these big rows as multiple rows in Spark.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was first thinking about whether UDFs can help in
>>>>>>>>>>>>>>>> splitting a single column into multiple rows, but that is
>>>>>>>>>>>>>>>> not possible. It is also a bit troublesome, considering we
>>>>>>>>>>>>>>>> would have to delete the original data table after
>>>>>>>>>>>>>>>> converting it using a script, not forgetting that we would
>>>>>>>>>>>>>>>> actually have to schedule and run a separate script to do
>>>>>>>>>>>>>>>> this post-processing. So a much cleaner way to do this
>>>>>>>>>>>>>>>> would be to create a new "relation provider" in Spark
>>>>>>>>>>>>>>>> (which is like a data adapter for their DataFrames), and in
>>>>>>>>>>>>>>>> our relation provider, when we are reading rows, we convert
>>>>>>>>>>>>>>>> a single row's column to multiple rows and return those for
>>>>>>>>>>>>>>>> processing. Spark will then not know that physically it was
>>>>>>>>>>>>>>>> a single row in the data layer, and it can summarize the
>>>>>>>>>>>>>>>> data as usual and write to the target summary tables. [1]
>>>>>>>>>>>>>>>> is our existing implementation of the Spark relation
>>>>>>>>>>>>>>>> provider, which directly maps to our DAS analytics tables;
>>>>>>>>>>>>>>>> we can create the new one extending / based on it. So I
>>>>>>>>>>>>>>>> suggest we try out this approach and see if everyone is
>>>>>>>>>>>>>>>> okay with it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> *Anjana Fernando*
>>>>>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>>>>> Google Groups "WSO2 Engineering Group" group.
>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>>>>>>>>>>>> from it, send an email to
>>>>>>>>>>>>>>>> [email protected].
>>>>>>>>>>>>>>>> For more options, visit
>>>>>>>>>>>>>>>> https://groups.google.com/a/wso2.com/d/optout.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Kasun Indrasiri
>>>>>>>>>>>>>> Software Architect
>>>>>>>>>>>>>> WSO2, Inc.; http://wso2.com
>>>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cell: +94 77 556 5206
>>>>>>>>>>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Isuru Udana*
>>>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>>>> WSO2 Inc.; http://wso2.com
>>>>>>>>>>>>> email: [email protected] cell: +94 77 3791887
>>>>>>>>>>>>> blog: http://mytecheye.blogspot.com/
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Sinthuja Rajendran*
>>>>>>>>>>> Associate Technical Lead
>>>>>>>>>>> WSO2, Inc.:http://wso2.com
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>>>>>> Mobile: +94774273955
>>>>>
>>>>> --
>>>>> Dushan Abeyruwan | Technical Lead
>>>>>
>>>>> PMC Member Apache Synapse
>>>>> WSO2 Inc. http://wso2.com/
>>>>> Blog: http://www.dushantech.com/
>>>>> Mobile:(001)408-791-9312
>>>
>>> --
>>> Viraj Senevirathne
>>> Software Engineer; WSO2, Inc.
>>>
>>> Mobile : +94 71 958 0269
>>> Email : [email protected]
>
> --
> Sanjiva Weerawarana, Ph.D.
> Founder, CEO & Chief Architect; WSO2, Inc.; http://wso2.com/
> email: [email protected]; office: (+1 650 745 4499 | +94 11 214 5345)
> x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
> Lean . Enterprise . Middleware

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
