Hi Supun,

Also, this JSON event format is being sent as the event payload in a WSO2 event, and the WSO2 event is published by the data publisher, right? Correct me if I'm wrong.
Thanks,
Sinthuja.

On Wed, Feb 10, 2016 at 11:26 AM, Sinthuja Ragendran <[email protected]> wrote:

> Hi Supun,
>
> On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]> wrote:
>
>> Hi Sinthuja,
>>
>> Agreed on the possibility of simplifying the JSON. We also discussed the
>> same matter yesterday, but the complication that came up was that, for an
>> event in the "events" list, the payload could be either referenced or
>> defined in-line. (It was made this way so that it can be generalized to
>> fields other than payloads, if needed.) In such a case, if we had defined
>> it as 'payload': 'payload1', we would not know whether it is the actual
>> payload or a reference to the payload in the "payloads" section.
>>
>> With the suggested format, DAS will only go and map the payload if it is
>> null.
>
> IMHO we could solve this issue by having conventions. Basically, we could
> use $payloads:payload1 to reference the elements. If an element starts
> with '$' then it is a reference, not the actual payload. In that case, if
> a new element is introduced, say foo, and you need to access its property
> property1, then the reference would be $foo:property1.
>
> Thanks,
> Sinthuja.
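(For illustration, a minimal sketch of the '$' reference convention suggested above, assuming the consumer keeps the 'payloads' section as a map keyed by payload name; the class and method names are hypothetical:)

    import java.util.Map;

    public class PayloadRefResolver {

        // Resolves a field value per the suggested '$' convention: a value of
        // the form "$payloads:payload1" is a reference into the top-level
        // 'payloads' section; anything else is treated as an in-line payload.
        public static String resolve(String value, Map<String, String> payloads) {
            if (value != null && value.startsWith("$payloads:")) {
                String name = value.substring("$payloads:".length());
                return payloads.get(name); // null if the reference is dangling
            }
            return value; // in-line payload, or null if none was recorded
        }
    }

With this, resolve("$payloads:payload1", payloads) returns the stored payload, while an in-line value such as an XML string is returned as-is.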
On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <[email protected]> wrote:

> Hi Supun,
>
> I think we could simplify the JSON message a bit more. Instead of 'null'
> for the payload attributes in the events section, you could use the actual
> payload name directly when there is a payload for that event. In that
> case, we could eliminate the 'events' section from the 'payloads' section.
> For the given example, it could be altered as below.
>
> {
>   'events': [{
>     'messageId': 'aaa',
>     'componentId': '111',
>     'payload': 'payload1',
>     'componentName': 'Proxy:TestProxy',
>     'output-payload': null
>   }, {
>     'messageId': 'bbb',
>     'componentId': '222',
>     'componentName': 'Proxy:TestProxy',
>     'payload': 'payload1',
>     'output-payload': null
>   }, {
>     'messageId': 'ccc',
>     'componentId': '789',
>     'payload': 'payload2',
>     'componentName': 'Proxy:TestProxy',
>     'output-payload': 'payload2'
>   }],
>
>   'payloads': {
>     'payload1': 'xml-payload-1',
>     'payload2': 'xml-payload-2'
>   }
> }
>
> Thanks,
> Sinthuja.

On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <[email protected]> wrote:

> Hi Budhdhima/Viraj,
>
> As per the discussion we had yesterday, the following is the format of the
> JSON containing the aggregated event details, to be sent to DAS (you may
> change the attribute names of the events).
>
> To explain it further: "events" contains the details of each event sent by
> each mediator. The payload may or may not be populated. The "payloads"
> section contains the unique payloads and their mapping to the events and
> their fields (e.g., 'xml-payload-2' maps to the 'payload' and
> 'output-payload' fields of the 3rd event).
>
> {
>   'events': [{
>     'messageId': 'aaa',
>     'componentId': '111',
>     'payload': null,
>     'componentName': 'Proxy:TestProxy',
>     'output-payload': null
>   }, {
>     'messageId': 'bbb',
>     'componentId': '222',
>     'componentName': 'Proxy:TestProxy',
>     'payload': null,
>     'output-payload': null
>   }, {
>     'messageId': 'ccc',
>     'componentId': '789',
>     'payload': null,
>     'componentName': 'Proxy:TestProxy',
>     'output-payload': null
>   }],
>
>   'payloads': [{
>     'payload': 'xml-payload-1',
>     'events': [{
>       'eventIndex': 0,
>       'attributes': ['payload']
>     }, {
>       'eventIndex': 1,
>       'attributes': ['payload']
>     }]
>   }, {
>     'payload': 'xml-payload-2',
>     'events': [{
>       'eventIndex': 2,
>       'attributes': ['payload', 'output-payload']
>     }]
>   }]
> }
>
> Please let us know if any further clarification is needed, or if there is
> anything to be modified/improved.
>
> Thanks,
> Supun
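(For illustration, a minimal sketch of how a consumer such as DAS might apply the mapping above, copying each unique payload into the event fields it maps to; org.json is assumed for parsing, and the class name is hypothetical:)

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class PayloadMapper {

        // Fills the null payload fields of each event from the 'payloads'
        // section, following its eventIndex/attributes mapping.
        public static void mapPayloads(JSONObject message) {
            JSONArray events = message.getJSONArray("events");
            JSONArray payloads = message.getJSONArray("payloads");
            for (int i = 0; i < payloads.length(); i++) {
                JSONObject entry = payloads.getJSONObject(i);
                String payload = entry.getString("payload");
                JSONArray targets = entry.getJSONArray("events");
                for (int j = 0; j < targets.length(); j++) {
                    JSONObject target = targets.getJSONObject(j);
                    JSONObject event = events.getJSONObject(target.getInt("eventIndex"));
                    JSONArray attributes = target.getJSONArray("attributes");
                    for (int k = 0; k < attributes.length(); k++) {
                        event.put(attributes.getString(k), payload);
                    }
                }
            }
        }
    }

For the example above, this copies 'xml-payload-1' into the 'payload' field of events 0 and 1, and 'xml-payload-2' into both the 'payload' and 'output-payload' fields of event 2.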
On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]> wrote:

> Hi Kasun,
>
> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <[email protected]> wrote:
>
>> I think for the tracing use case we need to publish events one by one
>> from each mediator (we can't aggregate all such events, as they also
>> contain the message payload).
>
> I think we can still do that with some extra effort. Most of the mediators
> in a sequence flow do not alter the message payload. We can store the
> payload only for the mediators which alter the message payload, and for
> the others, we can put a reference to the previous entry. By doing that we
> can save memory to a great extent.
>
> Thanks.
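(For illustration, a minimal sketch of the store-once-and-reference idea on the publisher side; it keys on payload content rather than strictly on the previous entry, and the class and method names are hypothetical:)

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class PayloadStore {

        private final Map<String, String> payloadToName = new LinkedHashMap<>();
        private final List<String> perEventRef = new ArrayList<>();

        // Records the payload observed at one mediator. A payload identical
        // to an already-stored one is not stored again; the event just gets
        // a reference to the existing entry.
        public String record(String payload) {
            String name = payloadToName.computeIfAbsent(payload,
                    p -> "payload" + (payloadToName.size() + 1));
            perEventRef.add(name);
            return name;
        }

        // Unique payloads to be serialized into the 'payloads' section,
        // keyed by the generated reference name.
        public Map<String, String> uniquePayloads() {
            Map<String, String> result = new LinkedHashMap<>();
            payloadToName.forEach((payload, name) -> result.put(name, payload));
            return result;
        }
    }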
> ---------- Forwarded message ----------
> From: Supun Sethunga <[email protected]>
> Date: Mon, Feb 8, 2016 at 2:54 PM
> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
> To: Anjana Fernando <[email protected]>
> Cc: "[email protected]" <[email protected]>, Srinath Perera
> <[email protected]>, Sanjiva Weerawarana <[email protected]>, Kasun Indrasiri
> <[email protected]>, Isuru Udana <[email protected]>
>
> Hi all,
>
> I ran some simple performance tests against the new relational provider,
> in comparison with the existing one. The results follow.
>
> Records in backend DB table: 1,054,057
>
> Conversion:
>
> Backend DB table (id, data):
>
> 1  [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
> 2  [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>
> -- To -->
>
> Spark table (id, a, b, c):
>
> 1  aaa  bbb  ccc
> 1  xxx  yyy  zzz
> 1  ppp  qqq  rrr
> 2  aaa  bbb  ccc
> 2  xxx  yyy  zzz
> 2  ppp  qqq  rrr
>
> Average time for query execution:
>
> Query                                        Existing Analytics   New (ESB) Analytics
>                                              Relation Provider    Relation Provider*
> SELECT COUNT(*) FROM <Table>;                        13                   16
> SELECT * FROM <Table> ORDER BY id ASC;               13                   16
> SELECT * FROM <Table> WHERE id=98435;                13                   16
> SELECT id,a,first(b),first(c) FROM <Table>           18                   26
>   GROUP BY id,a ORDER BY id ASC;
>
> (Execution times are approximate, in seconds.)
>
> * The new relational provider splits a single row into multiple rows, so
> the number of rows in the table is three times that of the original table
> (each row is split into 3 rows).
>
> Regards,
> Supun
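(For illustration, a minimal sketch of the conversion shown above; org.json is assumed, the class name is hypothetical, and the real provider reads DAS analytics records rather than plain strings:)

    import java.util.ArrayList;
    import java.util.List;

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class RowSplitter {

        // Expands one physical row (id plus a JSON-array 'data' column) into
        // the multiple logical rows handed to Spark, one per array element.
        public static List<Object[]> split(int id, String dataColumn) {
            JSONArray elements = new JSONArray(dataColumn);
            List<Object[]> rows = new ArrayList<>();
            for (int i = 0; i < elements.length(); i++) {
                JSONObject e = elements.getJSONObject(i);
                rows.add(new Object[] { id, e.getString("a"), e.getString("b"), e.getString("c") });
            }
            return rows;
        }
    }

For row 1 above, this yields the three logical rows (1, aaa, bbb, ccc), (1, xxx, yyy, zzz) and (1, ppp, qqq, rrr).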
On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <[email protected]> wrote:

> Hi all,
>
> I have started working on implementing a new "relation" / "relation
> provider" to serve the above requirement. This is basically a modified
> version of the existing "Carbon Analytics" relation provider.
>
> Here I have assumed that the encapsulated data for a single execution flow
> are stored in a single row, and that the data about the mediators invoked
> during the flow are stored in a known column of each row (say "data") as
> an array (say a JSON array). When each row is read into Spark, this
> relation provider creates a separate row for each element of the array
> stored in the "data" column. I have tested this with some mocked data, and
> it works as expected.
>
> I need to test with the real data/data formats and modify the mapping
> accordingly. I will update the thread with the details.
>
> Regards,
> Supun

On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <[email protected]> wrote:

> Hi,
>
> In a meeting I had with Kasun and the ESB team, I got to know that, for
> their tracing mechanism, they were instructed to publish one event for
> each mediator invocation, whereas earlier they had an approach where they
> published one event which encapsulated the data of a whole execution flow.
> I would actually like to support the latter approach, mainly due to
> performance / resource requirements, and also considering the fact that
> this is a feature that could be enabled in production. Simply put, if we
> do one event per mediator, this does not scale that well. For example, if
> the ESB is doing 1k TPS, then for a sequence that has 20 mediators, that
> is 20k TPS of analytics traffic. Combine that with a possible ESB cluster
> hitting a DAS cluster with a single backend database, and this may be too
> many rows per second written to the database. The main problem here is
> that one event is a single row/record in the backend database in DAS, so
> it may come to a state where the frequency of row creations by events
> coming from ESBs cannot be sustained.
>
> If we create a single event from the 20 mediators, then it is just 1k TPS
> for the DAS event receivers and the database too, even though the message
> size is bigger. Publishing lots of small events does not necessarily
> perform the same as publishing bigger events; throughput-wise,
> comparatively bigger events will win (even if we consider that small
> operations will be batched at the transport level etc., still one event =
> one database row). So I would suggest we try out a "single sequence flow =
> single event" approach, and on the Spark processing side, we consider one
> of these big rows as multiple rows in Spark.
>
> I was first thinking whether UDFs could help in splitting a single column
> into multiple rows, but that is not possible, and it would also be a bit
> troublesome, considering we would have to delete the original data table
> after we converted it using a script, not forgetting that we would
> actually have to schedule and run a separate script to do this
> post-processing. A much cleaner way to do this is to create a new
> "relation provider" in Spark (which is like a data adapter for its
> DataFrames), and in our relation provider, when we are reading rows, we
> convert a single row's column into multiple rows and return those for
> processing. Spark will not know that physically it was a single row in the
> data layer, and it can summarize the data as usual and write to the target
> summary tables. [1] is our existing implementation of a Spark relation
> provider, which directly maps to our DAS analytics tables; we can create
> the new one extending / based on it. So I suggest we try out this approach
> and see if everyone is okay with it.
>
> [1] https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>
> Cheers,
> Anjana.
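(For illustration, a minimal sketch of the scan path of such a relation provider against the Spark 1.x sources API, with the physical rows mocked in place of the DAS data layer; the class name and the fixed schema are illustrative, and the real implementation in [1] is considerably more involved:)

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.rdd.RDD;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.sources.BaseRelation;
    import org.apache.spark.sql.sources.TableScan;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;
    import org.json.JSONArray;
    import org.json.JSONObject;

    public class SplittingRelation extends BaseRelation implements TableScan {

        private final SQLContext sqlContext;

        public SplittingRelation(SQLContext sqlContext) {
            this.sqlContext = sqlContext;
        }

        @Override
        public SQLContext sqlContext() {
            return sqlContext;
        }

        // The logical schema Spark sees: one row per array element.
        @Override
        public StructType schema() {
            List<StructField> fields = new ArrayList<>();
            fields.add(DataTypes.createStructField("id", DataTypes.IntegerType, false));
            fields.add(DataTypes.createStructField("a", DataTypes.StringType, true));
            fields.add(DataTypes.createStructField("b", DataTypes.StringType, true));
            fields.add(DataTypes.createStructField("c", DataTypes.StringType, true));
            return DataTypes.createStructType(fields);
        }

        // Reads the physical rows (mocked here) and expands each one's
        // 'data' column into multiple logical rows before handing them to
        // Spark, which never sees the original single-row layout.
        @Override
        public RDD<Row> buildScan() {
            String data = "[{'a':'aaa','b':'bbb','c':'ccc'},"
                    + "{'a':'xxx','b':'yyy','c':'zzz'},"
                    + "{'a':'ppp','b':'qqq','c':'rrr'}]";
            List<Row> logicalRows = new ArrayList<>();
            for (int id : new int[] { 1, 2 }) {
                JSONArray elements = new JSONArray(data);
                for (int i = 0; i < elements.length(); i++) {
                    JSONObject e = elements.getJSONObject(i);
                    logicalRows.add(RowFactory.create(id,
                            e.getString("a"), e.getString("b"), e.getString("c")));
                }
            }
            JavaSparkContext jsc = new JavaSparkContext(sqlContext.sparkContext());
            return jsc.parallelize(logicalRows).rdd();
        }
    }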
