Hi Sinthuja,
> IMHO we could solve this issue as having conventions. Basically we could
> use $payloads:payload1 to reference the elements as a convention. If the
> element starts with '$' then it's a reference, not the actual payload. In
> that case, if there is a new element introduced, let's say foo, and you
> need to access the property property1, then it will have the reference
> $foo:property1.

Yes, that's possible as well. But again, if the value for a property, say
'foo', happens to start with the same special character (in this case '$'),
we may run into ambiguity. (True, the chances are pretty slim, but it is
still possible.)

> Also this json event format is being sent as the event payload in a wso2
> event, and the wso2 event is being published by the data publisher, right?
> Correct me if i'm wrong.

Yes.

Thanks,
Supun

On Wed, Feb 10, 2016 at 11:35 AM, Sinthuja Ragendran <[email protected]> wrote:

> Hi Supun,
>
> Also this json event format is being sent as the event payload in a wso2
> event, and the wso2 event is being published by the data publisher, right?
> Correct me if i'm wrong.
>
> Thanks,
> Sinthuja.
>
> On Wed, Feb 10, 2016 at 11:26 AM, Sinthuja Ragendran <[email protected]>
> wrote:
>
>> Hi Supun,
>>
>> On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Sinthuja,
>>>
>>> Agree on the possibility of simplifying the json. We also discussed the
>>> same matter yesterday, but the complication that came up was: for an
>>> event in the "events" list, the payload could be either referenced or
>>> defined in-line. (It was made that way so that it can be generalized for
>>> other fields as well, if needed, beyond payloads.)
>>>
>>> In such a case, if we had defined 'payload': 'payload1', we would not
>>> know if it's the actual payload, or a reference to the payload in the
>>> "payloads" section.
>>>
>>> With the suggested format, DAS will only go and map the payload if it's
>>> null.
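As a side note on the convention being discussed: below is a minimal sketch of how such '$'-prefixed references could be resolved, assuming a hypothetical '$$' escape for literal values that happen to start with '$' (which would address the ambiguity raised above). The names here (PayloadRefResolver, resolvePayload) are illustrative, not from the actual ESB/DAS code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the '$'-reference convention, with a
// hypothetical "$$" escape for literal values starting with '$'.
public class PayloadRefResolver {

    // Resolves a 'payload' attribute value:
    //  - "$payloads:payload1" is a reference into the payloads section
    //  - "$$..." is a literal value that happens to start with '$'
    //  - anything else is an in-line payload
    static String resolvePayload(String value, Map<String, String> payloads) {
        if (value == null) {
            return null;
        }
        if (value.startsWith("$$")) {
            return value.substring(1);          // unescape the literal '$...'
        }
        if (value.startsWith("$")) {
            String[] parts = value.substring(1).split(":", 2);
            if (parts.length == 2 && "payloads".equals(parts[0])) {
                return payloads.get(parts[1]);  // look up the referenced payload
            }
        }
        return value;                           // plain in-line payload
    }

    public static void main(String[] args) {
        Map<String, String> payloads = new LinkedHashMap<>();
        payloads.put("payload1", "xml-payload-1");

        System.out.println(resolvePayload("$payloads:payload1", payloads)); // xml-payload-1
        System.out.println(resolvePayload("$$literal", payloads));          // $literal
        System.out.println(resolvePayload("plain-xml", payloads));          // plain-xml
    }
}
```

With an escape rule like this, the ambiguity only arises if the sender forgets to escape, so the convention would need to be enforced on the publishing side.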
>> IMHO we could solve this issue as having conventions. Basically we could
>> use $payloads:payload1 to reference the elements as a convention. If the
>> element starts with '$' then it's a reference, not the actual payload. In
>> that case, if there is a new element introduced, let's say foo, and you
>> need to access the property property1, then it will have the reference
>> $foo:property1.
>>
>> Thanks,
>> Sinthuja.
>>
>>> Regards,
>>> Supun
>>>
>>> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <[email protected]>
>>> wrote:
>>>
>>>> Hi Supun,
>>>>
>>>> I think we could simplify the json message a bit more. Instead of
>>>> 'null' for the payload attributes in the events section, you could use
>>>> the actual payload name directly if there is a payload for that event.
>>>> And in that case, we could eliminate the 'events' section from the
>>>> 'payloads' section. For the given example, it could be altered as below.
>>>>
>>>> {
>>>>   'events': [{
>>>>     'messageId': 'aaa',
>>>>     'componentId': '111',
>>>>     'payload': 'payload1',
>>>>     'componentName': 'Proxy:TestProxy',
>>>>     'output-payload': null
>>>>   }, {
>>>>     'messageId': 'bbb',
>>>>     'componentId': '222',
>>>>     'componentName': 'Proxy:TestProxy',
>>>>     'payload': 'payload1',
>>>>     'output-payload': null
>>>>   }, {
>>>>     'messageId': 'ccc',
>>>>     'componentId': '789',
>>>>     'payload': 'payload2',
>>>>     'componentName': 'Proxy:TestProxy',
>>>>     'output-payload': 'payload2'
>>>>   }],
>>>>
>>>>   'payloads': {
>>>>     'payload1': 'xml-payload-1',
>>>>     'payload2': 'xml-payload-2'
>>>>   }
>>>> }
>>>>
>>>> Thanks,
>>>> Sinthuja.
>>>>
>>>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Budhdhima/Viraj,
>>>>>
>>>>> As per the discussion we had yesterday, following is the format of the
>>>>> json containing the aggregated event details, to be sent to DAS. (You
>>>>> may change the attribute names of the events.)
>>>>> To explain it further, "events" contains the details about each event
>>>>> sent by each mediator. The payload may or may not be populated. The
>>>>> "payloads" section contains the unique payloads and the mapping to the
>>>>> events and their fields. (eg: 'xml-payload-2' maps to the 'payload' and
>>>>> 'output-payload' fields of the 3rd event).
>>>>>
>>>>> {
>>>>>   'events': [{
>>>>>     'messageId': 'aaa',
>>>>>     'componentId': '111',
>>>>>     'payload': null,
>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>     'output-payload': null
>>>>>   }, {
>>>>>     'messageId': 'bbb',
>>>>>     'componentId': '222',
>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>     'payload': null,
>>>>>     'output-payload': null
>>>>>   }, {
>>>>>     'messageId': 'ccc',
>>>>>     'componentId': '789',
>>>>>     'payload': null,
>>>>>     'componentName': 'Proxy:TestProxy',
>>>>>     'output-payload': null
>>>>>   }],
>>>>>
>>>>>   'payloads': [{
>>>>>     'payload': 'xml-payload-1',
>>>>>     'events': [{
>>>>>       'eventIndex': 0,
>>>>>       'attributes': ['payload']
>>>>>     }, {
>>>>>       'eventIndex': 1,
>>>>>       'attributes': ['payload']
>>>>>     }]
>>>>>   }, {
>>>>>     'payload': 'xml-payload-2',
>>>>>     'events': [{
>>>>>       'eventIndex': 2,
>>>>>       'attributes': ['payload', 'output-payload']
>>>>>     }]
>>>>>   }]
>>>>> }
>>>>>
>>>>> Please let us know if any further clarification is needed, or if
>>>>> there's anything to be modified/improved.
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]> wrote:
>>>>>
>>>>>> Hi Kasun,
>>>>>>
>>>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I think for the tracing use case we need to publish events one by
>>>>>>> one from each mediator (we can't aggregate all such events, as they
>>>>>>> also contain the message payload).
>>>>>>>
>>>>>> I think we can still do that with some extra effort.
>>>>>> Most of the mediators in a sequence flow do not alter the message
>>>>>> payload.
>>>>>> We can store the payload only for the mediators which alter the
>>>>>> message payload. And for the others, we can put a reference to the
>>>>>> previous entry. By doing that we can save memory to a great extent.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>>
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Supun Sethunga <[email protected]>
>>>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>>>> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
>>>>>>> To: Anjana Fernando <[email protected]>
>>>>>>> Cc: "[email protected]" <[email protected]>,
>>>>>>> Srinath Perera <[email protected]>, Sanjiva Weerawarana
>>>>>>> <[email protected]>, Kasun Indrasiri <[email protected]>, Isuru Udana
>>>>>>> <[email protected]>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Ran some simple performance tests against the new relational
>>>>>>> provider, in comparison with the existing one. Following are the
>>>>>>> results:
>>>>>>>
>>>>>>> *Records in Backend DB Table*: *1,054,057*
>>>>>>>
>>>>>>> *Conversion:*
>>>>>>>
>>>>>>> Backend DB Table:
>>>>>>> id | data
>>>>>>> 1  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>> 2  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>>>
>>>>>>> -- To -->
>>>>>>>
>>>>>>> Spark Table:
>>>>>>> id | a   | b   | c
>>>>>>> 1  | aaa | bbb | ccc
>>>>>>> 1  | xxx | yyy | zzz
>>>>>>> 1  | ppp | qqq | rrr
>>>>>>> 2  | aaa | bbb | ccc
>>>>>>> 2  | xxx | yyy | zzz
>>>>>>> 2  | ppp | qqq | rrr
>>>>>>>
>>>>>>> *Avg Time for Query Execution (~ sec):*
>>>>>>>
>>>>>>> Query | Existing Analytics Relation Provider | New (ESB) Analytics Relation Provider*
>>>>>>> SELECT COUNT(*) FROM <Table>; | 13 | 16
>>>>>>> SELECT * FROM <Table> ORDER BY id ASC; | 13 | 16
>>>>>>> SELECT * FROM <Table> WHERE id=98435; | 13 | 16
>>>>>>> SELECT id,a,first(b),first(c) FROM <Table> GROUP BY id,a ORDER BY id ASC; | 18 | 26
>>>>>>>
>>>>>>> * The new relational provider splits a single row into multiple rows.
>>>>>>> Hence the number of rows in the table is 3 times that of the original
>>>>>>> table (as each row is split into 3 rows).
>>>>>>>
>>>>>>> Regards,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have started working on implementing a new "relation" / "relation
>>>>>>>> provider" to serve the above requirement. This is basically a
>>>>>>>> modified version of the existing "Carbon Analytics" relation
>>>>>>>> provider.
>>>>>>>>
>>>>>>>> Here I have assumed that the encapsulated data for a single
>>>>>>>> execution flow are stored in a single row, and that the data about
>>>>>>>> the mediators invoked during the flow are stored in a known column
>>>>>>>> of each row (say "data"), as an array (say a json array). When each
>>>>>>>> row is read into spark, this relation provider creates separate
>>>>>>>> rows for each of the elements in the array stored in the "data"
>>>>>>>> column. I have tested this with some mocked data, and it works as
>>>>>>>> expected.
>>>>>>>>
>>>>>>>> We still need to test with the real data/data-formats, and modify
>>>>>>>> the mapping accordingly. Will update the thread with the details.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> In a meeting I had with Kasun and the ESB team, I got to know
>>>>>>>>> that, for their tracing mechanism, they were instructed to publish
>>>>>>>>> one event for each mediator invocation, whereas earlier they had
>>>>>>>>> an approach where they publish one event which encapsulates the
>>>>>>>>> data of a whole execution flow.
>>>>>>>>> I would actually like to support the latter approach, mainly due
>>>>>>>>> to performance / resource requirements, and also considering the
>>>>>>>>> fact that this is a feature that could be enabled in production.
>>>>>>>>> So, simply put, if we do one event per mediator, this does not
>>>>>>>>> scale that well. For example, if the ESB is doing 1k TPS, for a
>>>>>>>>> sequence that has 20 mediators, that is 20k TPS of analytics
>>>>>>>>> traffic. Combine that with a possible ESB cluster hitting a DAS
>>>>>>>>> cluster with a single backend database, and this may be too many
>>>>>>>>> rows per second written to the database. The main problem here is
>>>>>>>>> that one event is a single row/record in the backend database in
>>>>>>>>> DAS, so it may come to a state where the frequency of row
>>>>>>>>> creations by events coming from ESBs cannot be sustained.
>>>>>>>>>
>>>>>>>>> If we create a single event from the 20 mediators, then it is just
>>>>>>>>> 1k TPS for the DAS event receivers and the database too, even
>>>>>>>>> though the message size is bigger. Publishing lots of small events
>>>>>>>>> does not necessarily give the same performance as publishing
>>>>>>>>> bigger events. Throughput wise, comparatively bigger events will
>>>>>>>>> win (even considering that small operations will be batched at the
>>>>>>>>> transport level etc., still one event = one database row). So I
>>>>>>>>> would suggest we try out a single sequence flow = single event
>>>>>>>>> approach, and from the Spark processing side, we consider one of
>>>>>>>>> these big rows as multiple rows in Spark.
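The row-splitting idea described above (one physical row per execution flow, expanded into one logical row per mediator before Spark summarizes it) can be sketched as follows. This is a simplified stand-in for what a custom Spark relation provider would do while reading rows; the types and names here (FlatRow, split) are illustrative only, not the actual DataFrame machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified illustration of the single-row-to-multiple-rows expansion.
// In the real design this would happen inside a Spark relation provider;
// here the "data" column is modeled as a list of per-mediator records
// instead of a raw json array string.
public class RowSplitter {

    record FlatRow(int id, String a, String b, String c) {}

    // Expands one physical (id, data) row into one FlatRow per element
    // of the "data" column.
    static List<FlatRow> split(int id, List<Map<String, String>> data) {
        List<FlatRow> rows = new ArrayList<>();
        for (Map<String, String> rec : data) {
            rows.add(new FlatRow(id, rec.get("a"), rec.get("b"), rec.get("c")));
        }
        return rows;
    }

    public static void main(String[] args) {
        // The "data" column of row id=1, as in the conversion example
        // quoted earlier in the thread.
        List<Map<String, String>> data = List.of(
                Map.of("a", "aaa", "b", "bbb", "c", "ccc"),
                Map.of("a", "xxx", "b", "yyy", "c", "zzz"),
                Map.of("a", "ppp", "b", "qqq", "c", "rrr"));

        // One physical row becomes three logical rows.
        split(1, data).forEach(System.out::println);
    }
}
```

Spark only ever sees the expanded rows, so the summarization scripts can group and aggregate as if the data had been stored one row per mediator.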
>>>>>>>>> I was first thinking whether UDFs could help in splitting a
>>>>>>>>> single column into multiple rows, but that is not possible, and
>>>>>>>>> it would also be a bit troublesome, considering we would have to
>>>>>>>>> delete the original data table after we converted it using a
>>>>>>>>> script, not forgetting that we would actually have to schedule
>>>>>>>>> and run a separate script to do this post-processing. So a much
>>>>>>>>> cleaner way to do this would be to create a new "relation
>>>>>>>>> provider" in Spark (which is like a data adapter for its
>>>>>>>>> DataFrames), and in our relation provider, when we are reading
>>>>>>>>> rows, we convert a single row's column to multiple rows and
>>>>>>>>> return those for processing. So Spark will not know that,
>>>>>>>>> physically, it was a single row from the data layer, and it can
>>>>>>>>> summarize the data as usual and write to the target summary
>>>>>>>>> tables. [1] is our existing implementation of a Spark relation
>>>>>>>>> provider, which directly maps to our DAS analytics tables; we can
>>>>>>>>> create the new one extending / based on it. So I suggest we try
>>>>>>>>> out this approach and see if everyone is okay with it.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Anjana.
>>>>>>>>> --
>>>>>>>>> *Anjana Fernando*
>>>>>>>>> Senior Technical Lead
>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "WSO2 Engineering Group" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> For more options, visit
>>>>>>>>> https://groups.google.com/a/wso2.com/d/optout.
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Supun Sethunga*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> http://wso2.com/
>>>>>>>> lean | enterprise | middleware
>>>>>>>> Mobile : +94 716546324
>>>>>>>
>>>>>>> --
>>>>>>> *Supun Sethunga*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> http://wso2.com/
>>>>>>> lean | enterprise | middleware
>>>>>>> Mobile : +94 716546324
>>>>>>>
>>>>>>> --
>>>>>>> Kasun Indrasiri
>>>>>>> Software Architect
>>>>>>> WSO2, Inc.; http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> cell: +94 77 556 5206
>>>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>>>
>>>>>> --
>>>>>> *Isuru Udana*
>>>>>> Associate Technical Lead
>>>>>> WSO2 Inc.; http://wso2.com
>>>>>> email: [email protected] cell: +94 77 3791887
>>>>>> blog: http://mytecheye.blogspot.com/
>>>>>
>>>>> --
>>>>> *Supun Sethunga*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> http://wso2.com/
>>>>> lean | enterprise | middleware
>>>>> Mobile : +94 716546324
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>> --
>>>> *Sinthuja Rajendran*
>>>> Associate Technical Lead
>>>> WSO2, Inc.: http://wso2.com
>>>>
>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>> Mobile: +94774273955
>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> [email protected]
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>
>> --
>> *Sinthuja Rajendran*
>> Associate Technical Lead
>> WSO2, Inc.: http://wso2.com
>>
>> Blog: http://sinthu-rajan.blogspot.com/
>> Mobile: +94774273955
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.: http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
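The payload mapping discussed in the thread above (DAS fills an event's fields from the "payloads" section only when they are null, using the eventIndex/attributes entries) can be sketched as follows. Plain Maps stand in for the parsed json, and the names (PayloadMapper, mapPayloads) are hypothetical, not from the actual DAS code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of mapping the "payloads" section back onto events whose
// payload fields are null, per the aggregated json format above.
public class PayloadMapper {

    // payloads: list of {payload, events: [{eventIndex, attributes: [...]}]}
    static void mapPayloads(List<Map<String, Object>> events,
                            List<Map<String, Object>> payloads) {
        for (Map<String, Object> entry : payloads) {
            String payload = (String) entry.get("payload");
            @SuppressWarnings("unchecked")
            List<Map<String, Object>> targets =
                    (List<Map<String, Object>>) entry.get("events");
            for (Map<String, Object> target : targets) {
                int eventIndex = (Integer) target.get("eventIndex");
                @SuppressWarnings("unchecked")
                List<String> attributes = (List<String>) target.get("attributes");
                for (String attribute : attributes) {
                    Map<String, Object> event = events.get(eventIndex);
                    // map the payload only if the field is currently null
                    if (event.get(attribute) == null) {
                        event.put(attribute, payload);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        // The 3rd event of the example, with both payload fields null.
        Map<String, Object> event = new HashMap<>();
        event.put("messageId", "ccc");
        event.put("payload", null);
        event.put("output-payload", null);
        List<Map<String, Object>> events = List.of(event);

        List<Map<String, Object>> payloads = List.of(Map.of(
                "payload", "xml-payload-2",
                "events", List.of(Map.of(
                        "eventIndex", 0,
                        "attributes", List.of("payload", "output-payload")))));

        mapPayloads(events, payloads);
        System.out.println(event.get("payload"));         // xml-payload-2
        System.out.println(event.get("output-payload"));  // xml-payload-2
    }
}
```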
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
