Hi Sinthuja,

Agreed on the possibility of simplifying the JSON. We discussed the same matter yesterday, but the complication that came up was this: for an event in the "events" list, the payload can be either referenced or defined in-line. (It was made this way so that it can be generalized for other fields as well, if needed, other than payloads.) In such a case, if we had defined 'payload': '*payload1*', we would not know whether it is the actual payload or a reference to the payload in the "payloads" section.
With the suggested format, DAS will only go and map the payload if it is null.

Regards,
Supun

On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <[email protected]> wrote:

> Hi Supun,
>
> I think we could simplify the JSON message a bit more. Instead of 'null' for the payload attributes in the events section, you could use the actual payload name directly if there is a payload for that event. In that case, we could eliminate the 'events' section from the 'payloads' section. For the given example, it could be altered as below.
>
> {
>   'events': [{
>     'messageId': 'aaa',
>     'componentId': '111',
>     'payload': '*payload1*',
>     'componentName': 'Proxy:TestProxy',
>     'output-payload': null
>   }, {
>     'messageId': 'bbb',
>     'componentId': '222',
>     'componentName': 'Proxy:TestProxy',
>     'payload': '*payload1*',
>     'output-payload': null
>   }, {
>     'messageId': 'ccc',
>     'componentId': '789',
>     'payload': '*payload2*',
>     'componentName': 'Proxy:TestProxy',
>     'output-payload': '*payload2*'
>   }],
>
>   'payloads': {
>     '*payload1*': 'xml-payload-1',
>     '*payload2*': 'xml-payload-2'
>   }
> }
>
> Thanks,
> Sinthuja.
>
> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <[email protected]> wrote:
>
>> Hi Budhdhima/Viraj,
>>
>> As per the discussion we had yesterday, the following is the format of the JSON containing the aggregated event details, to be sent to DAS. (You may change the attribute names of the events.)
>>
>> To explain it further, "events" contains the details about each event sent by each mediator. The payload may or may not be populated. The "payloads" section contains the unique payloads and the mapping to the events and their fields. (e.g., 'xml-payload-2' maps to the 'payload' and 'output-payload' fields of the 3rd event.)
>>
>> {
>>   'events': [{
>>     'messageId': 'aaa',
>>     'componentId': '111',
>>     'payload': null,
>>     'componentName': 'Proxy:TestProxy',
>>     'output-payload': null
>>   }, {
>>     'messageId': 'bbb',
>>     'componentId': '222',
>>     'componentName': 'Proxy:TestProxy',
>>     'payload': null,
>>     'output-payload': null
>>   }, {
>>     'messageId': 'ccc',
>>     'componentId': '789',
>>     'payload': null,
>>     'componentName': 'Proxy:TestProxy',
>>     'output-payload': null
>>   }],
>>
>>   'payloads': [{
>>     'payload': 'xml-payload-1',
>>     'events': [{
>>       'eventIndex': 0,
>>       'attributes': ['payload']
>>     }, {
>>       'eventIndex': 1,
>>       'attributes': ['payload']
>>     }]
>>   }, {
>>     'payload': 'xml-payload-2',
>>     'events': [{
>>       'eventIndex': 2,
>>       'attributes': ['payload', 'output-payload']
>>     }]
>>   }]
>> }
>>
>> Please let us know if any further clarification is needed, or if there is anything to be modified/improved.
>>
>> Thanks,
>> Supun
>>
>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]> wrote:
>>
>>> Hi Kasun,
>>>
>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <[email protected]> wrote:
>>>
>>>> I think for the tracing use case we need to publish events one by one from each mediator (we can't aggregate all such events, as they also contain the message payload).
>>>
>>> I think we can still do that with some extra effort. Most of the mediators in a sequence flow do not alter the message payload. We can store the payload only for the mediators which alter the message payload, and for the others, we can put a reference to the previous entry. By doing that we can save memory to a great extent.
>>>
>>> Thanks.
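As an illustration of the mapping described above, here is a minimal sketch of how the index-based 'payloads' section can be resolved back into the event fields. (Python is used purely for readability; the actual implementation discussed here would be Java, and the function name `apply_payload_mappings` is mine, not from the thread.)

```python
# Illustrative sketch: walk the 'payloads' section and copy each unique
# payload into the event fields it is mapped to, using 'eventIndex' and
# 'attributes' exactly as in the JSON format above.

def apply_payload_mappings(message):
    events = message['events']
    for entry in message['payloads']:
        for mapping in entry['events']:
            event = events[mapping['eventIndex']]
            for attribute in mapping['attributes']:
                event[attribute] = entry['payload']
    return events

message = {
    'events': [
        {'messageId': 'aaa', 'payload': None, 'output-payload': None},
        {'messageId': 'bbb', 'payload': None, 'output-payload': None},
        {'messageId': 'ccc', 'payload': None, 'output-payload': None},
    ],
    'payloads': [
        {'payload': 'xml-payload-1',
         'events': [{'eventIndex': 0, 'attributes': ['payload']},
                    {'eventIndex': 1, 'attributes': ['payload']}]},
        {'payload': 'xml-payload-2',
         'events': [{'eventIndex': 2, 'attributes': ['payload', 'output-payload']}]},
    ],
}
events = apply_payload_mappings(message)
```

Note how a field left as null (None) stays null, which is exactly the signal DAS uses to decide whether a mapping is needed.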
>>>
>>>> ---------- Forwarded message ----------
>>>> From: Supun Sethunga <[email protected]>
>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
>>>> To: Anjana Fernando <[email protected]>
>>>> Cc: "[email protected]" <[email protected]>, Srinath Perera <[email protected]>, Sanjiva Weerawarana <[email protected]>, Kasun Indrasiri <[email protected]>, Isuru Udana <[email protected]>
>>>>
>>>> Hi all,
>>>>
>>>> Ran some simple performance tests against the new relation provider, in comparison with the existing one. The results are as follows:
>>>>
>>>> Records in backend DB table: 1,054,057
>>>>
>>>> Conversion: a backend DB table of the form
>>>>
>>>>   id | data
>>>>   ---+------------------------------------------------------------------------------------------------
>>>>   1  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>   2  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>
>>>> is exposed as the Spark table
>>>>
>>>>   id | a   | b   | c
>>>>   ---+-----+-----+----
>>>>   1  | aaa | bbb | ccc
>>>>   1  | xxx | yyy | zzz
>>>>   1  | ppp | qqq | rrr
>>>>   2  | aaa | bbb | ccc
>>>>   2  | xxx | yyy | zzz
>>>>   2  | ppp | qqq | rrr
>>>>
>>>> Avg time for query execution (~ sec):
>>>>
>>>>   Query                                                                     | Existing provider | New (ESB) provider*
>>>>   --------------------------------------------------------------------------+-------------------+--------------------
>>>>   SELECT COUNT(*) FROM <Table>;                                             | 13                | 16
>>>>   SELECT * FROM <Table> ORDER BY id ASC;                                    | 13                | 16
>>>>   SELECT * FROM <Table> WHERE id=98435;                                     | 13                | 16
>>>>   SELECT id,a,first(b),first(c) FROM <Table> GROUP BY id,a ORDER BY id ASC; | 18                | 26
>>>>
>>>> * The new relation provider splits a single row into multiple rows. Hence the number of rows in the table is three times that of the original table (as each row is split into 3 rows).
>>>>
>>>> Regards,
>>>> Supun
>>>>
>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have started working on implementing a new "relation" / "relation provider" to serve the above requirement.
>>>>> This is basically a modified version of the existing "Carbon Analytics" relation provider.
>>>>>
>>>>> Here I have assumed that the encapsulated data for a single execution flow is stored in a single row, and that the data about the mediators invoked during the flow is stored in a known column of each row (say "data") as an array (say a JSON array). When each row is read into Spark, this relation provider creates a separate row for each element of the array stored in the "data" column. I have tested this with some mocked data, and it works as expected.
>>>>>
>>>>> I need to test with the real data/data formats and modify the mapping accordingly. Will update the thread with the details.
>>>>>
>>>>> Regards,
>>>>> Supun
>>>>>
>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In a meeting I had with Kasun and the ESB team, I got to know that, for their tracing mechanism, they were instructed to publish one event for each of the mediator invocations, whereas earlier they had an approach where they published one event which encapsulated the data of a whole execution flow. I would actually like to support the latter approach, mainly due to performance / resource requirements, and also considering the fact that this is a feature that could be enabled in production. Simply put, if we do one event per mediator, this does not scale that well. For example, if the ESB is doing 1k TPS, then for a sequence that has 20 mediators, that is 20k TPS of analytics traffic. Combine that with a possible ESB cluster hitting a DAS cluster with a single backend database, and this may be too many rows per second written to the database.
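To make the arithmetic above concrete, a trivial sketch (the 1k TPS and 20-mediator figures are the ones quoted in the mail):

```python
# Event rates for the two publishing strategies under discussion.
# Figures are the ones quoted in the mail (1k TPS, 20 mediators).

esb_tps = 1_000          # ESB transactions per second
mediators_per_flow = 20  # mediators in the example sequence

# One event (= one database row in DAS) per mediator invocation:
per_mediator_rate = esb_tps * mediators_per_flow   # 20,000 events/s

# One aggregated event per execution flow:
per_flow_rate = esb_tps                            # 1,000 events/s
```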
>>>>>> The main problem here is that one event is a single row/record in the backend database in DAS, so it may come to a state where the frequency of row creations by events coming from ESBs cannot be sustained.
>>>>>>
>>>>>> If we create a single event from the 20 mediators, then it is just 1k TPS for the DAS event receivers and for the database too, even though the message size is bigger. Publishing lots of small events does not necessarily give the same performance as publishing bigger events: throughput-wise, comparatively bigger events will win (even considering that small operations will be batched at the transport level etc., it is still one event = one database row). So I would suggest we try out a "single sequence flow = single event" approach, and on the Spark processing side, we treat each of these big rows as multiple rows in Spark. I first considered whether UDFs could help in splitting a single column into multiple rows, but that is not possible, and it is also a bit troublesome, since we would have to delete the original data table after converting it with a script, not forgetting that we would actually have to schedule and run a separate script to do this post-processing. A much cleaner way to do this is to create a new "relation provider" in Spark (which is like a data adapter for its DataFrames); in our relation provider, when we are reading rows, we convert a single row's column into multiple rows and return those for processing. Spark will not know that physically it was a single row in the data layer, and it can summarize the data as usual and write to the target summary tables.
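The row-splitting the relation provider performs can be sketched as follows, under the assumptions stated above (one flow per physical row, per-mediator data in a JSON-array "data" column). The real provider is implemented in Java against Spark's data source API; this Python sketch, with the made-up name `explode_rows`, only shows the shape of the transformation:

```python
import json

# Sketch of what the relation provider does at read time: each physical
# row holds a JSON array in its "data" column, and one logical row is
# exposed to Spark per array element (the physical id is kept on each).

def explode_rows(physical_rows):
    logical_rows = []
    for row in physical_rows:
        for element in json.loads(row['data']):
            logical_rows.append({'id': row['id'], **element})
    return logical_rows

physical = [{
    'id': 1,
    'data': '[{"a":"aaa","b":"bbb","c":"ccc"},'
            '{"a":"xxx","b":"yyy","c":"zzz"},'
            '{"a":"ppp","b":"qqq","c":"rrr"}]',
}]
rows = explode_rows(physical)
# One physical row becomes three logical rows sharing id 1.
```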
>>>>>> [1] is our existing implementation of the Spark relation provider, which directly maps to our DAS analytics tables; we can create the new one extending / based on it. So I suggest we try out this approach and see if everyone is okay with it.
>>>>>>
>>>>>> [1] https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>> --
>>>>>> *Anjana Fernando*
>>>>>> Senior Technical Lead
>>>>>> WSO2 Inc. | http://wso2.com
>>>>>> lean . enterprise . middleware
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google Groups "WSO2 Engineering Group" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/a/wso2.com/d/optout.
>>>>
>>>> --
>>>> Kasun Indrasiri
>>>> Software Architect
>>>> WSO2, Inc.; http://wso2.com
>>>> lean.enterprise.middleware
>>>> cell: +94 77 556 5206
>>>> Blog : http://kasunpanorama.blogspot.com/
>>>
>>> --
>>> *Isuru Udana*
>>> Associate Technical Lead
>>> WSO2 Inc.; http://wso2.com
>>> email: [email protected] cell: +94 77 3791887
>>> blog: http://mytecheye.blogspot.com/
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.: http://wso2.com
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
