Hi Supun,
On Wed, Feb 10, 2016 at 11:14 AM, Supun Sethunga <[email protected]> wrote:

> Hi Sinthuja,
>
> Agree on the possibility of simplifying the json. We also discussed the
> same matter yesterday, but the complication that came up was: for an event
> in the "events" list, the payload could be either referenced or defined
> in-line. (It was made this way so that it can be generalized for other
> fields as well, if needed, other than payloads.) In such a case, if we had
> defined 'payload': 'payload1', we would not know whether it is the actual
> payload or a reference to the payload in the "payloads" section.
>
> With the suggested format, DAS will only go and map the payload if it is
> null.

IMHO we could solve this issue by having conventions. Basically, we could
use $payloads:payload1 to reference the elements, as a convention. If the
element starts with '$' then it's a reference, not the actual payload. In
that case, if a new element is introduced, say foo, and you need to access
the property property1, then the reference would be $foo:property1.

Thanks,
Sinthuja.

> Regards,
> Supun
>
> On Wed, Feb 10, 2016 at 10:52 AM, Sinthuja Ragendran <[email protected]> wrote:
>
>> Hi Supun,
>>
>> I think we could simplify the json message a bit more. Instead of 'null'
>> for the payload attributes in the events section, you could use the
>> actual payload name directly if there is a payload for that event. And in
>> that case, we could eliminate the 'events' section from the 'payloads'
>> section. For the given example, it could be altered as below.
>>
>> {
>>   'events': [{
>>     'messageId': 'aaa',
>>     'componentId': '111',
>>     'payload': 'payload1',
>>     'componentName': 'Proxy:TestProxy',
>>     'output-payload': null
>>   }, {
>>     'messageId': 'bbb',
>>     'componentId': '222',
>>     'componentName': 'Proxy:TestProxy',
>>     'payload': 'payload1',
>>     'output-payload': null
>>   }, {
>>     'messageId': 'ccc',
>>     'componentId': '789',
>>     'payload': 'payload2',
>>     'componentName': 'Proxy:TestProxy',
>>     'output-payload': 'payload2'
>>   }],
>>
>>   'payloads': {
>>     'payload1': 'xml-payload-1',
>>     'payload2': 'xml-payload-2'
>>   }
>> }
>>
>> Thanks,
>> Sinthuja.
>>
>> On Wed, Feb 10, 2016 at 10:18 AM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Budhdhima/Viraj,
>>>
>>> As per the discussion we had yesterday, the following is the format of
>>> the json containing the aggregated event details, to be sent to DAS (you
>>> may change the attribute names of events).
>>>
>>> To explain it further, "events" contains the details about each event
>>> sent by each mediator. Payload may or may not be populated. The
>>> "payloads" section contains unique payloads and the mapping to the
>>> events and their fields (e.g., 'xml-payload-2' maps to the 'payload' and
>>> 'output-payload' fields of the 3rd event).
>>>
>>> {
>>>   'events': [{
>>>     'messageId': 'aaa',
>>>     'componentId': '111',
>>>     'payload': null,
>>>     'componentName': 'Proxy:TestProxy',
>>>     'output-payload': null
>>>   }, {
>>>     'messageId': 'bbb',
>>>     'componentId': '222',
>>>     'componentName': 'Proxy:TestProxy',
>>>     'payload': null,
>>>     'output-payload': null
>>>   }, {
>>>     'messageId': 'ccc',
>>>     'componentId': '789',
>>>     'payload': null,
>>>     'componentName': 'Proxy:TestProxy',
>>>     'output-payload': null
>>>   }],
>>>
>>>   'payloads': [{
>>>     'payload': 'xml-payload-1',
>>>     'events': [{
>>>       'eventIndex': 0,
>>>       'attributes': ['payload']
>>>     }, {
>>>       'eventIndex': 1,
>>>       'attributes': ['payload']
>>>     }]
>>>   }, {
>>>     'payload': 'xml-payload-2',
>>>     'events': [{
>>>       'eventIndex': 2,
>>>       'attributes': ['payload', 'output-payload']
>>>     }]
>>>   }]
>>> }
>>>
>>> Please let us know if any further clarification is needed, or if there
>>> is anything to be modified/improved.
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Tue, Feb 9, 2016 at 11:05 AM, Isuru Udana <[email protected]> wrote:
>>>
>>>> Hi Kasun,
>>>>
>>>> On Tue, Feb 9, 2016 at 10:10 AM, Kasun Indrasiri <[email protected]> wrote:
>>>>
>>>>> I think for the tracing use case we need to publish events one by one
>>>>> from each mediator (we can't aggregate all such events, as it also
>>>>> contains the message payload).
>>>>
>>>> I think we can still do that with some extra effort. Most of the
>>>> mediators in a sequence flow do not alter the message payload. We can
>>>> store the payload only for the mediators which alter the message
>>>> payload, and for the others, we can put a reference to the previous
>>>> entry. By doing that we can save memory to a great extent.
>>>>
>>>> Thanks.
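[Editor's note] Isuru's reference-to-previous-entry idea, combined with the '$payloads:...' convention suggested earlier in the thread, can be sketched as follows. This is a minimal illustration only, with hypothetical function names (not WSO2 code): unchanged payloads are stored once, and subsequent events carry a reference to the stored copy.

```python
# Sketch (hypothetical, not WSO2 code): store each distinct payload once,
# and make events whose payload is unchanged reference it via the
# '$payloads:<name>' convention discussed in this thread.

def aggregate(events):
    """events: list of dicts, each with a raw 'payload' string."""
    payloads = {}  # name -> payload body
    seen = {}      # payload body -> assigned name
    out = []
    for event in events:
        body = event['payload']
        if body not in seen:
            name = 'payload%d' % (len(payloads) + 1)
            payloads[name] = body
            seen[body] = name
        # Every event stores only a reference; the body lives in 'payloads'.
        out.append(dict(event, payload='$payloads:' + seen[body]))
    return {'events': out, 'payloads': payloads}

def resolve(message, value):
    """Follow a '$payloads:x' reference; anything else is an in-line payload."""
    if isinstance(value, str) and value.startswith('$payloads:'):
        return message['payloads'][value.split(':', 1)[1]]
    return value
```

A receiver can then distinguish an in-line payload from a reference purely by the '$' prefix, which is the ambiguity the convention is meant to remove.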
>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Supun Sethunga <[email protected]>
>>>>> Date: Mon, Feb 8, 2016 at 2:54 PM
>>>>> Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
>>>>> To: Anjana Fernando <[email protected]>
>>>>> Cc: "[email protected]" <[email protected]>, Srinath Perera
>>>>> <[email protected]>, Sanjiva Weerawarana <[email protected]>,
>>>>> Kasun Indrasiri <[email protected]>, Isuru Udana <[email protected]>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I ran some simple performance tests against the new relational
>>>>> provider, in comparison with the existing one. The results are as
>>>>> follows:
>>>>>
>>>>> Records in Backend DB Table: 1,054,057
>>>>>
>>>>> Conversion:
>>>>>
>>>>> Backend DB Table
>>>>> id | data
>>>>> 1  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>> 2  | [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
>>>>>
>>>>> -- To -->
>>>>>
>>>>> Spark Table
>>>>> id | a   | b   | c
>>>>> 1  | aaa | bbb | ccc
>>>>> 1  | xxx | yyy | zzz
>>>>> 1  | ppp | qqq | rrr
>>>>> 2  | aaa | bbb | ccc
>>>>> 2  | xxx | yyy | zzz
>>>>> 2  | ppp | qqq | rrr
>>>>>
>>>>> Avg Time for Query Execution (~ sec):
>>>>>
>>>>> Query                                               Existing | New (ESB)*
>>>>> SELECT COUNT(*) FROM <Table>;                          13    |   16
>>>>> SELECT * FROM <Table> ORDER BY id ASC;                 13    |   16
>>>>> SELECT * FROM <Table> WHERE id=98435;                  13    |   16
>>>>> SELECT id,a,first(b),first(c) FROM <Table>
>>>>>   GROUP BY id,a ORDER BY id ASC;                       18    |   26
>>>>>
>>>>> * The new relational provider splits a single row into multiple rows.
>>>>> Hence the number of rows in the table is equivalent to 3 times the
>>>>> original table (as each row is split into 3 rows).
>>>>>
>>>>> Regards,
>>>>> Supun
>>>>>
>>>>> On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have started working on implementing a new "relation" / "relation
>>>>>> provider" to serve the above requirement. This is basically a
>>>>>> modified version of the existing "Carbon Analytics" relation
>>>>>> provider.
>>>>>>
>>>>>> Here I have assumed that the encapsulated data for a single execution
>>>>>> flow are stored in a single row, and that the data about the
>>>>>> mediators invoked during the flow are stored in a known column of
>>>>>> each row (say "data") as an array (say a json array). When each row
>>>>>> is read into Spark, this relation provider creates a separate row for
>>>>>> each of the elements in the array stored in the "data" column. I have
>>>>>> tested this with some mocked data, and it works as expected.
>>>>>>
>>>>>> I still need to test with the real data/data-formats and modify the
>>>>>> mapping accordingly. Will update the thread with the details.
>>>>>>
>>>>>> Regards,
>>>>>> Supun
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In a meeting I had with Kasun and the ESB team, I got to know that,
>>>>>>> for their tracing mechanism, they were instructed to publish one
>>>>>>> event for each of the mediator invocations, whereas earlier they had
>>>>>>> an approach where they published one event which encapsulated the
>>>>>>> data of a whole execution flow. I would actually like to support the
>>>>>>> latter approach, mainly due to performance / resource requirements,
>>>>>>> and also considering the fact that this is a feature that could be
>>>>>>> enabled in production.
>>>>>>> Simply put, if we do one event per mediator, this does not scale
>>>>>>> that well. For example, if the ESB is doing 1k TPS, for a sequence
>>>>>>> that has 20 mediators, that is 20k TPS of analytics traffic. Combine
>>>>>>> that with a possible ESB cluster hitting a DAS cluster with a single
>>>>>>> backend database, and this may be too many rows per second written
>>>>>>> to the database. The main problem here is that one event is a single
>>>>>>> row/record in the backend database in DAS, so it may come to a state
>>>>>>> where the frequency of row creations by events coming from ESBs
>>>>>>> cannot be sustained.
>>>>>>>
>>>>>>> If we create a single event from the 20 mediators, then it is just
>>>>>>> 1k TPS for the DAS event receivers and the database too, even though
>>>>>>> the message size is bigger. Publishing lots of small events does not
>>>>>>> necessarily perform the same as publishing bigger events:
>>>>>>> throughput-wise, comparatively bigger events will win (even
>>>>>>> considering that small operations will be batched at the transport
>>>>>>> level etc., still one event = one database row). So I would suggest
>>>>>>> we try out a "single sequence flow = single event" approach, and on
>>>>>>> the Spark processing side, we consider one of these big rows as
>>>>>>> multiple rows in Spark. I was first thinking UDFs could help in
>>>>>>> splitting a single column into multiple rows, but that is not
>>>>>>> possible, and it is also a bit troublesome, considering we would
>>>>>>> have to delete the original data table after we converted it using a
>>>>>>> script and, not forgetting, we would actually have to schedule and
>>>>>>> run a separate script to do this post-processing.
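[Editor's note] The column-to-rows split being discussed can be sketched as a plain transformation, assuming the "data" column layout from the performance test above. This is a hypothetical illustration only, not the actual Spark relation provider, which would perform the same expansion while scanning rows from the analytics data layer.

```python
# Sketch (hypothetical): expand each stored row, whose "data" column holds
# a JSON array of per-mediator entries, into one logical row per element.
import json

def split_rows(stored_rows):
    """stored_rows: iterable of dicts like {'id': ..., 'data': '<json array>'}."""
    for row in stored_rows:
        for entry in json.loads(row['data']):
            # Downstream processing only ever sees these expanded rows,
            # not the single physical row they came from.
            expanded = {'id': row['id']}
            expanded.update(entry)
            yield expanded
```

With the 3-element arrays used in the test data, each physical row yields three logical rows, which matches the ~3x row count noted in the query-timing footnote.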
>>>>>>> So a much cleaner way to do this would be to create a new "relation
>>>>>>> provider" in Spark (which is like a data adapter for its
>>>>>>> DataFrames), and in our relation provider, when we are reading rows,
>>>>>>> we convert a single row's column into multiple rows and return those
>>>>>>> for processing. So Spark will not know that, physically, it was a
>>>>>>> single row at the data layer, and it can summarize the data as usual
>>>>>>> and write to the target summary tables. [1] is our existing
>>>>>>> implementation of a Spark relation provider, which directly maps to
>>>>>>> our DAS analytics tables; we can create the new one extending /
>>>>>>> based on it. So I suggest we try out this approach and see if
>>>>>>> everyone is okay with it.
>>>>>>>
>>>>>>> [1] https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Anjana.
>>>>>>> --
>>>>>>> *Anjana Fernando*
>>>>>>> Senior Technical Lead
>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>> lean . enterprise . middleware
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "WSO2 Engineering Group" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> For more options, visit https://groups.google.com/a/wso2.com/d/optout.
>>>>>>
>>>>>> --
>>>>>> *Supun Sethunga*
>>>>>> Software Engineer
>>>>>> WSO2, Inc.
>>>>>> http://wso2.com/
>>>>>> lean | enterprise | middleware
>>>>>> Mobile : +94 716546324
>>>>>
>>>>> --
>>>>> Kasun Indrasiri
>>>>> Software Architect
>>>>> WSO2, Inc.; http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> cell: +94 77 556 5206
>>>>> Blog : http://kasunpanorama.blogspot.com/
>>>>
>>>> --
>>>> *Isuru Udana*
>>>> Associate Technical Lead
>>>> WSO2 Inc.; http://wso2.com
>>>> email: [email protected] cell: +94 77 3791887
>>>> blog: http://mytecheye.blogspot.com/

--
*Sinthuja Rajendran*
Associate Technical Lead
WSO2, Inc.: http://wso2.com

Blog: http://sinthu-rajan.blogspot.com/
Mobile: +94774273955
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
