Hi Sanjiva,

That might work, but we need to try it out with a workload. (CEP joins are a bit slower than other operations, so we have to check.)

DAS, when writing, treats data as name-value pairs; it only tries to understand the data when processing it. So the storage model should be OK. My belief is that network load is not the bottleneck (again, we have to verify).

--Srinath

On Wed, Mar 30, 2016 at 8:19 AM, Sanjiva Weerawarana <[email protected]> wrote:

> Srinath, what if we come up with a way on the event receiver side to
> aggregate a set of events into one based on some correlation field? We can
> do this in an embedded Siddhi in the receiver ... basically keep something
> like a 5 sec window to aggregate all events that carry the same correlation
> field into one, combine them, and then send the result forward for storage
> + processing. Sometimes we will miss, but most of the time we won't. The
> storage model needs to be sufficiently flexible, but HBase should be fine
> (?). The real-time feed must not have this feature, of course.
>
> With multiple servers firing events related to one interaction, it's not
> possible to do this from the source ends without distributed caching, and
> that's not a good model.
>
> It does not address the network load issue, of course.
>
> Sanjiva.
>
> On Tue, Mar 29, 2016 at 2:49 PM, Srinath Perera <[email protected]> wrote:
>
>> Nuwan, regarding Q1, we can set it up such that the publisher
>> auto-publishes the events after a timeout or after N events are
>> accumulated.
>>
>> Nuwan, Chathura (regarding Q2),
>>
>> We already do event batching; the numbers above are after event batching.
>> There are two bottlenecks: one is sending events over the network, and the
>> other is writing them to the DB. Batching helps a lot in moving events
>> over the network, but does not help much when writing to the DB.
>>
>> Regarding nulls, one option is to group the events generated by a single
>> message together, which will avoid most nulls. I think our main concern is
>> a single message triggering multiple events. We also need to write queries
>> to copy the values from the single big events to different streams, and
>> use those streams to write queries.
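Sanjiva's receiver-side idea could be prototyped outside the receiver first. The sketch below is only an illustration of the windowed merge in plain Java, not the actual Siddhi or databridge receiver API; the names (`CorrelationAggregator`, `onEvent`, `evictExpired`) are hypothetical, and time is passed in explicitly so the window logic is easy to reason about:

```java
import java.util.*;

// Hypothetical sketch of receiver-side aggregation: hold events that share a
// correlation field for a short window, merge their name-value pairs, and
// forward one combined event for storage + processing.
class CorrelationAggregator {
    private final long windowMillis;
    // correlationId -> merged name-value payload of all events seen so far
    private final Map<String, Map<String, Object>> pending = new LinkedHashMap<>();
    // correlationId -> arrival time of the first event (start of its window)
    private final Map<String, Long> firstSeen = new HashMap<>();
    private final List<Map<String, Object>> out = new ArrayList<>();

    CorrelationAggregator(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Add one incoming event; merge it into any pending event with the same id.
    void onEvent(String correlationId, Map<String, ?> payload, long nowMillis) {
        evictExpired(nowMillis);
        pending.computeIfAbsent(correlationId, id -> {
            firstSeen.put(id, nowMillis);
            return new LinkedHashMap<>();
        }).putAll(payload);
    }

    // Emit combined events whose window has elapsed.
    void evictExpired(long nowMillis) {
        Iterator<Map.Entry<String, Map<String, Object>>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Map<String, Object>> e = it.next();
            if (nowMillis - firstSeen.get(e.getKey()) >= windowMillis) {
                out.add(e.getValue());
                firstSeen.remove(e.getKey());
                it.remove();
            }
        }
    }

    List<Map<String, Object>> combinedEvents() {
        return out;
    }
}
```

In a real receiver, eviction would run on a timer, and a missed merge (an event arriving after its group was flushed) would simply show up as a second, smaller event, matching the "sometimes we will miss" caveat above.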
>> e.g., we can copy values from the big stream to an HTTPStream, and use
>> that to write the HTTP analytics queries.
>>
>> --Srinath
>>
>> On Tue, Mar 29, 2016 at 1:29 PM, Chathura Ekanayake <[email protected]> wrote:
>>
>>> As we can reduce the number of event transfers with event batching, I
>>> think the advantage of using a single event stream is to reduce the
>>> number of disk writes on the DAS side. But as Nuwan mentioned, dealing
>>> with null fields can be a problem when writing analytics scripts.
>>>
>>> Regards,
>>> Chathura
>>>
>>> On Tue, Mar 29, 2016 at 10:40 AM, Nuwan Dias <[email protected]> wrote:
>>>
>>>> Having to publish a single event after collecting all possible data
>>>> records from the server would be good in terms of the scalability of
>>>> the DAS/Analytics platform. However, I see that it introduces new
>>>> challenges for which we would need solutions.
>>>>
>>>> 1. How do we guarantee an event is always published to DAS? In the case
>>>> of API Manager, a request has multiple exit points, such as auth
>>>> failures, throttling out, back-end failures, message processing
>>>> failures, etc. So we need a way to guarantee that an event is always
>>>> sent out, whatever the state.
>>>>
>>>> 2. With this model, I'm assuming we only have one stream definition. Is
>>>> this correct? If so, would this not make the analytics part complicated?
>>>> For example, say I have a Spark query to summarize the throttled-out
>>>> events from an app. Since I can only see a single stream, the query
>>>> would have to deal with null fields and with the whole bulk of data,
>>>> even if in reality it only needs a few fields. The same complexity would
>>>> arise for the CEP-based throttling engine and the new alerts we're
>>>> building as well.
>>>>
>>>> Thanks,
>>>> NuwanD.
>>>>
>>>> On Sat, Mar 26, 2016 at 1:22 AM, Inosh Goonewardena <[email protected]> wrote:
>>>>
>>>>> +1.
>>>>> With the combined-event approach, we can also avoid sending duplicate
>>>>> information to some extent. For example, in the API analytics scenario,
>>>>> both the request and response streams have consumerKey, context,
>>>>> api_version, api, resourcePath, etc., whose values are the same for a
>>>>> request event and its corresponding response event. With a single event
>>>>> we can avoid that duplication.
>>>>>
>>>>> On Fri, Mar 25, 2016 at 1:23 AM, Gihan Anuruddha <[email protected]> wrote:
>>>>>
>>>>>> Hi Janaka,
>>>>>>
>>>>>> We do have event batching at the moment as well. You can configure
>>>>>> that in data-agent-config.xml [1]. AFAIU, what we are trying to do
>>>>>> here is to combine several events into a single event. Apart from
>>>>>> that, wouldn't it be a good idea to compress the event after we merge
>>>>>> it and before we send it to DAS?
>>>>>>
>>>>>> [1] -
>>>>>> https://github.com/wso2/carbon-analytics-common/blob/master/features/data-bridge/org.wso2.carbon.databridge.agent.server.feature/src/main/resources/conf/data-agent-config.xml
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 11:39 AM, Janaka Ranabahu <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Srinath,
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 11:26 AM, Srinath Perera <[email protected]> wrote:
>>>>>>>
>>>>>>>> As per the meeting (participants: Sanjiva, Shankar, Sumedha,
>>>>>>>> Anjana, Miyuru, Seshika, Suho, Nirmal, Nuwan):
>>>>>>>>
>>>>>>>> Currently we generate several events per message from our products.
>>>>>>>> For example, when a message hits APIM, the following events will be
>>>>>>>> generated:
>>>>>>>>
>>>>>>>> 1. One from the HTTP level
>>>>>>>> 2. 1-2 from the authentication and authorization logic
>>>>>>>> 3. 1 from throttling
>>>>>>>> 4. 1 for ESB-level stats
>>>>>>>> 5. 2 for request and response
>>>>>>>>
>>>>>>>> If APIM is handling 10K TPS, that means DAS is receiving events at
>>>>>>>> about 80K TPS.
>>>>>>>> Although the data bridge that transfers events is fast, writing to
>>>>>>>> disk (via RDBMS or HBase) is a problem. We can scale HBase; however,
>>>>>>>> that leads to a scenario where an APIM deployment needs a very large
>>>>>>>> DAS deployment.
>>>>>>>>
>>>>>>>> We decided to figure out a way to collect all the events and send a
>>>>>>>> single event to DAS. Basically, the idea is to extend the data
>>>>>>>> publisher library such that the user can keep adding readings to
>>>>>>>> it, and it will collect the readings and send them over as a single
>>>>>>>> event to the server.
>>>>>>>>
>>>>>>>> However, some flows might terminate in the middle due to failures.
>>>>>>>> There are two solutions:
>>>>>>>>
>>>>>>>> 1. Get the product to call a flush from a finally block
>>>>>>>> 2. Get the library to auto-flush collected readings every few
>>>>>>>> seconds
>>>>>>>>
>>>>>>>> I feel #2 is simpler.
>>>>>>>>
>>>>>>>> Do we have any concerns about going to this model?
>>>>>>>>
>>>>>>>> Suho, Anjana, we need to think about how to do this with our stream
>>>>>>>> definitions, as we force you to define the streams beforehand.
>>>>>>>
>>>>>>> Can't we write something similar to JDBC batch processing, where the
>>>>>>> code would only do a publisher.addBatch() or something similar? The
>>>>>>> data publisher can be configured to flush the batched requests to
>>>>>>> DAS when they hit a certain threshold.
>>>>>>>
>>>>>>> E.g., we define the batch size as 10 (using code or a config XML).
>>>>>>> Then, if we have 5 streams, the publisher would send 5 requests to
>>>>>>> DAS (one per stream) instead of 50.
>>>>>>>
>>>>>>> IMO, this would allow us to keep the existing stream definitions and
>>>>>>> reduce the number of calls from a server to DAS.
>>>>>>>
>>>>>>> WDYT?
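Option #2 (auto-flush) plus the explicit flush of option #1 could both hang off a small collecting wrapper around the publisher. This is only a plain-Java sketch of the idea, not the real databridge agent API; `CollectingPublisher`, `addReading`, and the `Consumer` sink are hypothetical names, and a production version would keep one accumulator per message/correlation rather than a single shared map:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical sketch: the publisher accumulates readings for a flow and sends
// them as ONE event, either when flush() is called explicitly (e.g. from a
// finally block) or via a periodic auto-flush, so readings from flows that
// terminate mid-way are still published rather than lost.
class CollectingPublisher {
    private final Map<String, Object> readings = new ConcurrentHashMap<>();
    private final Consumer<Map<String, Object>> sink;   // stands in for the databridge send
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    CollectingPublisher(Consumer<Map<String, Object>> sink, long flushEveryMillis) {
        this.sink = sink;
        // Option #2: auto-flush whatever has accumulated every few seconds.
        timer.scheduleAtFixedRate(this::flush, flushEveryMillis, flushEveryMillis,
                TimeUnit.MILLISECONDS);
    }

    // Called at each point in the flow that previously published its own event.
    void addReading(String name, Object value) {
        readings.put(name, value);
    }

    // Send everything collected so far as a single combined event.
    synchronized void flush() {
        if (readings.isEmpty()) return;
        Map<String, Object> event = new HashMap<>(readings);
        readings.clear();
        sink.accept(event);
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Janaka's threshold idea would slot in the same place: `addReading` could trigger `flush()` once a configured count is reached, keeping the existing per-stream definitions while cutting the number of calls to DAS.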
>>>>>>> Thanks,
>>>>>>> Janaka
>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com  twitter: @srinath_perera
>>>>>>>> Site: http://home.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Architecture mailing list
>>>>>>>> [email protected]
>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>
>>>>>>> --
>>>>>>> *Janaka Ranabahu*
>>>>>>> Associate Technical Lead, WSO2 Inc.
>>>>>>> http://wso2.com
>>>>>>> E-mail: [email protected]  M: +94 718370861
>>>>>>> Lean . Enterprise . Middleware
>>>>>>
>>>>>> --
>>>>>> W.G. Gihan Anuruddha
>>>>>> Senior Software Engineer | WSO2, Inc.
>>>>>> M: +94772272595
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Inosh Goonewardena
>>>>> Associate Technical Lead - WSO2 Inc.
>>>>> Mobile: +94779966317
>>>>
>>>> --
>>>> Nuwan Dias
>>>> Technical Lead - WSO2, Inc.
>>>> http://wso2.com
>>>> email: [email protected]
>>>> Phone: +94 777 775 729
>>
>> --
>> ============================
>> Blog: http://srinathsview.blogspot.com  twitter: @srinath_perera
>> Site: http://home.apache.org/~hemapani/
>> Photos: http://www.flickr.com/photos/hemapani/
>> Phone: 0772360902
>
> --
> Sanjiva Weerawarana, Ph.D.
> Founder, CEO & Chief Architect; WSO2, Inc.; http://wso2.com/
> email: [email protected]; office: (+1 650 745 4499 | +94 11 214 5345) x5700;
> cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
> Lean . Enterprise . Middleware

--
============================
Blog: http://srinathsview.blogspot.com  twitter: @srinath_perera
Site: http://home.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
