Hi Malintha,

Can't we have a single stream_definition to which every API call event is
published? The definition could include a field that indicates the status
of the API call, either "SUCCESS" or "THROTTLED" (or maybe a boolean).

Then our summarising scripts would run against a single column family and
count the SUCCESS and THROTTLED events per API per hour, with no join needed.
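Here is a minimal Python sketch of the summarisation this would allow. It assumes each event carries hypothetical `api`, `timestamp` (epoch seconds, UTC) and `status` fields. Each (hour, API) pair is counted in one pass over one stream, so no join is involved. In Hive the equivalent would presumably be a single GROUP BY with conditional sums such as SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END), though that query is not shown or tested here.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Tuple


class ApiCallEvent(NamedTuple):
    """Hypothetical shape of one event in the proposed single stream."""
    api: str        # e.g. "api1:v1.0.0"
    timestamp: int  # epoch seconds, assumed UTC
    status: str     # "SUCCESS" or "THROTTLED"


def summarise_per_hour(events: Iterable[ApiCallEvent]) -> List[Tuple[str, str, int, int]]:
    """Count SUCCESS and THROTTLED events per (hour, API) in a single pass."""
    counts = defaultdict(lambda: [0, 0])  # (hour, api) -> [successful, throttled]
    for event in events:
        # Truncate the timestamp to the start of its hour, e.g. "1970-01-01 13:00".
        hour = datetime.fromtimestamp(event.timestamp, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        key = (hour, event.api)
        if event.status == "SUCCESS":
            counts[key][0] += 1
        elif event.status == "THROTTLED":
            counts[key][1] += 1
    # One row per (hour, api) with the same columns as Table 1.
    return [(hour, api, ok, thr) for (hour, api), (ok, thr) in sorted(counts.items())]


if __name__ == "__main__":
    sample = [
        ApiCallEvent("api1:v1.0.0", 46800, "SUCCESS"),    # 13:00 UTC
        ApiCallEvent("api1:v1.0.0", 46860, "THROTTLED"),  # 13:01 UTC
        ApiCallEvent("api1:v1.0.0", 46920, "THROTTLED"),  # 13:02 UTC
        ApiCallEvent("api1:v1.0.0", 50400, "SUCCESS"),    # 14:00 UTC
    ]
    print(summarise_per_hour(sample))
    # -> [('1970-01-01 13:00', 'api1:v1.0.0', 1, 2), ('1970-01-01 14:00', 'api1:v1.0.0', 1, 0)]
```

Because both counts come from the same grouped pass, an hour with only throttled calls (or only successful ones) still produces a row, with the other count left at 0.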

However, if the two column families you mentioned above are required for
other statistical measures, we might have to retain them separately as
well.
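If both column families are kept, the step-3 full join between the two hourly summaries (Table 2 and Table 3) can also be expressed as a union of the two row sets followed by a group-by on (hour, API), with a missing count treated as 0. Below is a minimal Python sketch of that merge. The function name and the tuple row shape are hypothetical. Whether the union/group-by form is actually cheaper than the full join in MySQL or Hive for this data has not been measured.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (hour, api, count), the shape of Table 2 / Table 3 rows


def merge_hourly_summaries(
    success_rows: Iterable[Row],
    throttled_rows: Iterable[Row],
) -> List[Tuple[str, str, int, int]]:
    """Combine Table 2 and Table 3 rows into Table 1 rows.

    Gives the same result as a FULL OUTER JOIN on (hour, api) with a missing
    count treated as 0, but computes it as a union followed by a group-by.
    """
    merged: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for hour, api, n in success_rows:
        merged[(hour, api)][0] += n
    for hour, api, n in throttled_rows:
        merged[(hour, api)][1] += n
    # Sorted lexically by hour label, then API; assumes the labels sort in time order.
    return [(hour, api, ok, thr) for (hour, api), (ok, thr) in sorted(merged.items())]


if __name__ == "__main__":
    table2 = [("1.00 pm - 2.00 pm", "api1:v1.0.0", 10), ("2.00 pm - 3.00 pm", "api1:v1.0.0", 11)]
    table3 = [("1.00 pm - 2.00 pm", "api1:v1.0.0", 24), ("2.00 pm - 3.00 pm", "api1:v1.0.0", 18)]
    print(merge_hourly_summaries(table2, table3))
    # -> [('1.00 pm - 2.00 pm', 'api1:v1.0.0', 10, 24), ('2.00 pm - 3.00 pm', 'api1:v1.0.0', 11, 18)]
```

This relies on each summary table having at most one row per (hour, API); with that assumption, summing per key matches the full-join result.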

Regards


-------------------------------------
*Shabir Mohamed*
*Software Engineer*
WSO2 Inc.; http://wso2.com
Email: [email protected]
Mobile: +94 77 3516019 | +94 71 6583393

On Thu, Jul 30, 2015 at 11:02 AM, Malintha Amarasinghe <[email protected]> wrote:

> Hi all,
>
> For a recent requirement, we need to analyse the number of successful and
> throttled out requests per API over time to visualize them in APIM
> statistics graphs.
>
> Example:
>
> | Time range (per hour) | API         | Number of successful requests | Number of throttled-out requests |
> |-----------------------|-------------|-------------------------------|----------------------------------|
> | 1.00 pm - 2.00 pm     | api1:v1.0.0 | 10                            | 24                               |
> | 2.00 pm - 3.00 pm     | api1:v1.0.0 | 11                            | 18                               |
> | ...                   | ...         | ...                           | ...                              |
>
> [Table 1]
>
> Currently, API-M stores a set of Cassandra column families in BAM that are
> used for analytics. The following two of them are relevant to the above
> requirement:
>
>
>    1. org_wso2_apimgt_statistics_request
>    2. org_wso2_apimgt_statistics_throttle
>
> Whenever an API call that reaches APIM is throttled out, an event is
> published to the *org_wso2_apimgt_statistics_throttle* stream. Likewise,
> for each successful API call, an event is published to the
> *org_wso2_apimgt_statistics_request* stream.
>
>
> The problem is how to derive the above table [Table 1] from both of these
> Cassandra column families.
>
>
> Currently, it is done as follows:
>
>
>    1. Using the data in *org_wso2_apimgt_statistics_request*, we summarise
>       the data per hour and derive the following MySQL table:
>
>
> | Time range (per hour) | API         | Number of successful requests |
> |-----------------------|-------------|-------------------------------|
> | 1.00 pm - 2.00 pm     | api1:v1.0.0 | 10                            |
> | 2.00 pm - 3.00 pm     | api1:v1.0.0 | 11                            |
> | ...                   | ...         | ...                           |
>
> [Table 2]
>
>
>    2. Likewise, we derive the same for the
>       *org_wso2_apimgt_statistics_throttle* column family:
>
>
> | Time range (per hour) | API         | Number of throttled-out requests |
> |-----------------------|-------------|----------------------------------|
> | 1.00 pm - 2.00 pm     | api1:v1.0.0 | 24                               |
> | 2.00 pm - 3.00 pm     | api1:v1.0.0 | 18                               |
> | ...                   | ...         | ...                              |
>
> [Table 3]
>
>
>    3. The two tables (Table 2 and Table 3) are joined at the MySQL level to
>       derive [Table 1] above.
>
>
> The problem with this approach is that two tables that are still large
> (with tens of APIs and months of data) are joined at the MySQL level using
> a full join, which may be inefficient.
>
> Is there an efficient way to do the above three steps in Hive queries and
> produce the final joined table [Table 1]?
>
> Although Hive also supports JOINs, I don't think that would be a good
> solution, as it would join even larger tables at the Hive level.
>
> Any suggestions are much appreciated.
>
> Thank you.
>
> --
> Malintha Amarasinghe
> Software Engineer
> *WSO2, Inc. - lean | enterprise | middleware*
> http://wso2.com/
>
> Mobile : +94 712383306
>
> _______________________________________________
> Dev mailing list
> [email protected]
> http://wso2.org/cgi-bin/mailman/listinfo/dev
>
>