[Dev] [APIM][BAM] Problem with producing a summarized table using a combination of two Cassandra column families

Malintha Amarasinghe Wed, 29 Jul 2015 22:34:09 -0700

Hi all,

For a recent requirement, we need to analyse the number of successful and
throttled out requests per API over time to visualize them in APIM
statistics graphs.


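Ex:

Time Range (per hour) | API         | Number of successful requests | Number of throttled out requests
----------------------+-------------+-------------------------------+---------------------------------
1.00 pm - 2.00 pm     | api1:v1.0.0 | 10                            | 24
2.00 pm - 3.00 pm     | api1:v1.0.0 | 11                            | 18
...

[Table 1]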

Currently, in APIM we have a set of Cassandra column families stored in BAM
that are used for analytics. The following two are relevant to the above
requirement.


   1. org_wso2_apimgt_statistics_request
   2. org_wso2_apimgt_statistics_throttle

When an API call that comes to APIM is throttled out, an event is published
to the *org_wso2_apimgt_statistics_throttle* stream for that call. Likewise,
for each successful API call, an event is published to the
*org_wso2_apimgt_statistics_request* stream.


The problem is how to derive the above table [Table 1] from both of these
Cassandra column families.


Currently it is done in the following way.


   1. Using the data in *org_wso2_apimgt_statistics_request*, we summarize
   the data on an hourly basis and derive the following MySQL table.


Time Range (per hour) | API         | Number of successful requests
----------------------+-------------+------------------------------
1.00 pm - 2.00 pm     | api1:v1.0.0 | 10
2.00 pm - 3.00 pm     | api1:v1.0.0 | 11
...

[Table 2]
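
The hourly summarization that produces Table 2 can be sketched roughly as below. This is only an illustration in Python, not the actual Hive script, and the event field names (`api`, `version`, and `request_time` as epoch milliseconds) are assumptions made for the example rather than the real stream definition.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_counts(events):
    """Count events per (hour bucket, "api:version") pair.

    Each event is assumed to be a dict with hypothetical keys
    "api", "version" and "request_time" (epoch milliseconds, UTC).
    """
    counts = Counter()
    for ev in events:
        ts = datetime.fromtimestamp(ev["request_time"] / 1000, tz=timezone.utc)
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        key = (hour.strftime("%Y-%m-%d %H:00"), f'{ev["api"]}:{ev["version"]}')
        counts[key] += 1
    return counts
```

The same function would apply unchanged to throttle events, since only the timestamp and the API identity are used.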


   2. Likewise, we derive the same for the
   *org_wso2_apimgt_statistics_throttle* column family.


Time Range (per hour) | API         | Number of throttled out requests
----------------------+-------------+---------------------------------
1.00 pm - 2.00 pm     | api1:v1.0.0 | 24
2.00 pm - 3.00 pm     | api1:v1.0.0 | 18
...

[Table 3]


   3. The two tables (Table 2 and Table 3) are joined at the MySQL level to
   derive the above [Table 1].
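
As a rough illustration of what joining Table 2 and Table 3 computes (a Python sketch, not the actual MySQL query), every (time range, API) key that appears in either summary yields one output row. The dictionary layout and the choice to treat a missing count as 0 are assumptions made for the example; a SQL full join would give NULL there instead.

```python
def full_join(success, throttled):
    """Combine two {(time_range, api): count} maps into Table 1 style rows."""
    rows = []
    # Take the union of keys so rows present on only one side are kept.
    for key in sorted(set(success) | set(throttled)):
        time_range, api = key
        rows.append((time_range, api, success.get(key, 0), throttled.get(key, 0)))
    return rows


# Sample values taken from Table 2 and Table 3.
table2 = {
    ("1.00 pm - 2.00 pm", "api1:v1.0.0"): 10,
    ("2.00 pm - 3.00 pm", "api1:v1.0.0"): 11,
}
table3 = {
    ("1.00 pm - 2.00 pm", "api1:v1.0.0"): 24,
    ("2.00 pm - 3.00 pm", "api1:v1.0.0"): 18,
}

for row in full_join(table2, table3):
    print(row)
```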


The problem with this approach is that two still-large tables (with more
than 20 APIs and months of data) are joined at the MySQL level using a full
join, which might be inefficient.

Is there an efficient way to do the above three steps in Hive queries and
produce the final joined table [Table 1]?

Although Hive supports JOINs too, I think that would not be a good
solution, as it would join even larger tables at the Hive level.

Any suggestions are much appreciated.

Thank you.

-- 
Malintha Amarasinghe
Software Engineer
*WSO2, Inc. - lean | enterprise | middleware*
http://wso2.com/

Mobile : +94 712383306

