Hi All,

There was a thread recently about test data for Drizzle, and while there are
lots of sample data options, I was thinking about what data could actually
provide the Drizzle community with valuable information.
I'd like to propose we create a simple model to record
client/server/instance data and volumes of Drizzle and MySQL-compatible
environments.

The reason for considering this is twofold.

   - First, it's extremely easy information to generate and automate.
   Machine-generated content is far easier to scale than user-generated
   content.
   - Second, it can provide some interesting output for Drizzle stats, e.g.
   what versions are used, what volume of data, some status variable usage,
   etc.

Let me start by saying I'm not advocating that you store your MySQL/Drizzle
status variables in tables for general monitoring in a production
environment.

*Logical Data Model*

A high level quick analysis

Client   (Id,EmailMD5,Token)  - We take an anonymous approach: only a hash of
the email is stored, so people will never actually know who the clients are
Instance   (InstanceId, ClientId, product, version, OS, serverAttributes,
geoAttributes)
Status   (InstanceId,Date/Time,name,value)
Variables (InstanceId, Date/Time, name, value)
Attributes (InstanceId, Date/Time, name, value) - A generic bucket for other
important figures, including installed storage engines/plugins, number of
schemas/tables/procs/functions/triggers, etc.

Volume (InstanceId, Date/Time, schemas,tables,total_volume,largest_table
etc)  - Some general and optional metrics of db size
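
As a rough illustration of the model above, here is a minimal sketch using
Python's built-in sqlite3 module as a stand-in for a Drizzle/MySQL server.
The column names and types are my assumptions, not a settled design, and the
email_md5 helper is hypothetical. Variables and Attributes would have the
same shape as Status, so they are omitted here, as is Volume.

```python
import hashlib
import sqlite3

# Hypothetical DDL for part of the logical model; names and types are assumptions.
SCHEMA = """
CREATE TABLE client (
    client_id  INTEGER PRIMARY KEY,
    email_md5  TEXT NOT NULL UNIQUE,   -- hash only, never the address itself
    token      TEXT NOT NULL
);
CREATE TABLE instance (
    instance_id INTEGER PRIMARY KEY,
    client_id   INTEGER NOT NULL REFERENCES client(client_id),
    product     TEXT NOT NULL,         -- e.g. 'Drizzle' or 'MySQL'
    version     TEXT NOT NULL,
    os          TEXT
);
CREATE TABLE status (
    instance_id INTEGER NOT NULL REFERENCES instance(instance_id),
    recorded_at TEXT NOT NULL,         -- the Date/Time column
    name        TEXT NOT NULL,
    value       TEXT,
    PRIMARY KEY (instance_id, recorded_at, name)
);
"""


def email_md5(email):
    """Return the MD5 hex digest of a normalised (trimmed, lowercased) email."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Register an anonymous client: only the hash and a token are stored.
conn.execute(
    "INSERT INTO client (email_md5, token) VALUES (?, ?)",
    (email_md5("user@example.com"), "example-token"),
)
```

Normalising the address before hashing means the same person always maps to
the same Client row, whatever capitalisation they type.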

There is obviously much more that can be considered, such as Server for
multiple Instance environments, historical instance changes such as
version upgrades/downgrades, etc. (initially it would be more a dumb match).
The first goal is not to be perfect but to be part of continual improvement.

*Data Acquisition*

From Drizzle and MySQL 5.1 we can obtain the data via SQL statements.
Pre MySQL 5.1, we can obtain it via mysqladmin and load scripts.
I'd like to see how we can use Gearman in some interesting way as a
collection agent.
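
For the pre-5.1 mysqladmin route, a load script could be as small as the
sketch below. It assumes the usual pipe-delimited table that
`mysqladmin extended-status` prints; the function name and the sample values
are made up for illustration.

```python
def parse_extended_status(text):
    """Parse the table printed by `mysqladmin extended-status` into a dict."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and anything malformed
        rows[cells[0]] = cells[1]
    return rows


# Made-up sample in the shape mysqladmin prints.
SAMPLE = """\
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Aborted_clients | 0     |
| Uptime          | 86400 |
+-----------------+-------+
"""

status = parse_extended_status(SAMPLE)
for name, value in sorted(status.items()):
    print(name, value)
```

Each name/value pair, together with an instance id and a timestamp, would
then become one row in Status.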

*Example SQL*

   - Product/Version Counts (for graphing)
   - Distribution of server uptimes
   - Building summary reporting tables
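
To give a feel for these, here is a sketch of all three run against cut-down
Instance and Status tables in an in-memory SQLite database (via Python's
sqlite3). The table layout, the sample rows and the product_summary table
name are assumptions for illustration only. For simplicity the uptime query
counts every sample, whereas a real report would take only the latest sample
per instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instance (instance_id INTEGER PRIMARY KEY, product TEXT, version TEXT);
CREATE TABLE status (instance_id INTEGER, recorded_at TEXT, name TEXT, value TEXT);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO instance VALUES (?, ?, ?)", [
    (1, "MySQL", "5.1"), (2, "MySQL", "5.1"), (3, "Drizzle", "dev"),
])
conn.executemany("INSERT INTO status VALUES (?, ?, 'Uptime', ?)", [
    (1, "2009-01-01 00:00:00", "3600"),
    (2, "2009-01-01 00:00:00", "7200"),
    (3, "2009-01-01 00:00:00", "90000"),
])

# Product/version counts (for graphing).
version_counts = conn.execute("""
    SELECT product, version, COUNT(*) AS instances
    FROM instance
    GROUP BY product, version
    ORDER BY product, version
""").fetchall()

# Distribution of server uptimes, bucketed into whole days.
uptime_days = conn.execute("""
    SELECT CAST(value AS INTEGER) / 86400 AS days, COUNT(*) AS samples
    FROM status
    WHERE name = 'Uptime'
    GROUP BY days
    ORDER BY days
""").fetchall()

# A summary reporting table built from the raw data.
conn.execute("""
    CREATE TABLE product_summary AS
    SELECT product, COUNT(*) AS instances FROM instance GROUP BY product
""")

print(version_counts)
print(uptime_days)
```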

*Your Input*

While I expect designing version 1 of the tables to take only a few hours,
I'd like to know if people would consider this an interesting example
to pursue.
There is also opportunity for others to contribute to data acquisition
SQL/scripts, example output, even UI.
Given the very simple model, we can also consider what sharding of data you
might use for a more cloud-based solution.
Several years ago I actually started on a related product, called
DBCollation.org.  My goal was to build statistics about MySQL instances
world wide, so we could produce some interesting statistics/graphs etc of
usage of MySQL.

I think it would be great on Drizzle.org to see some actual stats of Drizzle
systems.  Granted, initially the numbers/volumes may be small and it perhaps
needs to be more private/internal, but it enables participation.


Regards

Ronald
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
