Who can I prod about setting up a UDF repo at MySQL? I think 'they' should
do this ;)

http://lists.mysql.com/community/97

Anyway, I am posting this request to 'community' because I still don't know
the appropriate place to post UDF-related stuff.

This is another (potentially crazy) idea for a UDF that I would find very
useful in my research...

AGGLOM - Simple agglomerative clustering for MySQL ...

The UDF would work on any NUMBER column, and return the
number of 'clusters' using agglomerative clustering
with a certain threshold as an input.

Agglomerative clustering merges any two numbers that
are within the 'threshold', and replaces those numbers
with the average of the two. The clustering proceeds
smallest 'gap' first, and stops when no two numbers are
within the threshold.
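
A rough Python sketch of the merging rule just described, not a UDF
implementation. Assumptions that are mine and not spelled out above: "within
the threshold" means a gap less than or equal to the threshold, ties go to
the leftmost pair, and a merged value is the plain average of the two values
it replaces. The name agglom is only illustrative.

```python
def agglom(values, threshold):
    """Return the cluster values left after merging, smallest gap first."""
    clusters = sorted(float(v) for v in values)
    while len(clusters) > 1:
        # In sorted 1-D data the closest pair is always an adjacent pair.
        gaps = [b - a for a, b in zip(clusters, clusters[1:])]
        # min() returns the first minimum, so ties go to the leftmost pair.
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > threshold:  # assumption: "within" is inclusive
            break
        # Replace the pair with its plain average; sorted order is preserved.
        clusters[i:i + 2] = [(clusters[i] + clusters[i + 1]) / 2]
    return clusters


print(agglom([10, 11, 12, 56, 57, 58, 99, 101], 1))
# [10.5, 12.0, 56.5, 58.0, 99.0, 101.0]
```

The number of clusters is then len(agglom(...)), and the list itself gives
the cluster values.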

The result would be the number (or perhaps the values) of the
remaining clusters.

Syntax (suggested)

AGGLOM(expr, THRESH), where expr returns a number

For example

Table1

C1 C2
A 1
A 2
A 3
A 4
A 5
A 6
A 7
B 10
B 11
B 12
B 56
B 57
B 58
B 99
B 101


SELECT C1, AGGLOM(C2,1) AS C3 FROM Table1 GROUP BY C1;

C1 C3
A 4
B 6
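
To make the GROUP BY behaviour concrete, here is a small Python emulation of
the threshold-1 query over Table1. It is only a sketch: agglom_count is a
hypothetical stand-in for the proposed UDF, and it assumes an inclusive
threshold, leftmost tie-breaking and plain averaging of each merged pair.
For larger thresholds the count depends on those choices, so it need not
agree with figures worked out by hand.

```python
from collections import defaultdict


def agglom_count(values, threshold):
    """Count the clusters left after smallest-gap-first merging."""
    clusters = sorted(float(v) for v in values)
    while len(clusters) > 1:
        gaps = [b - a for a, b in zip(clusters, clusters[1:])]
        i = min(range(len(gaps)), key=gaps.__getitem__)  # leftmost smallest gap
        if gaps[i] > threshold:  # assumption: inclusive threshold
            break
        clusters[i:i + 2] = [(clusters[i] + clusters[i + 1]) / 2]
    return len(clusters)


# Table1 from the example: (C1, C2) rows.
rows = [("A", n) for n in range(1, 8)]
rows += [("B", n) for n in (10, 11, 12, 56, 57, 58, 99, 101)]

# Emulate GROUP BY C1 by collecting the C2 values per key.
groups = defaultdict(list)
for c1, c2 in rows:
    groups[c1].append(c2)

result = {c1: agglom_count(vals, 1) for c1, vals in sorted(groups.items())}
for c1, c3 in result.items():
    print(c1, c3)
# A 4
# B 6
```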


SELECT C1, AGGLOM(C2,2) AS C3 FROM Table1 GROUP BY C1;

C1 C3
A 3
B 3


SELECT C1, AGGLOM(C2,3) AS C3 FROM Table1 GROUP BY C1;

C1 C3
A 2
B 3


SELECT C1, AGGLOM(C2,4) AS C3 FROM Table1 GROUP BY C1;

C1 C3
A 1
B 3


SELECT C1, AGGLOM(C2,50) AS C3 FROM Table1 GROUP BY C1;

C1 C3
A 1
B 1



Remember, merge the numbers with the smallest difference
first, and replace each pair with the average of the
two. Recalculate the differences for the new number,
and repeat until no two numbers are within the threshold.

This is a useful clustering 'hack' to see if a distribution
is bimodal or multimodal, for example. It is very quick to
calculate using a hash table, and could be a great
function to add.
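
As an illustration of the bimodal check, the sketch below also tracks how
many original values ended up in each cluster, so a real mode can be told
apart from a stray outlier. The sample data is made up, the function name is
hypothetical, and the same assumptions apply as before (inclusive threshold,
leftmost tie-breaking, plain average of each merged pair). It uses a sorted
list, not a hash table.

```python
def agglom_with_sizes(values, threshold):
    """Return (cluster value, member count) pairs after merging."""
    clusters = [[float(v), 1] for v in sorted(values)]
    while len(clusters) > 1:
        gaps = [b[0] - a[0] for a, b in zip(clusters, clusters[1:])]
        i = min(range(len(gaps)), key=gaps.__getitem__)  # leftmost smallest gap
        if gaps[i] > threshold:  # assumption: inclusive threshold
            break
        (lo, n_lo), (hi, n_hi) = clusters[i], clusters[i + 1]
        # Plain (unweighted) average of the pair; counts are simply summed.
        clusters[i:i + 2] = [[(lo + hi) / 2, n_lo + n_hi]]
    return [(value, count) for value, count in clusters]


sample = [1, 2, 3, 50, 51, 52]  # made-up data with two obvious groups
modes = agglom_with_sizes(sample, 5)
print(modes)       # [(2.25, 3), (51.25, 3)]
print(len(modes))  # 2, which suggests a bimodal distribution
```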

Is this idea as crazy as I think it might be?
