Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 06:45:42 PM: > On Fri, 8 Apr 2005 [EMAIL PROTECTED] wrote: > > >Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM: > > > >> On Thu, 7 Apr 2005, Sean Nolan wrote: > >> > >> >I think you'll find you can do what you want with a cross join. A cross > >join > >> >will join every row from the first table with every row from the second > > > >> >table. It will not randomly do this, so you'd have to be creative in
> >> >"randomly" selecting rows in the WHERE clause. Here's how you could do > >this > >> >with your data, pardon the highly original and very scientific > >> >randomization, but it is just an example :-) > >> > >> Perhaps I don't understand your "randomization" (because I really don't > >> understand it ;), but I don't think a CROSS join does the trick, because > >I > >> want to randomly pick the same rows from either side of the join that I > >> would have gotten with an INNER JOIN using the 1:1 primary key between > >the > >> two tables. This is analagous to sampling without replacement. If I do a > >> cross join and then just restrict the number of rows returned (hey! I > >just > >> worked out what you are doing below) I don't necessarily get the same > >> 'marginals' (or to randomly pick the same rows from either side of the > >> join) . This is analagous to sampling with replacement. > >> > >> Do you see what I mean? > >> > >> > >> > >> > >> > > >> >SELECT > >> > Dept, > >> > Gender, > >> > AVG(Spending) > >> >FROM > >> > table_one a > >> >CROSS JOIN > >> > table_two b > >> >WHERE (a.Person * b.Person) % 4 = 3 > >> >GROUP BY > >> > Dept, Gender; > >> > > >> >Sean > >> > > >> >----- Original Message ------ > >> > > >> >Hi, > >> > > >> >I have a request for a UDF that I would find really useful. I am > >> >duplicating this question on the community list as well as the MySQL > >list, > >> >because I am not sure where best to make this kind of request (see the > >> >previous post http://lists.mysql.com/community/97). > >> > > >> >I think the following syntax would be really cool... > >> > > >> >SELECT > >> > a.*, > >> > b.* > >> >FROM > >> > table_one a > >> >RANDOM JOIN -- <-- :) > >> > table_two b > >> >USING > >> > (ID) > >> >; > >> > > >> >Lets say table_one and table_two have a primary key called ID in common > >> >(although of course that isn't necessary). > >> > > >> >The idea is that the RANDOM JOIN would first calculate the INNER JOIN, > >> >then scramble the 'links' (or rows) between the two tables. The result > >> >would be the same number of rows overall, the same number of distinct > >> >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly > >> >associated with the marginals given by the correct inner join. > >> > > >> >Hopefully that makes sense. > >> > > >> >I think this function would be really useful for statistical analysis > >of > >> >scientific data within MySQL (using randomized versions of the > >> >associations within the tables). > >> > > >> >Not sure if the above syntax is optimal, because I would like to be > >able > >> >to do things like this... > >> > > >> >table_one > >> >Dept Person Gender > >> >A 1 M > >> >A 2 F > >> >A 3 M > >> >B 4 F > >> >B 5 M > >> >B 6 F > >> > > >> >table_two > >> >Dept Person Spending > >> >A 1 10 > >> >A 2 20 > >> >A 3 30 > >> >B 4 40 > >> >B 5 50 > >> >B 6 60 > >> > > >> >SELECT > >> > Dept, > >> > Gender, > >> > AVG(Spending) > >> >FROM > >> > table_one a > >> >INNER JOIN > >> > table_two b > >> >USING > >> > (Dept,Person) > >> >GROUP BY > >> > Dept, Gender > >> >; > >> > > >> > > >> >With the above query (which I hope is sensible) I would like to keep > >the > >> >departments fixed, and randomize the genders of the people in the > >> >departments (keeping the number of each sex in each department the > >same). > >> > > >> >So we could INNER JOIN using Dept and then RANDOM JOIN (as described > >> >above) using Person - all in one 'JOIN'. All else being the same this > >> >should randomize the Gender, but keep the marginals. > >> > > >> >I guess this is overly complex given the problem, and it actually > >raises > >> >more questions instantly (about statistical analysis), but the function > >is > >> >basic (I hope), and like all SQL it is the simplicity that allows you > >to > >> >build complex and robust statements. > >> > > >> >Does the random join make any sense? Is this UDF material? > >> > > >> >Any feedback is welcome, > >> > > >> >Dan. > >> > > >> >Thread > >> >UDF request? - Dan Bolser, April 1 2005 12:10am > >> > > >> >_________________________________________________________________ > >> >Don?t just search. Find. Check out the new MSN Search! > >> >http://search.msn.click-url.com/go/onm00200636ave/direct/01/ > >> > > >> > > >> > > >> > >I think I get it. It sounds to me like you want to take a random sampling > >(possibly to include the entire set) and analyze each sample. You can > >return the results of a query in random order if you use the RAND() > >function: > > > >http://dev.mysql.com/doc/mysql/en/mathematical-functions.html > > > ><quote from site> > >As of MySQL 3.23, you can retrieve rows in random order like this: > > > >mysql> SELECT * FROM tbl_name ORDER BY RAND(); > > > >ORDER BY RAND() combined with LIMIT is useful for selecting a random > >sample of a set of rows: > > > >mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d > > -> ORDER BY RAND() LIMIT 1000; > ><end quote> > > > >So if you want to randomly pick 1900 rows of data from a query you would > >say (you need to fill in the blanks with your query): > > > >SELECT ... > >FROM ... > >WHERE ... > >ORDER BY RAND() > >LIMIT 1900; > > > >As to randomly matching rows from one table with corresponding rows from a > >second table, that's just an illusion. For any two tables JOINed together, > >the same row combinations will exist regardless of order or sequence. > > > >TABLE A > >1 red > >2 blue > >3 green > > > >Table B > >1 ball > >2 horse > >3 wagon > > > >SELECT CONCAT(a.color,' ', b.item) > >FROM tablea a > >INNER JOIN tableb b > >ORDER BY RAND(); > > > >I just made a cross product (Cartesian product/cross join) of TableA with > >TableB. There will be 9 rows in some random order (as determined by the > >engine). However, if we sort the results (outside of MySQL) we would > >always get the same list in the same order because there would always be 3 > >red items (red ball, red horse, red wagon) and 3 blue items (blue ball, > >blue horse, blue wagon) and 3 green items (green ball, green horse, green > >wagon). > > > >However if you do not use the entire result set (you apply a LIMIT clause) > >so that you leave out some of the combinations, I think you achieve the > >results you are looking for. Extending this to our sample data, if I LIMIT > >the results to just 6 rows I get many different possible result sets (how > >many unique lists exist of 9 things when taken 6 at a time? - I would do > >the math but I can't remember which formula to use....). One set may have > >no red things while another could have 2 reds, 2 greens, and 2 blues and > >another 3 reds, 2 greens, and a blue. > > n choose k (for combinations), fact(9)/(fact(6)*fact(3))... > > > >Is this the randomizing effect you were going for? > > I am still trying to work out what I do want.... > > OK, I think I worked it out, I want to *scramble* a list of pairs. > > Table1 > A B > 1 10 > 2 20 > 3 30 > > select scramble(A,B) from Table1; > A_scram B_scram > 1 30 > 2 10 > 3 20 > > (as an example, naturally the results could differ each time the query is > run) > > The problem with just limiting the cartesian product is that it dosn't > guarante that every value of A and every value of B is represented the > correct number of times. > > Here is another example... > > A B > 1 10 > 1 20 > 1 30 > 2 20 > 3 30 > > select scramble(A,B) from Table1; > A_scram B_scram > 1 30 > 1 20 > 1 30 > 2 10 > 3 20 > > (for example). > > How can I do this with SQL? Could a SCRAMBLE(A,B) UDF be a reasonable > request? > > Cheers, > Dan. > > > > > >Shawn Green > >Database Administrator > >Unimin Corporation - Spruce Pine > > > I used a method similar to this to help someone line up payment transactions so that they could detect the unmatched payments and the unmatched billings (he was straightening out some old accounting records) First you create a table (or a temporary table, your choice) that looks something like this CREATE TEMPORARY TABLE tmpScramble ( Acolumn int not null , Bcolumn int not null , UNIQUE KEY (Acolumn) , UNIQUE KEY (Bcolumn) ) Then you populate the table with a randomly sequenced Cartesian join of your source data. INSERT IGNORE tmpScramble(Acolumn, Bcolumn) SELECT t1.A, t2.B FROM table1 t1 INNER JOIN table2 t2 ORDER BY RAND(); This will produce a random combination of the rows from table1 and the rows from table2. However because of the UNIQUE keys on tmpScramble the unique values from each table can only appear once in tmpScramble. The IGNORE keyword on the INSERT statement means that duplicate key value errors are ignored which allows the table to fully populate. Basically, you get the first row that contains unique values of both A and B as they appears in the random sort of their combination. All other rows that contain a value of A or B that you have already seen are silently rejected. Now, if you have data that looks like table1 --- 1 1 1 2 2 3 4 This techinique won't work because the 2nd and later instances of each value will not appear in tblScramble. In this case, you don't select the data values from table1 but your primary key values and add them to tmpScramble. That way you are assured that each row from each table is represented at least once. To get back to your data values, JOIN tmpScramble back to table1 or table2 as appropriate to resolve the primary key values back into data values. If doesn't make sense or if it does and you would still like some direct help, you can post your actual table structures and I or someone else can help to write the query to produce some randomly scrambled pairs. Shawn Green Database Administrator Unimin Corporation - Spruce Pine