To: Casey Hawthorne <cas...@istar.ca> Subject: Re: [Haskell-cafe] looking for a good algorithm From: Casey Hawthorne <cas...@istar.ca> Date: Thu, 12 Nov 2009 11:14:02 -0800
On third thought, convert the table to a 2D array of bits (or a 1D array of bits mapped to a 2D coordinate system). The bit is true for either an Int or Double. If space is a concern and one can tolerate a low rate of false positives, than a 2D Bloom Filter might be appropriate, or a 1D Bloom Filter with mapping to 2D coordinates. http://en.wikipedia.org/wiki/Bloom_filter Then on to simmulated annealing, possibly. Note: Donald Knuth's new fascicle is out: Volume 4 Fascicle 1, Bitwise Tricks & Techniques; Binary Decision Diagrams (2009), xiii+261pp. ISBN 0-321-58050-8 On Wed, 11 Nov 2009 15:32:35 -0800, you wrote: >So, as I understand it, you have a very large sparse table, thousands >of rows and hundreds of columns, of which each cell within a column of >type String, Int, or Double can contain one of those types or nothing. > >Then you to want to shuffle the rows to maximize the number of columns >whose first 100 rows have at least one number (Int or Double), given a >list of preferred column names since there is no guarantee that every >number column will have at least one number in its first 100 rows >after shuffling. > > >I'm wondering about hashing on the rows and hashing on the columns, >then the column hash has the number of Int's or Double's (don't need >the String's) in that column and the rows they are in. > >The row hash would have the number of Int's and Double's in that row >and what column's they are in. > >Then; > >Then scan the row hash and sort into descending order, and by tagging >those rows, not by actually moving them. > >Then I think your ready for simmulated annealing. -- Regards, Casey _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe