I did something like this, running principal component analysis and a set 
of ~25 million vector distances on a 7000x2000 table.  For performance  - 
and access to CLAPACK - I wrote a C program to do this.

I just had to run the job weekly, so I downloaded the necessary tables 
(heroku db:pull), ran my analysis, and then pushed them back to Heroku.

I considered doing this on AWS - but there was no scenario where this job 
had to be run more than once a day, I had one always-on fast (free) 
computer that could host this task, and the hurdle of figuring out how to 
put it all in the cloud wasn't worth the hassle at the time.

The biggest win for me was optimizing the C dot-product routine - when I 
figured out how to get the computer to use hardware simd properly, I got 
something like a 10x performance improvement.

-- 
You received this message because you are subscribed to the Google Groups 
"Heroku" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/heroku/-/JCS8JklyfJoJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/heroku?hl=en.

Reply via email to