I did something like this, running principal component analysis and a set of ~25 million vector distances on a 7000x2000 table. For performance - and access to CLAPACK - I wrote a C program to do this.
I just had to run the job weekly, so I downloaded the necessary tables (heroku db:pull), ran my analysis, and then pushed them back to Heroku. I considered doing this on AWS - but there was no scenario where this job had to be run more than once a day, I had one always-on fast (free) computer that could host this task, and the hurdle of figuring out how to put it all in the cloud wasn't worth the hassle at the time. The biggest win for me was optimizing the C dot-product routine - when I figured out how to get the computer to use hardware simd properly, I got something like a 10x performance improvement. -- You received this message because you are subscribed to the Google Groups "Heroku" group. To view this discussion on the web visit https://groups.google.com/d/msg/heroku/-/JCS8JklyfJoJ. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/heroku?hl=en.
