Piotr, Mark and Jim have provided some sound advice. As to your approach, I want to offer a data point.
Four mid-range Intel Haswell processors connected by GbE can deliver almost 500 GFLOPS of double-precision CPU performance running HPL (the Top500 benchmark), without a GPU:

http://limulus.basement-supercomputing.com/wiki/LimulusBenchmarks

Although you can buy a prebuilt box (disclosure: I sell them), it is not that difficult to build your own system using separate cases and some shelves. Or, as Jim suggests, cookie sheets and double-stick tape.

--
Doug

> Hi all,
> I'm considering a proof-of-concept Beowulf cluster build for machine
> learning purposes. My main requirement is that it should be based on
> embedded development boards (relatively low power consumption and
> small size). In short, I need the best possible double-precision matrix
> multiplication performance with low power consumption and small size.
>
> Taking matrix multiplication into consideration, I thought a GPU was the
> natural choice. The best fit in this category that I was able to find is
> the brand-new Jetson TK1 from NVIDIA:
>
> https://developer.nvidia.com/jetson-tk1
>
> If I missed something, please let me know. I don't have access to the
> code, so I can't benchmark the memory and CPU usage of these algorithms.
> I'm responsible for providing the hardware and system configuration, but
> I'm curious about your professional opinion on this build.
>
> Questions that have already come to mind:
> 1. What diagnostic software is most commonly used to keep a cluster up
> and running? Should I bring in something from outside the standard
> distro (e.g. Debian/Ubuntu) repositories for this kind of build, or are
> the standard tools enough?
>
> 2. The boards are 5"x5" (12.7 cm x 12.7 cm). Where could I find a
> chassis or open-air frame for 16, 32 or 64 nodes if I have to extend
> my build? I would be glad to hear any suggestions.
>
> 3. I'm not an electrical engineer, but I wonder whether there could be a
> problem with powering up 32/64 nodes at once.
> There is no power characterization data for this board yet, but I have
> seen reports that it should draw under 10 W.
>
> 4. The theoretical peak for this platform is 326 SP GFLOPS. I was able to
> confirm that the DP/SP ratio is 1/24, so the theoretical DP peak is about
> 13.6 GFLOPS. Can someone elaborate, or point me to documentation, on how
> hard it will be to exploit this performance using CUDA and MPI?
>
> 5. The operating system resides on eMMC. Are there any reasons to switch
> to an SD card or SSD (there is a SATA port on the board)?
>
> This is my first post to this list, so please excuse any confusion I may
> have introduced.
>
> I'm open to any suggestions, even if it means changing everything in
> this build :)
>
> Best regards,
> Piotr Król
>
> _______________________________________________
> Beowulf mailing list, [email protected] sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
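For a rough sense of scale, the 500 GFLOPS Haswell figure can be set against the Jetson TK1 numbers quoted in question 4. The Python sketch below is only back-of-envelope arithmetic under stated assumptions: it takes the quoted 326 SP GFLOPS and 1/24 DP ratio at face value, treats 10 W as an upper bound per board (the "under 10 W" report, not a measurement), and assumes perfect scaling across boards. The helper names `dp_peak` and `nodes_to_match` are made up for illustration. Note that it compares a Jetson theoretical peak against a measured HPL result, so a real Jetson cluster would need more boards than this estimate.

```python
import math

# Figures quoted in the thread. JETSON_MAX_WATTS is the reported "under 10 W"
# treated as an upper bound, not a measured value.
JETSON_SP_PEAK_GFLOPS = 326.0
DP_TO_SP_RATIO = 1 / 24
JETSON_MAX_WATTS = 10.0
HASWELL_HPL_GFLOPS = 500.0  # "almost 500" GFLOPS measured with HPL on four Haswell nodes


def dp_peak(sp_peak: float, ratio: float) -> float:
    """Theoretical double-precision peak derived from the single-precision peak."""
    return sp_peak * ratio


def nodes_to_match(target: float, per_node: float, efficiency: float = 1.0) -> int:
    """Boards needed to reach `target` GFLOPS if each sustains `efficiency`
    times its peak (1.0 means perfect efficiency, an optimistic assumption)."""
    return math.ceil(target / (per_node * efficiency))


if __name__ == "__main__":
    jetson_dp = dp_peak(JETSON_SP_PEAK_GFLOPS, DP_TO_SP_RATIO)
    n = nodes_to_match(HASWELL_HPL_GFLOPS, jetson_dp)
    print(f"Jetson TK1 DP peak: {jetson_dp:.1f} GFLOPS")
    print(f"Boards to match ~{HASWELL_HPL_GFLOPS:.0f} GFLOPS at peak: {n}")
    print(f"Power budget at {JETSON_MAX_WATTS:.0f} W/board: {n * JETSON_MAX_WATTS:.0f} W")
```

With these inputs the DP peak works out to roughly 13.6 GFLOPS per board, so about 37 boards (up to roughly 370 W) would be needed just to match the Haswell figure at 100% efficiency; passing `efficiency=0.5` roughly doubles that count.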
