30M samples at 42 planes of 19x19 chars each, plus database overhead, comes to 490 GB. The machine dual-boots, and Windows wants to keep half of its original partition; since I didn't want to reinstall Windows after reformatting, that leaves 1 TB for Linux.
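The arithmetic behind the 490 GB figure can be sketched as follows (a rough check, assuming one byte per board point as in the chars-per-plane encoding above; the remaining ~35 GB would be LMDB overhead):

```python
# Back-of-the-envelope check of the ~490 GB dataset size quoted above.
samples = 30_000_000   # training positions
planes = 42            # input feature planes per position
cells = 19 * 19        # one char (byte) per board point

raw_bytes = samples * planes * cells
print(f"raw data: {raw_bytes / 1e9:.0f} GB")  # raw data: 455 GB
```

With the stated ~490 GB total, database overhead accounts for roughly 8% on top of the raw encoding.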
However, AlphaGo used data augmentation (rotations and reflections), which would increase the input size to about 4 TB. The input bandwidth is pretty low, and an external 8 TB USB drive will hold it all (about $250). I'd rather just buy another drive than spend time coding and debugging another Caffe input layer to further compress the inputs.

Regards,
David

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Álvaro Begué
Sent: Wednesday, April 27, 2016 1:56 AM
To: computer-go
Subject: Re: [Computer-go] Machine for Deep Neural Net training

What are you doing that uses so much disk space? An extremely naive computation of the required space for what you are doing is:

30M samples * (42 input planes + 1 output plane)/sample * 19*19 floats/plane * 4 bytes/float = 1.7 TB

So that's cutting it close. But I think the inputs and outputs are all binary, which allows a factor of 32 compression right there; you might be using constant planes for some inputs; and if the output is a move, it fits in 9 bits...

Álvaro.

On Wed, Apr 27, 2016 at 12:55 AM, David Fotland <fotl...@smart-games.com> wrote:

I have my deep neural net training setup working, and it's working so well I want to share.

I already had Caffe running on my desktop machine (4-core i7) without a GPU, with inputs similar to AlphaGo's generated by Many Faces into an LMDB database. I trained a few small nets for a day each to get some feel for it.

I bought an Alienware Area 51 from Dell with two GTX 980 Ti GPUs, 16 GB of memory, and 2 TB of disk. I set it up to dual-boot Ubuntu 14.04, which made it trivial to get the latest Caffe up and running with cuDNN. 2 TB of disk is not enough; I'll have to add another drive.

I expected something like a 20x speedup on training, but I was shocked by what I actually got. On my desktop, the Caffe MNIST sample took 27 minutes to complete. On the new machine it took 22 seconds: 73x faster.

My simple network has 42 input planes and 4 layers of 48 filters each.
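Both space-saving ideas raised in the thread are easy to sketch with NumPy: generating the 8 board symmetries on the fly instead of storing rotated/reflected copies, and bit-packing the binary feature planes for the factor-of-32 compression Álvaro mentions (1 bit per point instead of a 4-byte float). This is a sketch only, not the actual Many Faces or Caffe input code:

```python
import numpy as np

def symmetries(plane):
    """Yield the 8 rotations/reflections of a 19x19 plane
    (the dihedral symmetry group of the Go board)."""
    for k in range(4):
        rot = np.rot90(plane, k)
        yield rot
        yield np.fliplr(rot)

def pack(planes):
    """Bit-pack binary feature planes: 8 board points per byte.
    Versus 4-byte floats, this is the factor-of-32 compression."""
    return np.packbits(planes.astype(np.uint8))

def unpack(packed, shape):
    """Restore float32 planes from a bit-packed buffer."""
    n = int(np.prod(shape))
    return np.unpackbits(packed)[:n].reshape(shape).astype(np.float32)

# 42 binary input planes for one position (random stand-in data):
planes = np.random.default_rng(0).integers(0, 2, size=(42, 19, 19))
packed = pack(planes)
print(packed.nbytes, "bytes packed vs", 42 * 19 * 19 * 4, "as float32")
```

Storing only the canonical position and applying `symmetries` at load time trades a little input-pipeline CPU for an 8x reduction in disk, which is the alternative to the "just buy another drive" approach above.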
Training runs about 100x faster on the Alienware. Training 100k Caffe iterations (batches) of 50 positions takes 13 minutes, rather than almost a full day on my desktop.

David

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go