Aha... If you are running SGD on a single node, just open the HDFS files directly. You won't get a significant benefit from data locality unless the files are relatively small.
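As a rough illustration of "just open the files directly", here is a minimal single-node SGD sketch (logistic regression) that streams training examples from an already-open text file, one pass at a time. Assumptions, all hypothetical: examples are stored one per line as `label,x1,x2,...`, and a local file or `io.StringIO` stands in for the HDFS file. With Hadoop you would instead get the input stream from the `FileSystem` API (e.g. `FileSystem.open`). The function names `parse_examples`, `sgd_logistic` and `predict` are illustrative, not part of any library.

```python
import io
import math
from typing import Iterator, List, TextIO, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def parse_examples(stream: TextIO, dim: int) -> Iterator[Tuple[float, List[float]]]:
    """Yield (label, features) from lines like '1,0.5,2.0' (hypothetical format)."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        fields = [float(v) for v in line.split(",")]
        if len(fields) != dim + 1:
            raise ValueError(f"expected {dim} features, got {len(fields) - 1}")
        yield fields[0], fields[1:]


def predict(w: List[float], x: List[float]) -> float:
    """Probability of label 1; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def sgd_logistic(stream: TextIO, dim: int, rate: float = 0.1, epochs: int = 1) -> List[float]:
    """Plain SGD for logistic regression, reading examples straight from the stream.

    The stream must be seekable so it can be rewound between epochs.
    """
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        stream.seek(0)  # rewind for another pass over the data
        for label, x in parse_examples(stream, dim):
            g = label - predict(w, x)  # gradient of the log-likelihood w.r.t. the margin
            w[0] += rate * g
            for i, xi in enumerate(x):
                w[i + 1] += rate * g * xi
    return w
```

Usage would look like `with open("examples.csv") as f: w = sgd_logistic(f, dim=2, epochs=5)`. Because SGD only ever needs one example at a time, streaming a single file sequentially is all the data access a single-node learner requires.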
With a single-node solution, you gain little from Hadoop. Features like automatic task restarts only provide a large advantage when you have many nodes participating in the computation.

On Thu, Jan 28, 2010 at 1:37 PM, Markus Weimer <[email protected]> wrote:

> It would be neat if the actual learning could be done on the cluster as well, if
> only on a single, carefully chosen node close to the data.

--
Ted Dunning, CTO
DeepDyve
