On Mon, Aug 18, 2014 at 10:31 AM, J. Roeleveld <jo...@antarean.org> wrote:
>
> I wouldn't use Hadoop for storage of files. It's only useful if you have a lot
> (and I do mean a LOT) of data where a query only returns a very small amount.

Not to mention a lot of data in a small number of files. I think the minimum
allocation size for Hadoop is measured in megabytes. I tried using it to process
gentoo-x86, and the sheer number of files just clobbered the thing.

Since in my job the files were really just static data and not the actual
subject of the map/reduce, I instead replicated the data to all the nodes and had
them read it from the local filesystem.

Hadoop is a very specialized tool. It does what it does very well, but if you
want to use it for something other than map/reduce, consider carefully whether
it is the right tool for the job.

-- 
Rich