Hi,
this seems like an FAQ, but I didn't see it answered explicitly in the docs: is the minimum size a file occupies on HDFS controlled by the block size? That is, with the default block size of 64 MB, would storing a 1-byte file consume 64 MB? I would assume yes, since HDFS was primarily developed for distributing large files, and sentences like "Internally, a file is split into one or more blocks and these blocks are stored in a set of Datanodes" seem to imply it, but I can't find a definite answer. If that is the case and the default block size is 64 MB, it feels like abusing HDFS to set the block size to 10 KB just because a few hundred thousand of our files are only a few KB in size. I should probably run some benchmarks, though.
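To put rough numbers on the worry, here is a back-of-the-envelope sketch. It assumes the thing I am asking about is true (every file occupies at least one full block), which may well be wrong. The file count and average file size are hypothetical stand-ins for "a few hundred thousand" and "a few K", and replication is ignored.

```python
# Back-of-the-envelope comparison. The file count and average size are
# hypothetical; the full-block behaviour is an unconfirmed assumption.
KB = 1024
MB = 1024 * KB


def worst_case_bytes(num_files, block_size):
    """Space used IF every file occupied at least one full block."""
    return num_files * block_size


def actual_data_bytes(num_files, avg_file_size):
    """Space used if each file only occupied its real length."""
    return num_files * avg_file_size


num_files = 300_000      # hypothetical: "a few hundred thousand files"
avg_file_size = 4 * KB   # hypothetical: "a few K in size"
block_size = 64 * MB     # the default block size mentioned above

padded = worst_case_bytes(num_files, block_size)
real = actual_data_bytes(num_files, avg_file_size)

print(f"full-block assumption: {padded / 1024**4:.1f} TiB")
print(f"actual data:           {real / 1024**3:.2f} GiB")
print(f"overhead factor:       {padded // real}x")
```

If the assumption does not hold, only the second figure matters and the block size question becomes much less pressing.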
Thanks in advance, Robert
