In the first version you read a single large chunk, while in the second you read many small chunks.
I think reading many small chunks is much slower because of how disk I/O works. Each read first has to locate the data, and then it transfers only a small chunk, even though a larger chunk could be read in roughly the same time. Disks (and, I think, SSDs too) are optimized for larger reads, and reading a single chunk avoids paying the lookup delay over and over. This I/O penalty seems large compared to the cost of allocating the data variable.
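If you want to check this on your own machine, here is a minimal Python sketch. It is my own illustration, not your code: the names `read_in_chunks` and `compare`, the 4096-byte chunk size, and the 16 MiB temporary file are all assumptions made for the example. It reads the same file once with a single large read and once with many small unbuffered reads, then reports the number of read calls and the elapsed time for each.

```python
import os
import tempfile
import time


def read_in_chunks(path, chunk_size):
    """Read a file with unbuffered read() calls of at most chunk_size bytes.

    Returns the file contents and the number of reads that returned data.
    """
    chunks = []
    calls = 0
    # buffering=0 gives a raw file object, so Python adds no buffer of its own
    # between these read() calls and the operating system.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            chunks.append(chunk)
            calls += 1
    return b"".join(chunks), calls


def compare(path, small=4096):
    """Time one large read against many small reads of the same file."""
    size = os.path.getsize(path)
    results = {}
    for label, chunk_size in (("single chunk", size), ("small chunks", small)):
        start = time.perf_counter()
        _, calls = read_in_chunks(path, chunk_size)
        results[label] = (calls, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Hypothetical test file: 16 MiB of random bytes in a temporary location.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
    try:
        for label, (calls, seconds) in compare(tmp.name).items():
            print(f"{label}: {calls} read calls, {seconds:.4f} s")
    finally:
        os.remove(tmp.name)
```

Keep in mind that a freshly written file is probably served from the operating system's cache, so the numbers here mostly reflect per-call overhead rather than real disk latency. On data that actually has to come from the disk, I would expect the gap to be larger.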
