So, I was sick for a week, and before that it had been a week or so since I'd touched the Hadoop work I've been doing. I left the namenode/datanodes running during that time, so that disk failures and whatnot could clean up after themselves.
My current dataset is composed of about 16k files, totalling around 450GB. For the second time now, I've found that files have started to rot. No nodes in the ~20-machine cluster died during the time I wasn't paying attention to them. Each machine has an average of 3 drives in it, and, after the last time, I turned replication up to 4x, "just in case". Yet, somehow, dozens of files are now missing blocks. They weren't missing blocks before. Has anyone run into this? I can't find any gremlins in the system, certainly nothing that would leave 99% of my data alone but kill all 4 copies of a few blocks, on different machines, so that they disappear from the cluster entirely... but it's starting to get annoying.

-- 
Bryan A. Pendleton
Ph: (877) geek-1-bp
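P.S. For anyone who wants to check their own cluster for the same symptom: fsck over the affected tree (e.g. bin/hadoop fsck / -files -blocks -locations) will list the files whose blocks have no live replicas. Below is a rough sketch of the same walk done through the plain FileSystem/BlockLocation API instead, in case that's handier to script against; the class name and the default root of "/" are just placeholders, and the API calls are what I'd expect from a stock install rather than anything particular to my setup.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Walk DFS from a root path and print every file block that reports
// zero live locations, i.e. roughly the set fsck flags as missing.
public class MissingBlockWalk {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();   // picks up the cluster config from the classpath
    FileSystem fs = FileSystem.get(conf);
    walk(fs, new Path(args.length > 0 ? args[0] : "/"));
  }

  private static void walk(FileSystem fs, Path dir) throws IOException {
    FileStatus[] entries = fs.listStatus(dir);
    if (entries == null) {
      return;                                   // path vanished or is unreadable; skip it
    }
    for (FileStatus stat : entries) {
      if (stat.isDir()) {                       // isDirectory() in newer releases
        walk(fs, stat.getPath());
      } else {
        BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
        for (BlockLocation block : blocks) {
          if (block.getHosts().length == 0) {
            System.out.println(stat.getPath() + ": no replicas for block at offset "
                + block.getOffset() + " (len " + block.getLength() + ")");
          }
        }
      }
    }
  }
}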
