So we were storing the hadoop.job.history.user.location (attempt_blah) files on local disk on each node. We kept them around for about a week, but we have had to reduce this to 1 day: as the number of files in that directory increases, jobs eventually fail to run on that machine until I clear or move the logs out. I am guessing that this is a glob failure.
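
For reference, the cleanup I do by hand could be scripted roughly like this. It is only a minimal sketch, not anything Hadoop provides: the function name `prune_old_files` and the retention argument are made up for illustration, and it assumes the history files are plain files sitting directly in one local directory.

```python
import os
import time


def prune_old_files(directory, max_age_days, now=None):
    """Delete regular files in `directory` last modified more than
    `max_age_days` ago. Returns the sorted names of the removed files.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    # Pass 1: collect stale files. os.scandir yields entries lazily and
    # avoids shell glob expansion, which can break on very large directories.
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories and symlinks
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.name)

    # Pass 2: delete after the scan has finished.
    for name in stale:
        os.remove(os.path.join(directory, name))
    return sorted(stale)
```

Run from cron on each node with the history directory and a retention of 1 (or 7) days, this would keep the file count bounded without relying on a shell glob over the whole directory.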
