Hey guys, I'm new to the list and I'm currently considering Hadoop to solve a data distribution problem. Right now, there's a single server which contains very large files (typical files are 30 GB or more). This server is accessed over the LAN and over the internet, but, of course, working with files this large is painful without a local connection.
My idea to solve this is to deploy new servers at the sites that access the data most often, in such a way that each one holds a local copy of the files most accessed from there. These new servers would download and store parts of the data (entire files, not fragments) so that those files can be served over the site's own LAN alone, without having to rely on another server's data.

Is it possible to impose this kind of restriction when Hadoop splits a file across nodes? Honestly, I don't even know whether the restriction is useful. In my head, enforcing this kind of data locality would make it possible to use the data internally even when there is no internet connection, at the price of limiting the number of nodes available for load balancing and replication. Is this tradeoff acceptable, or at least possible, with Hadoop?
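To make the question concrete: from what I've read, the FileSystem API at least lets one observe where HDFS put the blocks of a file. Below is a rough sketch of what I mean (the path is made up, and it assumes a configured cluster's core-site.xml/hdfs-site.xml on the classpath). What I'm asking is whether placement can actually be constrained per site, not just inspected like this.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocator {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Made-up path, just for illustration
        Path file = new Path("/data/bigfile.bin");
        FileStatus status = fs.getFileStatus(file);

        // Ask the NameNode which hosts hold each block of the file
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    b.getOffset(), b.getLength(),
                    Arrays.toString(b.getHosts()));
        }
        fs.close();
    }
}

thanks,
Thiago Moraes - EnC 07 - UFSCar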
