Hi, when RawLocalFileSystem or FTPFileSystem is used as the input of a MapReduce job, how does the JobTracker apply its data-locality logic, i.e. how many map tasks does it start, and on which machines?
I want to understand this with two scenarios in mind:

Scenario 1: RawLocalFileSystem - every data node has a local directory called /fooLocalBar, each containing 10 files (200 MB each) to be processed.

Scenario 2: FTPFileSystem - a single external machine has a directory called /fooRemoteBar containing 10 files (200 MB each) to be processed.

./zahoor
