Hi,

When RawLocalFileSystem or FTPFileSystem is used as the input of a
MapReduce job, how does the JobTracker apply its data-locality logic,
i.e. how many map tasks does it start, and on which machines?

I would like to understand this with two scenarios in mind:

Scenario 1: RawLocalFileSystem
  - Every data node has a local directory called /fooLocalBar, each
containing 10 files (200 MB each) to be processed.

Scenario 2: FTPFileSystem
  - A common external machine has a directory called /fooRemoteBar, which
contains 10 files (200 MB each) to be processed.


./zahoor
