Data locality in FTPFileSystem and RawLocalFileSystem

Zooni Zooni Fri, 15 Oct 2010 06:41:18 -0700

Hi,

When RawLocalFileSystem or FTPFileSystem is used as the input of a
MapReduce job, how does the JobTracker apply its data locality logic, i.e.,
how many map tasks does it start, and on which machines?


I want to understand this with two scenarios in mind:

Scenario 1: RawLocalFileSystem
   - All the data nodes have a local directory called /fooLocalBar, each
containing 10 files (200 MB each) to be processed.

Scenario 2: FTPFileSystem
   - A common external machine has a directory called /fooRemoteBar, which
contains 10 files (200 MB each) to be processed.
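
To put rough numbers on the "how many map tasks" part of both scenarios, here
is a small sketch of the arithmetic. It rests on assumptions, not on anything
confirmed in this thread: that the input format cuts each file into chunks of
one split size, that the last chunk may run up to 10% over before an extra
split is created, and that one map task is started per split. The names
count_splits and SPLIT_SLOP, and the 64 MB and 32 MB split sizes, are
placeholders for illustration only; I do not know what block size either
file system actually reports. The sketch says nothing about which machines
the tasks land on.

```python
MB = 1024 * 1024

# Assumed tolerance: the final split may be up to 10% larger than split_size
# before an additional split is created.
SPLIT_SLOP = 1.1


def count_splits(file_size, split_size):
    """Estimate the number of input splits for one non-empty file."""
    remaining = file_size
    splits = 0
    # Carve off full-size splits while the remainder is clearly larger
    # than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the last (possibly short) split.
    if remaining > 0:
        splits += 1
    return splits


# Both scenarios: 10 files of 200 MB each in one directory.
file_sizes = [200 * MB] * 10

for split_size in (64 * MB, 32 * MB):  # placeholder split sizes
    total = sum(count_splits(size, split_size) for size in file_sizes)
    print(f"split size {split_size // MB} MB -> {total} splits")
```

Under these assumptions one directory of 10 files gives 40 splits at 64 MB
and 70 splits at 32 MB, so the answer depends heavily on the block size each
file system reports.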


./zahoor
