Hello,

On Thu, Feb 3, 2011 at 10:46 PM, Keith Wiley <[email protected]> wrote:
> I've seen this asked before, but haven't seen a response yet.
>
> If the input to a streaming job is not actual data splits but simple HDFS 
> file names which are then read by the mappers, then how can data locality be
> achieved?

Also, if you only want to avoid splitting the files, you can pass in a
custom FileInputFormat whose isSplitable returns false. You will lose
some locality, though, because not every block of a file will be present
on the node chosen to run its map task. But I believe that adding a
hundred files to the DistributedCache is not the solution, since the
DistributedCache data is sent to ALL the nodes, AFAIK.
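For reference, in the old org.apache.hadoop.mapred API (the one streaming's
-inputformat option takes), the method to override is
protected boolean isSplitable(FileSystem fs, Path file), e.g. in a subclass
of TextInputFormat. To make the locality trade-off concrete, here is a toy
model in plain Java. It is not Hadoop code, and the class and method names
are made up for illustration. It assumes one unsplit file becomes one map
task and, as a best case, that the task runs on whichever host holds the
most bytes of that file. Every block without a replica on that host has to
be read over the network.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical names, not Hadoop's API): one unsplit file is read
// entirely by one map task, so only blocks replicated on that task's host are local.
public class UnsplitLocality {

    /** One block of a file: its size in bytes and the hosts holding a replica. */
    public static final class Block {
        final long length;
        final Set<String> hosts;

        public Block(long length, String... hosts) {
            this.length = length;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    /** Fraction of the file's bytes that have a replica on the given host. */
    public static double localFraction(List<Block> blocks, String host) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.length;
            if (b.hosts.contains(host)) {
                local += b.length;
            }
        }
        return total == 0 ? 1.0 : (double) local / total;
    }

    /** Host with the most local bytes (best case for a single task); null if no blocks. */
    public static String bestHost(List<Block> blocks) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Block b : blocks) {
            for (String h : b.hosts) {
                bytesPerHost.merge(h, b.length, Long::sum);
            }
        }
        String best = null;
        long bestBytes = -1;
        for (Map.Entry<String, Long> e : bytesPerHost.entrySet()) {
            if (e.getValue() > bestBytes) {
                best = e.getKey();
                bestBytes = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Hypothetical layout: a 202 MB file in four blocks, three replicas each.
        List<Block> file = Arrays.asList(
                new Block(64 * mb, "node1", "node2", "node3"),
                new Block(64 * mb, "node2", "node4", "node5"),
                new Block(64 * mb, "node3", "node5", "node6"),
                new Block(10 * mb, "node2", "node4", "node6"));
        String host = bestHost(file);
        System.out.printf("best host: %s, local bytes: %.1f%%%n",
                host, 100 * localFraction(file, host));
    }
}
```

Running main prints "best host: node2, local bytes: 68.3%". Even in the
best case, the third 64 MB block (replicated only on node3, node5 and node6)
must be fetched remotely, which is the locality loss described above.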

-- 
Harsh J
www.harshj.com
