Sameer Tilak wrote:
>
> Hi everyone,
> I would like to use Hadoop for analyzing tens of thousands of images.
> Ideally each mapper gets a few hundred images to process, and I'll have a few
> hundred mappers. However, I want the mapper function to run on the machine
> where its images are stored. How can I achieve that? With text data, creating
> splits and exploiting locality seems easy.
You can store the image files in HDFS. However, storing too many small files in
HDFS will cause scalability and performance problems (each file adds to the
NameNode's memory load). A common fix is to pack many images into a single
SequenceFile: the SequenceFile is split into HDFS blocks like any other file, and
the framework schedules each mapper on a node holding its block, so you get the
same data locality you get with text input. Some other approaches are discussed
here:
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/
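
As a rough sketch (not a polished tool), packing a local directory of images into
a SequenceFile could look something like the code below, using the standard
SequenceFile.Writer API. File names become keys and raw image bytes become
values; the class name and the args[0]/args[1] paths are just placeholders.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every image in a local directory (args[0]) into one SequenceFile
// on HDFS (args[1]), keyed by file name, with the raw image bytes as value.
public class ImagePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[1]);   // e.g. /user/you/images.seq

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (File img : new File(args[0]).listFiles()) {
                byte[] bytes = Files.readAllBytes(img.toPath());
                writer.append(new Text(img.getName()),
                              new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}

In the MapReduce job you would then read the file back with
SequenceFileInputFormat, so each map() call receives one image (file name as the
key, image bytes as a BytesWritable value) and runs on a node that holds that
block.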