Hi all,

I am currently processing a lot of raw CSV data and producing a
summary text file, which I load into MySQL.  On top of this I have a
PHP application that generates tiles for Google Maps (sample tile:
http://eol-map.gbif.org/php/map/getEolTile.php?tile=0_0_0_13839800).
Here is a (dev server) example of the final map client:
http://eol-map.gbif.org/EOLSpeciesMap.html?taxon_id=13839800 - the
dynamic grids as you zoom are all pre-calculated.

I am considering (for better throughput, as maps generate huge request
volumes) pregenerating all my tiles (PNG) and storing them in S3 behind
CloudFront.  There will be billions of PNGs produced, each 1-3KB.
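
To put a rough number on that volume, here is a back-of-envelope
sketch.  The count of exactly one billion tiles is a placeholder I
picked for illustration, not a measured figure, and the function name
is made up:

```python
def estimate_tib(tile_count: int, kib_per_tile: float) -> float:
    """Total size in TiB for tile_count files of kib_per_tile KiB each."""
    total_bytes = tile_count * kib_per_tile * 1024
    return total_bytes / 2**40


# Assumed: 1 billion tiles (placeholder), 1-3 KiB each as stated above.
low = estimate_tib(10**9, 1)
high = estimate_tib(10**9, 3)
print(f"{low:.2f} to {high:.2f} TiB per billion tiles")
```

So every billion tiles is on the order of 1 to 3 TB of payload, before
any per-object overhead.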

Could someone please recommend the best place to generate the PNGs,
and when to push them to S3, in an MR system?
If I did the PNG generation and the upload to S3 in the reduce, the same
task running on multiple machines would compete with each other, right?
Should I generate the PNGs to a local directory and then, on task
success, push the lot up?  I am assuming billions of 1-3KB files on HDFS
is not a good idea.
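
To make the "stage locally, push on success" idea concrete, here is a
minimal sketch of what I have in mind, not a tested design.  The key
layout in `tile_key`, the function names, and the `upload` callable are
all hypothetical; `upload` stands in for whatever S3 client call ends up
being used.  Keys are deterministic, so if the same task runs twice the
second attempt overwrites identical objects instead of creating
conflicting ones:

```python
import os
import shutil
import tempfile
from typing import Callable, Iterable, Tuple

TileId = Tuple[int, int, int, int]  # (taxon_id, zoom, x, y), assumed layout


def tile_key(taxon_id: int, zoom: int, x: int, y: int) -> str:
    """Deterministic object key (hypothetical naming scheme)."""
    return f"tiles/{taxon_id}/{zoom}/{x}/{y}.png"


def run_reduce_task(
    tiles: Iterable[Tuple[TileId, bytes]],
    upload: Callable[[str, bytes], None],
) -> int:
    """Stage every PNG on local disk, then upload only if all succeeded."""
    staging = tempfile.mkdtemp(prefix="tiles-")
    try:
        staged = []
        # Phase 1: write all tiles locally. If generation raises here,
        # nothing has been uploaded yet.
        for (taxon_id, zoom, x, y), png in tiles:
            key = tile_key(taxon_id, zoom, x, y)
            path = os.path.join(staging, key.replace("/", "_"))
            with open(path, "wb") as f:
                f.write(png)
            staged.append((key, path))
        # Phase 2: the task produced all its tiles, so push the lot up.
        for key, path in staged:
            with open(path, "rb") as f:
                upload(key, f.read())
        return len(staged)
    finally:
        # Always clean up the local staging directory.
        shutil.rmtree(staging, ignore_errors=True)
```

For example, with a plain dict standing in for the bucket,
`run_reduce_task(tiles, bucket.__setitem__)` fills `bucket` only when
every tile in the task was generated.  An upload that fails partway
through phase 2 would still leave some objects behind, so this only
covers failures during generation.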

I will use EC2 for the MR for the time being, but this will later be
moved to a local cluster, still pushing to S3...

Cheers,

Tim
