> Have any publications been done in this area? (Spatial processing on
> Hadoop)
> G Sudha Sadasivam

I saw Chris, who does Cascading, tweet recently about someone building RTree indexes in Hadoop, and I wanted to follow up with him about who was doing that.

For my part, I very hastily wrote a blog post (http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html), but really I did not get far.

I could imagine running an initial MR job on each side of a huge join before actually doing the join, to determine the best join order and potentially a (spatial) partitioning strategy (e.g. build some RTrees, or perhaps subset the data for different areas of the world and do the join in multiple jobs). The output of this analysis stage would then be used to actually run the process / implement the join.

There are some nice species observation and specimen datasets (100s of millions of point-based records) that we are often looking to join with polygon datasets (e.g. protected areas of the world). If you wanted a real-world dataset and fancied doing something worthwhile (helping protect and understand species), it could be arranged.

Cheers,
Tim

> --- On Sat, 10/17/09, Siddu <[email protected]> wrote:
>
> From: Siddu <[email protected]>
> Subject: Re: Project ideas !
> To: [email protected]
> Date: Saturday, October 17, 2009, 3:29 PM
>
> On Wed, Oct 14, 2009 at 5:05 PM, tim robertson <[email protected]> wrote:
>
>> I am interested to see more spatial processing carried out on Hadoop.
>> I have done basic spatial joins intersecting 100s of millions of points
>> with 100s of thousands of polygons, but that is all. It's something I'd
>> like to spend time researching, but don't have that time... could be a
>> nice piece of research since everybody loves maps.
>
> Yes Tim, that sounds interesting... do you have any link of yours detailing the
> work?
> Would be happy to go through it!
>
> Cheers,
> Tim
>
> On Wed, Oct 14, 2009 at 1:20 PM, sudha sadhasivam <[email protected]> wrote:
>> Some of the projects include:
>> 1) Categorise URLs based on domains
>> 2) Content based searching
>> 3) P2P information retrieval
>> 4) Performance enhancements in map-reduce.
>> 5) Sort and shuffle optimisations in MR framework.
>> 6) Enhancements of scheduling strategies in Hadoop
>> 7) Document classification
>> 8) Document Ranking
>>
>> In fact all batch applications that can be parallelised are suitable for
>> Hadoop.
>> G Sudha Sadasivam
>>
>> --- On Wed, 10/14/09, Siddu <[email protected]> wrote:
>>
>> From: Siddu <[email protected]>
>> Subject: Project ideas !
>> To: [email protected]
>> Cc: [email protected]
>> Date: Wednesday, October 14, 2009, 3:38 PM
>>
>> Hello Hadoop Users,
>>
>> Another friend of mine and I are looking for some project ideas
>> based on Hadoop as a part of our curriculum.
>>
>> Can you give us some pointers please?
>>
>> Thanks in advance!
>>
>> Regards,
>> ~Sid~
>
> --
> Regards,
> ~Sid~
> I have never met a man so ignorant that I couldn't learn something from him
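
To make the "partition first, then join" idea from Tim's reply at the top of this thread a bit more concrete, here is a rough sketch in plain Python. It is not the approach from the blog post and it is not Hadoop code: ordinary functions stand in for the map and reduce steps, and a fixed lat/lon grid is swapped in as the partitioning strategy instead of RTrees. All names (`CELL_DEG`, `cell_of`, `map_phase`, `reduce_phase`, `spatial_join`) and the 10-degree cell size are assumptions made up for illustration.

```python
from collections import defaultdict
from math import floor

CELL_DEG = 10.0  # hypothetical grid cell size in degrees (an assumption)


def cell_of(x, y, size=CELL_DEG):
    """Return the (column, row) of the grid cell containing a point."""
    return (floor(x / size), floor(y / size))


def cells_for_bbox(min_x, min_y, max_x, max_y, size=CELL_DEG):
    """Yield every grid cell overlapped by a bounding box."""
    c0, r0 = cell_of(min_x, min_y, size)
    c1, r1 = cell_of(max_x, max_y, size)
    for c in range(c0, c1 + 1):
        for r in range(r0, r1 + 1):
            yield (c, r)


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_phase(points, polygons):
    """Emit (cell, record) pairs, as a mapper would."""
    for pid, (x, y) in points.items():
        yield cell_of(x, y), ("point", pid, (x, y))
    for gid, ring in polygons.items():
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        # A polygon is replicated to every cell its bounding box touches.
        for cell in cells_for_bbox(min(xs), min(ys), max(xs), max(ys)):
            yield cell, ("polygon", gid, ring)


def reduce_phase(pairs):
    """Group by cell, then test each point only against that cell's polygons."""
    by_cell = defaultdict(lambda: {"point": [], "polygon": []})
    for cell, (kind, ident, geom) in pairs:
        by_cell[cell][kind].append((ident, geom))
    matches = set()
    for bucket in by_cell.values():
        for pid, (x, y) in bucket["point"]:
            for gid, ring in bucket["polygon"]:
                if point_in_polygon(x, y, ring):
                    matches.add((pid, gid))
    return matches


def spatial_join(points, polygons):
    """Run the two phases in-process and return (point_id, polygon_id) pairs."""
    return reduce_phase(map_phase(points, polygons))
```

Called as `spatial_join({"p1": (5, 5)}, {"area_a": [(0, 0), (20, 0), (20, 10), (0, 10)]})`, each point lands in exactly one cell, so it is compared only with polygons whose bounding box overlaps that cell. An earlier analysis job, as suggested above, could be what picks the cell size or decides to split the join by region.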
