On Thursday 20 March 2014 05:02:33, Zhang, Shuai wrote:
> Hi All,
>
> I think I need a mentor to work with me and help me add MongoDB support to GDAL. Below is the proposal I wrote; hopefully you will find it worth a try.

This is something I may potentially mentor, but there are already 2 students interested in other subjects. I'm not sure how many will eventually get selected by the GSoC program, but I won't be able to mentor 3 people for sure!

> Thanks,
> shuai
>
> Title: OGR Driver for MongoDB
>
> Short description:
> MongoDB, a document database that provides high performance, high availability, and easy scalability, can be a good platform for storing extremely large spatial datasets, to support high-performance geo-computation and real-time spatial analysis at large scale. This project aims at developing an OGR driver for MongoDB to help applications or software based on GDAL, such as QGIS, GeoServer, MapServer, and so on, read and write the spatial data stored in it, thus enabling an open source GIS ecosystem powered by this advanced NoSQL database.
>
> Describe your idea
> 1. Introduction
> MongoDB, a document database that provides high performance, high availability, and easy scalability, can be a good platform for storing extremely large spatial datasets, to support high-performance geo-computation and real-time spatial analysis at large scale. Yet, so far the GIS field has paid little attention to making the most of its strengths. This project aims at developing an OGR driver for MongoDB to help applications or software based on GDAL read and write the spatial data stored in it, thus enabling an open source GIS ecosystem powered by this advanced NoSQL database.
>
> 2. Background
> Since we are living in the era of big data, the tools and equipment available today capture spatial data, at both the mega-scale and the milli-scale, in overwhelming volumes. The magnitude of this data is well beyond the capability of any mainstream geographic information system. Yet the GIS field has no off-the-shelf solution for managing these massive spatial datasets.
> Relational spatial databases have handled this role for decades, but the situation now seems somewhat different.
>
> A shift in computing patterns can be seen throughout the IT industry in recent years, and GIS is no exception. In particular, data analytics may not be achievable within a reasonable amount of time without resorting to high-performance computing strategies. However, relational spatial databases tend to be too slow for these high-performance computing scenarios, and often lack the flexible scalability needed to handle a growing workload.
>
> Fortunately, several groups are trying to address the problem, and MongoDB appears to be a leader in this direction. MongoDB, which has native support for geospatial data and uses a document-oriented model, ranks fifth in the DB-Engines Ranking of database management systems by popularity and is the highest-ranked non-relational system. Since version 2.4 (released on March 19, 2013), MongoDB supports a subset of GeoJSON geometries, including basic shapes such as points, linestrings and polygons.

Good to know. Last time I looked, MongoDB only supported point geometries.

> Quite a number of partners in big data, NoSQL, cloud, mobile and high-performance computing have joined the MongoDB ecosystem. Foursquare is one featured example: it benefits from MongoDB's support for geospatial indexing, which allows it to easily query large volumes of location-based data.
>
> 3. The idea
> MongoDB uses GeoJSON to store spatial data, and GDAL already supports access to features encoded in GeoJSON, so that code can be reused.

As far as I remember, the interface with MongoDB is (was?) a kind of binary JSON format. Has this changed?
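On the binary JSON question: MongoDB exchanges and stores documents as BSON (binary JSON), not as GeoJSON text. Even when geometries follow the GeoJSON structure, a driver would therefore convert between BSON and OGR objects, most likely through the MongoDB C++ driver's own BSON types. The sketch below is purely illustrative. It hand-encodes a flat document whose fields are all doubles, following the BSON layout (an int32 total size, then elements, then a 0x00 terminator; type byte 0x01 means double). `encodeFlatDoubleDocument` and its helpers are hypothetical names and do not belong to GDAL or to any MongoDB library.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Append a 32-bit integer in little-endian byte order (BSON is little-endian).
static void appendInt32(std::vector<std::uint8_t>& out, std::int32_t v) {
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF));
}

// Append an IEEE 754 double in little-endian byte order.
static void appendDouble(std::vector<std::uint8_t>& out, double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>((bits >> (8 * i)) & 0xFF));
}

// Hypothetical helper: encode a flat document whose values are all doubles.
//   document ::= int32 total_size, element*, 0x00
//   element  ::= 0x01 (double), field name as a C string, 8-byte double
std::vector<std::uint8_t> encodeFlatDoubleDocument(
        const std::vector<std::pair<std::string, double>>& fields) {
    std::vector<std::uint8_t> out;
    appendInt32(out, 0);  // placeholder for the total size, patched below
    for (const auto& field : fields) {
        out.push_back(0x01);                                   // element type: double
        out.insert(out.end(), field.first.begin(), field.first.end());
        out.push_back(0x00);                                   // C-string terminator
        appendDouble(out, field.second);
    }
    out.push_back(0x00);                                       // end of document
    // The size field counts every byte of the document, including itself.
    std::vector<std::uint8_t> sizeBytes;
    appendInt32(sizeBytes, static_cast<std::int32_t>(out.size()));
    std::copy(sizeBytes.begin(), sizeBytes.end(), out.begin());
    return out;
}
```

For example, `{"x": 1.0}` encodes to 16 bytes: 4 for the size, 1 for the type, 2 for `"x\0"`, 8 for the double and 1 for the terminator.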
> This project aims to implement a MongoDB driver following the OGR format driver interfaces, with subclasses of OGRSFDriver, OGRDataSource and OGRLayer registered with the OGRSFDriverRegistrar at runtime, so that GDAL can use MongoDB as a data source to access large-scale spatial data.
>
> 4. Project plan (detailed timeline: how do you plan to spend your summer?)
> The first task is to design the structure of a spatial database inside MongoDB. The OGR data model has data sources, layers and features, so accordingly each MongoDB database is regarded as a data source, each collection within it as a layer, and each document as a feature.

Yes, this sounds a bit similar to what was done with CouchDB.

> PostGIS and other spatial databases often use system tables to maintain metadata, but since MongoDB is schema-free, metadata such as the spatial reference can be stored within the particular layer, in this case a collection.
>
> The most important part of a format driver is defining how it reads and writes the data, especially the Open and Create methods of the data source class. As MongoDB organizes its spatial data using the GeoJSON model, the GeoJSON driver already present in GDAL can be reused to encode and decode the GeoJSON fetched from the MongoDB database. Therefore, there would be four files to implement in total: ogr_mongo.h, ogrmongodriver.cpp, ogrmongodatasource.cpp, and ogrmongolayer.cpp.

The write part should be no problem: a NoSQL database can receive documents with a fixed structure. The read part will need to explore all the documents/features to retrieve their structure and build an OGR feature definition (OGRFeatureDefn). This is done in the CouchDB driver.
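The read-side step described above scans documents to build a single OGR feature definition. It could look roughly like the sketch below, which makes several assumptions and is not the CouchDB driver's actual code. Documents are reduced to (field name, value type) pairs. `FieldType`, `inferSchema` and the type-widening rule are all hypothetical: Integer combined with Real becomes Real, and any other conflict becomes String. A real driver would instead read BSON documents through the MongoDB client library and create OGRFieldDefn objects.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for OGR attribute types (OFTInteger, OFTReal, OFTString).
enum class FieldType { Integer, Real, String };

struct FieldDefn {
    std::string name;
    FieldType type;
};

// A "document" here is only a list of (field name, value type) pairs.
using Document = std::vector<std::pair<std::string, FieldType>>;

// Merge two observed types: Integer and Real widen to Real, any other mismatch to String.
static FieldType mergeTypes(FieldType a, FieldType b) {
    if (a == b) return a;
    if ((a == FieldType::Integer && b == FieldType::Real) ||
        (a == FieldType::Real && b == FieldType::Integer))
        return FieldType::Real;
    return FieldType::String;
}

// Scan every document and build one layer schema. Fields keep their first-seen order;
// the geometry field is skipped because it becomes the feature geometry, not an attribute.
std::vector<FieldDefn> inferSchema(const std::vector<Document>& docs,
                                   const std::string& geometryField) {
    std::vector<FieldDefn> schema;
    std::unordered_map<std::string, std::size_t> index;  // field name -> position in schema
    for (const Document& doc : docs) {
        for (const auto& field : doc) {
            if (field.first == geometryField) continue;
            auto it = index.find(field.first);
            if (it == index.end()) {
                index.emplace(field.first, schema.size());
                schema.push_back({field.first, field.second});
            } else {
                FieldDefn& defn = schema[it->second];
                defn.type = mergeTypes(defn.type, field.second);
            }
        }
    }
    return schema;
}
```

Scanning every document is the simplest option. For very large collections, a driver might sample only part of the collection instead, at the cost of possibly missing rarely used fields.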

> Test Plan
> [1] After the MongoDB driver is compiled into the OGR framework, the ogr2ogr utility can be used as the test program to import and export spatial data between shapefiles and MongoDB.
> [2] Conduct a parallel transformation process to measure how fast the MongoDB driver is compared with the file system and PostGIS.
>
> Time Line
>
> May 19 - June 8 (Coding - Phase 1 - 3 weeks)
> Prepare the development environment, bringing GDAL and the MongoDB C++ driver together in C++. Implement OGRMongoDriver, OGRMongoDataSource and OGRMongoLayer according to the interfaces defined by OGRSFDriver, OGRDataSource and OGRLayer.
>
> June 9 - June 23 (Coding - Phase 2 - 2 weeks)
> Build the MongoDB driver into the OGR framework, first supporting the exchange of small amounts of spatial data with MongoDB, while fixing bugs.
>
> June 24 - July 13 (Coding - Phase 3 - 3 weeks)
> Pass the query string (a JSON-style document) for both spatial and attribute criteria to MongoDB to select features as requested. Compile all the code, conduct several tests, fix bugs and make it faster.
>
> July 14 - July 27 (Testing - Phase 1 - 2 weeks)
> Transfer large-scale spatial data to and from MongoDB using ogr2ogr to measure the driver's efficiency. Improve its efficiency and fix bugs.
>
> July 28 - August 10 (Testing - Phase 2 - 2 weeks)
> Conduct a parallel transformation experiment to measure how fast the MongoDB driver is compared with the file system and PostGIS, and fix bugs.
>
> August 11 - August 18 (pencils down)
> Write code documentation, including Doxygen comments and techbase/userbase articles.

You could mention adding support for spatial filtering.

> 5. Future ideas / How can your idea be expanded?
> MongoDB is also an ideal platform for storing massive geo-raster data, so the next job would be writing a MongoDB driver for raster datasets.

Hum, I'm not sure if MongoDB is aimed at this...
You would probably have to tile the raster to avoid sending/retrieving huge blobs at once.

> Explain how your SoC task would benefit the OSGeo member project and more generally the OSGeo Foundation as a whole:
> MongoDB can be a distributed and parallel NoSQL spatial database with high performance, high availability, and easy scalability, and is thus extremely suitable for large-scale data-intensive computing. By implementing the MongoDB driver in the OGR framework, the whole OSGeo ecosystem based on GDAL/OGR will benefit from it and can be powered by MongoDB.
>
> Please provide details of general computing experience: (operating systems you use on a day-to-day basis, languages you could write a program in, hardware, networking experience, etc.)
> During college, I mainly used .NET languages such as C# and VB.NET to build GIS software running on the Windows platform. Since then, and since starting my PhD program, most of my work has been done in standard C++ in a Linux environment.
>
> Please provide details of previous GIS experience:
> I have been a GIS student ever since I started college. Right now I'm a Ph.D. candidate in Cartography and Geographic Information System, School of Geographic and Oceanographic Sciences, Nanjing University, China, and a visiting scholar at Geography & GIScience and NCSA (The National Center for Supercomputing Applications), UIUC, IL, USA.
>
> Please provide details of any previous involvement with GIS programming and other software programming:
> [1] Climate Information Management System of Shanxi Province: Outstanding Award in ESRI Chinese College Student Software Development Contest, 2009.
> [2] Forest Fire Simulation Model based on Geographic Cellular Automata: Third Prize in ESRI Chinese College Student Software Development Contest, 2009.
> [3] High Performance Geospatial Computing System: HiGIS (2011-2013), supported by the National High Technology Research and Development Program of China (863 project), in progress.
> [4] NoSQL Expression of Massive Geospatial Information in the era of Big Data (2013-2015), supported by the Scientific Research Foundation of Graduate School of Nanjing University, in progress.
>
> Please tell us why you are interested in GIS and open source software:
> They are powerful and beautiful treasures of humankind, and I want to be part of that.
>
> Please tell us why you are interested in working for OSGeo and the software project you have selected:
> It's part of my research, since I have been trying to harness MongoDB to support high-performance geo-computing.
>
> Please tell us why you are interested in your specific coding project:
> I have spent a lot of time over the past three years learning how GDAL works and how to use it in high-performance computing applications. So I believe a GDAL with MongoDB support would greatly benefit my current research.
>
> Would your application contribute to your ongoing studies/degree? If so, how?
> Yes. A MongoDB cluster is a good way to handle large quantities of spatial data, and if OGR provides a MongoDB driver, many tools we developed based on GDAL can be reused and, powered by MongoDB, run much faster.
>
> Please explain how you intend to continue being an active member of your project and/or OSGeo AFTER the summer is over:
> I'll try my best to keep following this thread to make the MongoDB driver stable and efficient.
>
> Do you understand this is a serious commitment, equivalent to a full-time paid summer internship or summer job?
> Yes, I understand. I'll give my best.
>
> Do you have any known time conflicts during the official coding period? (May 19 to August 19)
> No, I don't.
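Regarding the spatial filtering remark on the timeline: one way a layer could push a rectangular spatial filter down to MongoDB is to translate the filter envelope into a query that uses the `$geoIntersects` operator (available since MongoDB 2.4) with a GeoJSON Polygon. The sketch below only builds the query document as JSON text. `buildBBoxQuery` is a hypothetical name, and it assumes that each document stores its GeoJSON geometry in a single named field. Actually sending the query would go through the MongoDB client library.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: turn an OGR-style rectangular spatial filter (minX, minY, maxX, maxY,
// in longitude/latitude) into a MongoDB query document, written as JSON text, that uses
// $geoIntersects on a GeoJSON Polygon. geometryField names the document field assumed to
// hold the GeoJSON geometry.
std::string buildBBoxQuery(const std::string& geometryField,
                           double minX, double minY, double maxX, double maxY) {
    std::ostringstream os;
    os.precision(17);  // enough digits to round-trip any double coordinate
    // A GeoJSON ring must be closed (first position == last position). Order used here:
    // lower-left, lower-right, upper-right, upper-left, back to lower-left (counter-clockwise).
    os << "{\"" << geometryField << "\": {\"$geoIntersects\": {\"$geometry\": "
       << "{\"type\": \"Polygon\", \"coordinates\": [["
       << "[" << minX << ", " << minY << "], "
       << "[" << maxX << ", " << minY << "], "
       << "[" << maxX << ", " << maxY << "], "
       << "[" << minX << ", " << maxY << "], "
       << "[" << minX << ", " << minY << "]"
       << "]]}}}}";
    return os.str();
}
```

`$geoIntersects` is used rather than `$geoWithin` because a spatial filter should keep features that intersect the filter area, not only those entirely inside it. With a 2dsphere index, the polygon edges are interpreted as geodesics, so for large rectangles the selection may differ slightly from a planar bounding box. If exact planar semantics are needed, the returned features could be post-filtered on the OGR side.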
-- 
Geospatial professional services
http://even.rouault.free.fr/services.html
_______________________________________________
gdal-dev mailing list
http://lists.osgeo.org/mailman/listinfo/gdal-dev
