If it really is just trying to tell the difference between a map and a photograph, could you make the decision based on the presence of text? You could use an OCR mechanism to judge whether more than x words are found in the image.
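A minimal sketch of that idea, assuming the OCR output is already available as a string. The names `count_words`, `looks_like_map`, `ocr_text_from_file`, and the defaults `min_len=3` and `min_words=5` are all hypothetical choices for illustration, not tested values. The OCR step itself assumes Pillow, pytesseract, and the Tesseract binary are installed.

```python
import re


def count_words(ocr_text, min_len=3):
    """Count alphabetic tokens of at least min_len letters in OCR output.

    Short tokens are skipped because OCR noise on photographs tends to
    produce one- or two-character fragments (an assumption, not a measurement).
    """
    tokens = re.findall(r"[^\W\d_]+", ocr_text)  # runs of letters only
    return sum(1 for t in tokens if len(t) >= min_len)


def looks_like_map(ocr_text, min_words=5):
    """Return True when more than min_words plausible words were recognised."""
    return count_words(ocr_text) > min_words


def ocr_text_from_file(path):
    """Run OCR on an image file (assumes Pillow + pytesseract are installed)."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))
```

Usage would be something like `looks_like_map(ocr_text_from_file("tile.tif"))`, where `tile.tif` is a placeholder file name. The threshold would need tuning on real data, since map tiles over empty areas may carry no labels at all.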
-i

From: [email protected] [mailto:[email protected]] On behalf of Dmitriy Baryshnikov
Sent: Thursday, 6 March 2014 21:16
To: [email protected]
Subject: Re: [gdal-dev] Heuristics to classify raster data ?

Hi Even,

A lot depends on what kind of imagery and maps you wish to classify. If the maps are classical scanned paper maps and you want a fast algorithm, the crosses of the meter or degree grid can be a good pattern. But this will not work for aerial images, as such images have crosses too; satellite images do not. Maybe the map frame can be a good pattern.

If you have some fragments of maps and images, I think some content analysis is needed:
- clustering, e.g. http://en.wikipedia.org/wiki/K-means_clustering
- a neural network with learning
- a support vector machine, e.g. http://svmlight.joachims.org/ and http://en.wikipedia.org/wiki/Support_vector_machine

Some hash comparison can also be used (rather fast):
- perceptual hash comparison, e.g. http://www.phash.org/

In all cases the input images should be resized to some small size, and may be grayscaled or binarized, before analysis.

Best regards,
Dmitry

06.03.2014 23:19, Even Rouault wrote:

Hi,

I'd be interested in an algorithm to automate the classification of raster data between maps (let's say renderings of OpenStreetMap data, or other digital maps) on one side and aerial/satellite imagery on the other side, without looking at metadata (bare GeoTIFF typically). This is to help automate the bulk import of data from a medium and establish a first level of classification.

Has anyone already done that and has code and/or advice to share, or knows of a software project that would do that?

Some ideas that came to my mind:
- maps typically have a much smaller number of colors than imagery, but you may have imagery that has already been reduced to 256 colors to save storage space.
- maps generally have a majority color (e.g. white, green), but not in all zones (urban zones will have more features).
- maps have higher spatial frequency (lines, text) whereas imagery will be more continuous: use the gradient, and compute statistics on it?

Even
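The three ideas above (number of colors, majority color, gradient statistics) could be computed roughly as follows. This is only a sketch in pure Python on small in-memory pixel grids. The function names `color_stats` and `gradient_stats` are hypothetical, and it deliberately stops at computing the features, since any decision thresholds would have to be learned from real samples.

```python
from collections import Counter


def color_stats(pixels):
    """Return (distinct_colors, majority_share) for a non-empty 2D grid
    of (r, g, b) tuples."""
    flat = [p for row in pixels for p in row]
    counts = Counter(flat)
    # Share of the image covered by the single most frequent color.
    majority_share = counts.most_common(1)[0][1] / len(flat)
    return len(counts), majority_share


def gradient_stats(gray):
    """Return (mean_abs_diff, flat_fraction) over horizontal and vertical
    neighbour pairs of a 2D grid of intensities.

    flat_fraction is the share of neighbour pairs with identical values;
    rendered maps are expected to have many flat pairs plus a few sharp
    edges, imagery few flat pairs (an assumption to be checked on data).
    """
    total = 0
    flat_pairs = 0
    n = 0
    for y, row in enumerate(gray):
        for x, v in enumerate(row):
            if x + 1 < len(row):           # right-hand neighbour
                d = abs(row[x + 1] - v)
                total += d
                flat_pairs += d == 0
                n += 1
            if y + 1 < len(gray):          # neighbour below
                d = abs(gray[y + 1][x] - v)
                total += d
                flat_pairs += d == 0
                n += 1
    if n == 0:
        return 0.0, 0.0
    return total / n, flat_pairs / n
```

For real rasters the grids would presumably come from a downsampled read of the dataset (for instance through GDAL's Python bindings), and the loops would be replaced by array operations; that part is left out here.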
_______________________________________________ gdal-dev mailing list [email protected] http://lists.osgeo.org/mailman/listinfo/gdal-dev
