Hi David,

Thanks for the response. I'll feed your question about converting the shapefile to GeoJSON back to the team.
In the meantime, I have also received some more info on your previous questions:

"The input file was 1.4GB; the output GeoJSON was around 17GB, IIRC. The raster file contains a UK river flood model. It works on a 5-metre grid, with pixels representing the associated flood risk at a given point on the map. The data is continuous in that a long river would have a band of colour/risk which would follow its course, and this could run for miles. Vectorising this could result in a very large, long vector with extremely complex geometry (imagine trying to vectorise the whole of the Thames, for example)."

Many thanks,
Chris

On 12 January 2015 at 16:04, David Strip <[email protected]> wrote:
> Your team writes that the image is usually exported as a vector file,
> e.g. a shapefile. Can they do this successfully for the 1.4GB image? If
> so, have you tried just converting the shapefile to GeoJSON? That might
> be the simplest solution.
>
> If that doesn't work, you could try tiling, as you mention. As Even has
> already noted, the challenge in threading the code is rejoining the
> polygons at the tile boundaries. It's not an overwhelming problem, but
> it is a challenge, and it requires buffering the output rather than
> streaming it.
>
> You could do a poor man's version of multi-threading:
> 1. Tile your input image. I would probably try something bigger than the
> 1024x1024 that you mention; perhaps 4K x 4K, maybe 8K x 8K. Overlap the
> tiles by a pixel or two on all edges. For the initial experiment, just a
> couple of adjacent tiles are sufficient.
> 2. Feed each tile to gdal_polygonize in as many processes as you have
> available processors.
> 3. Take the resulting polygon files and merge them into a single
> shapefile (or other equivalent format). You can do this with ogr2ogr or
> in QGIS.
> 4. Dissolve using the classification value.
> 5. Split multipart polygons into single polygons.
>
> I don't know anything about how the dissolve algorithm is written, so I
> can't predict its performance or how it will scale with image size and
> number of tiles. However, if it takes advantage of spatial indices, it
> could scale fairly well, unless you have shapes (like roads) that tend
> to stretch from one tile boundary to the next.
>
> On 1/12/2015 3:07 AM, chris snow wrote:
>> Hi David,
>>
>> Thanks for your response. I have a little more information since
>> feeding your response to the project team:
>>
>> "The tif file is around 1.4GB, as you noted, and the data is similar
>> to the result of an image classification, where each pixel value is in
>> a range between (say) 1-5. After a classification this image is
>> usually exported as a vector file (EVF or Shapefile), but in this case
>> we want to use GeoJSON. This has taken both Mark and myself weeks to
>> complete with gdal_polygonize, as you noted.
>>
>> I think an obvious way to speed this up would be threading: breaking
>> the TIFF file into tiles (say 1024x1024) and spreading these over the
>> available cores. Then there would need to be a way to dissolve the
>> tile boundaries to complete the polygons, as we would not want obvious
>> tile lines."
>>
>> Does this help?

_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
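For anyone picking this thread up later, steps 1-2 of the recipe above (overlapping tiles, one polygonize per process) can be sketched in a few lines of Python. This is only a sketch under assumptions: the raster dimensions, the file names (`flood.tif`, `tile_N.tif`), and the 4K tile / 2-pixel overlap are hypothetical placeholders, and the emitted command lines use the standard `gdal_translate -srcwin` and `gdal_polygonize.py` invocations. It is not the poster's actual pipeline.

```python
def tile_windows(width, height, tile=4096, overlap=2):
    """Yield (xoff, yoff, xsize, ysize) pixel windows covering the raster.

    Adjacent windows start `tile - overlap` pixels apart, so neighbours
    share an `overlap`-pixel strip, which is what lets the later dissolve
    step rejoin polygons across tile edges.
    """
    step = tile - overlap
    for yoff in range(0, height, step):
        for xoff in range(0, width, step):
            yield (xoff, yoff,
                   min(tile, width - xoff),   # clip last column to raster edge
                   min(tile, height - yoff))  # clip last row to raster edge


def polygonize_commands(width, height, src="flood.tif", tile=4096, overlap=2):
    """Emit one extract-then-polygonize command line per tile.

    Each line is independent, so the list can be fanned out over as many
    processes as there are cores (e.g. with xargs -P or GNU parallel).
    """
    cmds = []
    for i, (x, y, w, h) in enumerate(tile_windows(width, height, tile, overlap)):
        cmds.append(
            f"gdal_translate -srcwin {x} {y} {w} {h} {src} tile_{i}.tif && "
            f"gdal_polygonize.py tile_{i}.tif tile_{i}.shp"
        )
    return cmds
```

The per-tile shapefiles would then be merged (step 3, e.g. with ogr2ogr), dissolved on the classification value (step 4), and split back to single-part polygons (step 5) as described above.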
