Hello Even,

I've had a chance to test the fix in trunk and can report that it works very well: `gdalbuildvrt` completed in just over an hour, and the progress meter now gives a much more accurate indication of progress.

I have submitted an enhancement request regarding the VRT indexing at <http://trac.osgeo.org/gdal/ticket/5762>.

Many thanks and best regards,

Homme

On 03/12/14 10:31, Homme Zwaagstra wrote:
Even,

On 03/12/14 10:24, Even Rouault wrote:
> Homme,
>
>>
>> I've come up against a problem with `gdalbuildvrt` taking a long time to
>> create a VRT when it is passed a large number of source datasets. I am
>> trying to create a VRT file for a zoom level in a TMS structure containing
>> JPEG tiles. The command I'm using is:
>>
>> gdalbuildvrt output.vrt `find ./tiles/18 -iname '*.jpg' -printf "%p "`
>>
>> where the number of tiles is:
>>
>> $ find ./tiles/18 -iname '*.jpg' | wc -l
>> 767104
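Passing 767,104 paths through command substitution makes for a very long command line, which on some systems may exceed the operating system's argument-length limit. One alternative, assuming a GDAL version whose `gdalbuildvrt` supports `-input_file_list` (a text file with one input filename per line), is to write the list to a file first. The Python sketch below roughly mirrors `find ./tiles/18 -iname '*.jpg'` for regular files; the helper names and the list filename are hypothetical.

```python
import os


def find_jpegs(root):
    """Yield paths of files under root whose names end in .jpg,
    case-insensitively (roughly `find root -iname '*.jpg'`, files only)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".jpg"):
                yield os.path.join(dirpath, name)


def write_file_list(root, list_path):
    """Write one path per line to list_path and return how many were written."""
    count = 0
    with open(list_path, "w", encoding="utf-8") as out:
        for path in sorted(find_jpegs(root)):
            out.write(path + "\n")
            count += 1
    return count
```

For example, `write_file_list("./tiles/18", "tiles_18.txt")` followed by `gdalbuildvrt -input_file_list tiles_18.txt output.vrt`.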
>>
>> The processing seemed to progress reasonably quickly, with the progress bar
>> outputting '0... etc ...100 - done'. However, `gdalbuildvrt` continued
>> running until I killed it 8 hours later. Looking at `output.vrt` just before
>> I killed the program showed that it was still empty (0 bytes).
>
> I've looked a bit at the code, and I spotted a potential performance
> problem when serializing the in-memory VRT into XML with a large number of
> sources. I've just committed an improvement into trunk that makes the
> complexity of source serialization linear instead of quadratic.

Many thanks! I will give it a spin and report back...
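Even's change makes source serialization linear rather than quadratic. The patch itself isn't shown in this thread; this Python sketch assumes one common cause of such quadratic cost: appending each new XML child by walking the parent's sibling list to its end, so that n appends cost O(n^2) pointer hops. Keeping a pointer to the last child makes each append O(1). The `Node` class and the `SimpleSource` names are illustrative stand-ins, not GDAL's actual structures.

```python
class Node:
    """Minimal stand-in for an XML tree node whose children form a
    singly linked list (first-child + next-sibling pointers)."""

    def __init__(self, name):
        self.name = name
        self.first_child = None
        self.next_sibling = None


def append_walking(parent, child):
    """Append by walking to the end of the sibling chain.
    Returns the number of pointer hops taken (O(k) for k existing children)."""
    if parent.first_child is None:
        parent.first_child = child
        return 0
    hops = 0
    node = parent.first_child
    while node.next_sibling is not None:
        node = node.next_sibling
        hops += 1
    node.next_sibling = child
    return hops


def build_walking(n):
    """Quadratic: every append re-walks the whole list."""
    parent = Node("VRTRasterBand")
    total_hops = 0
    for i in range(n):
        total_hops += append_walking(parent, Node(f"SimpleSource{i}"))
    return parent, total_hops


def build_with_tail(n):
    """Linear: remember the last child so each append is O(1)."""
    parent = Node("VRTRasterBand")
    tail = None
    for i in range(n):
        child = Node(f"SimpleSource{i}")
        if tail is None:
            parent.first_child = child
        else:
            tail.next_sibling = child
        tail = child
    return parent


def child_names(parent):
    """List the children's names in order."""
    names = []
    node = parent.first_child
    while node is not None:
        names.append(node.name)
        node = node.next_sibling
    return names
```

Both builders produce the same child order, but `build_walking(1000)[1]` is 498501 hops; in general (n-2)(n-1)/2, or roughly 2.9e11 for 767,104 sources.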

>
>>
>> Before digging any deeper, is there something I'm missing? Am I expecting
>> too much of `gdalbuildvrt`, or indeed the VRT format, in processing this
>> many source datasets?
>>
>> Conceptually, in this instance it seems as if it would be useful for a VRT
>> file (and `gdalbuildvrt`) to reference the output of `gdaltindex` or
>> something similar. I'm not sure how efficiently source datasets are indexed
>> in VRTs, or whether this might be contributing to the problem.
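Homme's suggestion amounts to giving the VRT a table of (path, footprint) records, which is what `gdaltindex` writes out. For a TMS pyramid, the footprint can be computed from the tile path alone, without opening each JPEG. This Python sketch assumes a `{z}/{x}/{y}.jpg` layout in the global-mercator profile (EPSG:3857, y counted from the bottom, as gdal2tiles writes by default); `IndexEntry`, `tms_tile_bounds` and `entry_from_path` are hypothetical names.

```python
import math
import os
from typing import NamedTuple

# Assumption: global-mercator TMS profile (EPSG:3857), y counted from the bottom.
ORIGIN_SHIFT = math.pi * 6378137.0  # half the Web Mercator world width, in metres


class IndexEntry(NamedTuple):
    """One tile-index record: a source path plus its footprint."""
    path: str
    minx: float
    miny: float
    maxx: float
    maxy: float


def tms_tile_bounds(z, x, y):
    """Return (minx, miny, maxx, maxy) in metres for TMS tile z/x/y."""
    size = 2 * ORIGIN_SHIFT / (2 ** z)
    minx = -ORIGIN_SHIFT + x * size
    miny = -ORIGIN_SHIFT + y * size
    return minx, miny, minx + size, miny + size


def entry_from_path(path):
    """Parse a .../{z}/{x}/{y}.jpg path into an IndexEntry without
    opening the file (assumes the directory layout stated above)."""
    head, fname = os.path.split(path)
    head, x_dir = os.path.split(head)
    _, z_dir = os.path.split(head)
    y = int(os.path.splitext(fname)[0])
    return IndexEntry(path, *tms_tile_bounds(int(z_dir), int(x_dir), y))
```

Applied to every path from the tile listing, this yields the same kind of records a tile index would hold, at the cost of string parsing rather than 767,104 file opens.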
>
> There's no indexing in VRT. So yes, with that many sources there might be
> performance problems, since each RasterIO() request has to test whether
> each source intersects the requested area of interest. Adding an in-memory
> spatial index after opening the VRT would likely be possible, provided that
> the non-negligible size of the VRT/XML doesn't make opening it too slow. That
> depends on the use cases.
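Testing every source against the area of interest is O(n) work per RasterIO() request. Even doesn't say which kind of in-memory spatial index he has in mind; this Python sketch swaps in the simplest one, a uniform grid over source bounding boxes (an R-tree or quadtree would be the usual choice for irregularly sized sources). The class and function names are hypothetical and not GDAL API.

```python
from collections import defaultdict
from math import floor


def intersects(a, b):
    """True if boxes (minx, miny, maxx, maxy) overlap; touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def linear_query(boxes, window):
    """No index: test every source against the window."""
    return [i for i, box in enumerate(boxes) if intersects(box, window)]


class GridIndex:
    """Uniform-grid spatial index over source bounding boxes.

    Each box is registered in every grid cell it touches, so a query only
    tests the sources registered in the cells the window covers."""

    def __init__(self, boxes, cell_size):
        self.boxes = boxes
        self.cell = cell_size
        self.cells = defaultdict(list)
        for i, box in enumerate(boxes):
            for key in self._cells_for(box):
                self.cells[key].append(i)

    def _cells_for(self, box):
        """Yield the (column, row) keys of all cells the box touches."""
        c0, r0 = floor(box[0] / self.cell), floor(box[1] / self.cell)
        c1, r1 = floor(box[2] / self.cell), floor(box[3] / self.cell)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                yield (c, r)

    def query(self, window):
        """Return the sorted indices of boxes that intersect the window."""
        hits = set()
        for key in self._cells_for(window):
            for i in self.cells.get(key, ()):
                if i not in hits and intersects(self.boxes[i], window):
                    hits.add(i)
        return sorted(hits)
```

For a 100 x 100 grid of unit tiles, `GridIndex(boxes, cell_size=4.0).query((10.5, 20.5, 12.5, 21.5))` inspects only a few cells yet returns the same six sources as `linear_query`. Building the index is itself O(n), which is the opening cost Even mentions.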
>
> Yes, perhaps referencing a shapefile tile index could be a possible
> enhancement.

Ok, that's useful to know, thanks. Unless I hear back otherwise, I'll submit an
enhancement request on the issue tracker to bookmark the issue.

Best regards,

Homme

>
>
> Even
>



_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev