Hi,
I'm seeing some odd behavior with virtual raster (VRT) datasets opened
simultaneously from multiple processes. I hope I can explain so that this makes
sense. Here's an excerpt of my Python code:
http://dpaste.com/hold/515217/
Line 8 is where I make a change to the dataset:
source_ds.SetProjection(source_ds.GetGCPProjection())
I do that so that the projection for the ground control points is available for
a later call to gdal.ReprojectImage(); it wasn't working until I started to use
SetProjection() in this way. All of this is being called from the context of a
multi-process web server, running as unprivileged user "www-data" under Ubuntu
(this is important later). My web server error log fills up with these:
ERROR 1: Failed to write .vrt file in FlushCache().
My assumption here is that because the unprivileged user can't write to the
dataset file, GDAL reports an error to complain that it can't flush the
dataset cache back to the original file. So far, this is just an annoyance, but
one that I expected to go away when I switched from gdal.Open() to
gdal.OpenShared() with the read-only flag, like this:
gdal.OpenShared(src_path, gdal.GA_ReadOnly)
I'm still getting the errors.
Meanwhile, I switched web servers, from an Apache-based CGI environment
to the multi-worker WSGI server Gunicorn. When I initially ran my code under
Gunicorn using my normal, privileged user account, I immediately started to see
failures from gdal.Open() and gdal.OpenShared(), specifically the assertion
errors on line 4 of the dpaste above. I tried to place exclusive file locks
(using fcntl.flock) around each access to a given VRT dataset, but this didn't
seem to help at all. There were frequent, unpredictable errors opening datasets
in a multi-process environment *until* I switched from the privileged user to
the unprivileged user. Once I did that, everything began to work normally, but
I got all the old "ERROR 1" reports again.
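For reference, here is a minimal sketch of the kind of locking I mean. It
assumes the lock is taken on the VRT file itself; the helper name
locked_dataset is made up for this message and is not part of GDAL, and this is
not my exact code. Note that flock is advisory, so it only coordinates
processes that also take the lock on the same file.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_dataset(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    Hypothetical helper: the file is opened read-only, so this also works
    for a user who cannot write to the VRT.
    """
    with open(path, "rb") as handle:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield path
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

The dataset would be opened, used, and released (by dropping the last Python
reference to it) inside the "with locked_dataset(src_path):" block.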
It seems to me that gdal.OpenShared() with the read-only flag isn't doing what
it promises, and that it's trying to write back to the files, potentially
modifying them even as competing processes are accessing them. Is it possible
that the overlapping processes in my privileged user scenario are seeing
temporarily-empty VRT files? I'm also confused by the lack of a gdal.Close()
function or something similar, and by the fact that I can't seem to make a
change to a dataset in memory without GDAL attempting to push that change back
to disk via FlushCache().
What's the right thing to do here? Make temporary copies of small VRT datasets
prior to each use so they can be safely written to and disposed of? Build a
wrapper class that encapsulates copying and disposal? Figure out some way to
make GDAL release datasets when asked, or open them in real read-only mode?
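To make the copy-and-dispose idea concrete, here is a rough sketch of what I
have in mind. The name scratch_copy is hypothetical, and I'm assuming the VRT
refers to its source rasters by absolute path; a VRT that uses paths relative
to its own location (relativeToVRT="1") would not survive being copied into a
different directory.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_copy(src_path):
    """Yield the path of a private, writable copy of `src_path`.

    Hypothetical helper: the copy lives in its own temporary directory,
    which is removed when the block exits, even on error.
    """
    scratch_dir = tempfile.mkdtemp(prefix="vrt-")
    copy_path = os.path.join(scratch_dir, os.path.basename(src_path))
    try:
        shutil.copyfile(src_path, copy_path)
        yield copy_path
    finally:
        # Throw away the copy along with anything written back to it.
        shutil.rmtree(scratch_dir, ignore_errors=True)
```

Each request would then call gdal.Open() on the yielded path instead of the
original, so any write-back from SetProjection() lands in the throwaway copy.
The dataset reference needs to be dropped before the block ends, so that
nothing is flushed after the directory is gone.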
Any advice greatly appreciated!
-mike.
----------------------------------------------------------------
michal migurski- [email protected]
415.558.1610
_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev