I want to start off by saying a big thanks to Blake for taking his time to tackle what can only be a very difficult problem.

From what I can observe, the current discussion seems to be around the boundary of who should be responsible for ensuring thread safety around the block cache. The core of GDAL versus the individual drivers. While I see why such a conversation is important, as far as I am concerned, the most important part should be how it affects users of GDAL at the interface level. That is, if an application that is threaded is trying to use GDAL, how does it ensure thread-safety? What you have to keep in mind is that having some parts of the library not thread-safe basically just pushes the mutexing/locking to the calling applications.

Also, while it is important to document thread-safety limitations, might I suggest adding thread-safe related capabilities (TestCapability), especially if all drivers do not end-up having the same thread-safety constraints.

I personally do not see a GDALDatasetThreadSafe wrapper as adding much complexity. For instance, if you were to add a capability that indicates if a driver is inherently thread-safe, you could add a new open method to open a dataset in a thread-safe way with something like the following pseudocode:

DataSet OpenThreadSafe(GDALOpenInfo openInfo)
{
    DataSet dataSet = Open(openInfo);

    if (!dataSet.TestCapability(THREAD_SAFE))
    {
         dataSet = wrapWithThreadSafeWrapper(dataSet);
    }

    return dataSet;
}

Kind Regards,
André

On 2014-08-22 17:12, Even Rouault wrote:
Le vendredi 22 août 2014 21:50:56, Blake Thompson a écrit :
Even,

On Fri, Aug 22, 2014 at 1:33 PM, Even Rouault <[email protected]>

wrote:
Note: after re-reading, I realize that I misread your above sentence as
"it will work to translate multiple datasets in a parallel way with a
*per- dataset* cache").

So, even if you didn't write it, I'm afraid that people will assume that
calling CreateCopy() on the same source dataset handle would be
thread-safe (imagine that one thread translates to format F1, while the
other one to format F2), whereas in the current state of the RFC it is
not. Because CreateCopy() will call GetGeoTransform(),
GetProjectionRef() etc which are generally thread unsafe. For example
GTiff has a lazy loading approach for those 2 methods, and it is not the
only one.
If we claim thread-safety, we should likely offer full thread-safety (at

least
in reading scenarios), not partial one. Otherwise I'm afraid no one but
the few people that have taken part to that discussion or read the RFC
will know
the limits.
Perhaps the goal of the RFC should be to allow CreateCopy to be threadsafe?
I have no tried yet to venture into other parts of thread safety on the
dataset (already bit off a very large chunk).

I'm wondering if it wouldn't be worth having GDALDatasetThreadSafe and
GDALRasterBandThreadSafe classes (or whatever name is appropriate), that
would
follow the decorator pattern, i.e. they will own a thread unsafe "real"
dataset/band and override the methods to lock them. Similarly to what I
have
done in OGR with ogr/ogrsf_frmts/generic/ogrmutexeddatasource.cpp and
ogrmutexedlayer.cpp, needed for the FGDB driver (the one that depends on
the
ESRI SDK).

My idea would be to have an open flag GDAL_OF_THREADSAFE for
GDALOpenEx(). When set, GDALOpen() would do the usual job and get a (in
most cases) unsafe
dataset object. Then it would query a virtual method of the dataset to
return
a thread-safe version of it (GetThreadSafe()).
- If not defined by the driver, the base implementation of
GetThreadSafe() would return the dataset wrapped in
GDALDatasetThreadSafe
- If the implementation of the dataset is already thread-safe, it's
GetThreadSafe() would return just self
- For the in-between situations, it could for example override
GDALDatasetThreadSafe/GDALRasterBandThreadSafe base implementation to
specialize the methods that don't need locking.

This idea might not work (and I've not though how it would combine with
my previous IReadBlock_thread_safe/IReadBlock approach). I have just
written it
as it came to my mind. The main motivation is to make it easy to have
thread-
safe versions by default, without the driver having to care about that,
while
being flexible to make drivers that need finer control to do it.
This approach worries me, there is already a lot of inheritance of the
GDALDataset, its really hard to learn already.
Well, the learning phase would be only be people who want to tweak thread
safety of their drivers, i.e. not a lot of people I bet. The user wouldn't
have to pay a lot (just specifying GDAL_OF_THREADSAFE, or nothing at all if
the price of thread safety is not too big)

Adding another layer would
make it even harder to maintain in my opinion. It also provides one more
config option in an already overwhelming number of config options. I think
we might be better off if we made a long term plan on how we wanted to make
GDAL thread safe. Perhaps make a list of parts needing thread safety and
progress towards it over a period of time. As part of it maintain a webpage
that describes what is thread safe currently, so people are not confused on
what is and is not thread safe.
People are already confused... and have a hard-time really understanding the
implication of the few lines at
http://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe

I'm afraid more documentation of a more complex situation will not make it
better. It is not reasonable to have to document that "GetProjectionRef() is
thread-safe in drivers X and Y, whereas GetGeoTransform() is in Y and Z".
Because that's the situation we will have for sure if we don't adopt a global
default mechanism (unless someone spends weeks in adding locks in every
virtual method of every driver).

The external locking I suggested is not completely a new approach. I could
mention  Collections.synchronizedList() used in Java to make their List
implementations thread safe... Sorry for mentionning Java ;-)

The ideal would be to have a solution were by default we have a guarantee
thread-safety and people (GDAL driver developers) who want to tune it can do
it if they are brave enough.

If, through benchmark, we determine that the cost of the thread safe
version
is neglectable, then GDAL_OF_THREADSAFE might be useless, and GDALOpen()
would
always return the thread-safe version.
I wish there were lots of benchmarks in the test suite (if there are, I
don't know where they are),
They don't yet exist. Most code is functionnal code in Python, so not
appropriate for performance testing, especially regarding threading.
The autotest/cpp should likely be enhanced to have such tests (the only
performance benchmark I can think of is testperfcopywords that was created to
justify the templated approach of GDALCopyWords)

To be mentionned, there's also apps/multireadtest.cpp (not compiled by
default). It just ensures that opening the same file in N dataset objects and
reading them in N threads works.

so that we can even roughly measure changes to
the internals of the library. (I realize this is something that could be
difficult to do well).

Thanks,

Blake

--
*André Vautour*, B.Sc.
Application Developer
CARIS <http://www.caris.com>
115 Waggoners Lane
Fredericton, New Brunswick
Canada    E3B 2L4
Tel: +1.506.458.8533     Fax: +1.506.459.3849
www.caris.com <http://www.caris.com>

*Connect with CARIS*
Twitter <http://www.twitter.com/CARIS_GIS> LinkedIn <http://www.linkedin.com/groups?mostPopular=&gid=3217878> FaceBook <http://www.facebook.com/pages/CARIS-The-Marine-GIS-Experts/123907500987669?v=app_4949752878> YouTube <http://www.youtube.com/user/CARISGIS>

Download your free copy of CARIS Easy View today!
www.caris.com/easyview <http://www.caris.com/easyview>

_________________________________________________________________________
This email and any files transmitted with it are confidential and intended only for the addressee(s). If you are not the intended recipient(s) please notify us by email reply. You should not use, disclose, distribute or copy this communication if received in error.

Any views or opinions expressed in this email are solely those of the author and do not necessarily represent those of the company. No binding contract will result from this email until such time as a written document is signed on behalf of the company.

_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Reply via email to