How can I “open” a handle to a pre-existing memory dataset?  That sounds like 
it may work for me.

As a matter of semantics, I would expect GetNextFeature() to return a local 
view of the database on a per-thread basis.  In other words, each thread would 
have its own cursor into the database.
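
To illustrate the per-thread cursor semantics described above, here is a 
minimal sketch using Python's standard-library sqlite3 module rather than OGR 
itself.  It is an analogy only, not GDAL's API: the temporary database file, 
the "features" table, and the helper names read_all_fids and run_demo are all 
hypothetical.  Each thread opens its own connection, so each one iterates over 
the full set of rows independently.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_all_fids(db_path):
    """Open a private connection in the calling thread and read every row."""
    # One connection (and therefore one cursor) per thread; nothing is shared.
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT fid FROM features ORDER BY fid")
        return [row[0] for row in cursor]
    finally:
        conn.close()


def run_demo(n_threads=2, n_features=5):
    """Create a small table, then read it from several threads at once."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the source vector dataset.
        db_path = os.path.join(tmp, "features_demo.sqlite")
        writer = sqlite3.connect(db_path)
        writer.execute("CREATE TABLE features (fid INTEGER PRIMARY KEY)")
        writer.executemany(
            "INSERT INTO features VALUES (?)",
            [(i,) for i in range(n_features)],
        )
        writer.commit()
        writer.close()

        # Read-only phase: every thread gets the same path, not the same handle.
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            return list(pool.map(read_all_fids, [db_path] * n_threads))
```

Under these assumptions, run_demo(3, 4) gives three identical lists, one per 
thread, rather than the rows being split among the threads.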

Best,
Jesse

Lead Computer Scientist
Science Systems and Applications, Inc.
Dr Compton Tucker Team
NASA Goddard Space Flight Center

From: Even Rouault <even.roua...@spatialys.com>
Date: Monday, October 28, 2024 at 12:08 PM
To: Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] 
<jesse.r.me...@nasa.gov>, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] via gdal-dev <gdal-dev@lists.osgeo.org>
Subject: [EXTERNAL] Re: [gdal-dev] gdal.Rasterize with same OGR dataset from 
two python threads

On 28/10/2024 at 17:01, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] via gdal-dev wrote:
I have two calls to gdal.Rasterize, each of which targets a separate GDAL memory 
dataset but sources the same OGR memory dataset, that I hoped could be run in 
parallel using Python’s concurrent.futures.  The idea is that each GDAL call 
releases the Python GIL, and performing read-only operations on the vector 
database (except for storing memory for the results) could in principle be a 
safe and effective optimization, as the feature layers themselves are not 
mutated.  The SQL dialect is SQLite, so presumably the OGR dataset has to be 
converted to a SQLite (memory) database.  Technically SQLite supports multiple 
readers just fine, but this doesn’t mean GDAL/OGR does.  The multithreading 
documentation page doesn’t explicitly mention OGR / vector datasets, but I 
presume they inherit similar stateful restrictions (yes, RFC 101 is coming).  
However, running these SQL queries at the same time causes OGR to trip over 
itself (I presume OGR assumes only one query statement is being evaluated at 
a time).

So I think the intended workaround is either to accept this as a serially 
dependent task, or to copy the dataset and have each Rasterize() work on a 
copy, yes?

I'm not clear whether you use the same Python source vector dataset object in 
both threads, or whether you open your source dataset once for each thread. 
The first case is a big no-no: anything could happen, including wrong results 
and crashes. One object per thread is the way to go. If the processing is very 
intensive on acquiring source features, you may hit a global lock at the SQLite 
level, but there isn't much we can do about that. Otherwise you need to use 
multi-processing parallelization instead of multi-threading. But you certainly 
don't need to copy your source dataset.


In the same spirit as RFC 101, which gives some thread safety to raster 
read-only workloads, is there interest in expanding this to vector datasets?

That would be tricky. What would be the expected result if a user called 
GetNextFeature() on a thread-safe OGRLayer? Would users expect each thread 
to see all features, or would features be distributed among calling threads?

Even

--

http://www.spatialys.com

My software is free, but my time generally not.

Butcher of all kinds of standards, open or closed formats. At the end, this is 
just about bytes.