Hi Even,

Thanks for such a quick fix! I'm gonna apply the patch and recompile GDAL and will let you know :)
Keep in touch.

Best Regards,
Javier Calzado

-----Original Message-----
From: Even Rouault [mailto:[email protected]]
Sent: 26 September, 2016 17:53
To: [email protected]
Cc: Francisco Javier Calzado <[email protected]>; Andrew Bell <[email protected]>
Subject: Re: [gdal-dev] Multithread deadlock

Hi,

I admire Andrew's enthusiasm and would happily let him tackle the next bug reported in this area ;-)

I could reproduce the deadlock with the same stack trace and have pushed a fix per https://trac.osgeo.org/gdal/ticket/6661. This was actually not a typical deadlock situation, but an undefined behaviour caused by trying to acquire a recursive mutex that was previously released more times than it had been acquired.

Without this patch, a "workaround" would be to set the GDAL_ENABLE_READ_WRITE_MUTEX config option to NO to disable the per-dataset mutex. I had added this option since I wasn't really sure that the per-dataset mutex wouldn't introduce deadlock situations. But when setting it to NO, you'll get undefined behaviour (= potentially crashing or causing corruptions) due to 2 threads potentially calling the IWriteBlock() method of the same dataset, which was the GDAL 1.X behaviour.

Clearly, multi-threading scenarios involving writing are the point where the global block cache mechanism + the band-aid of the per-dataset R/W mutex are showing their limits in terms of design & maintenance complexity, and scalability. A per-dataset block cache would avoid such headaches (the drawback would be having to define a per-dataset block cache size).

Even

> Sure Andrew,
>
> Here it is the call stack from Visual Studio for both threads (I just
> copied the top calls where GDAL is involved, just for easy reading.
> If you need the whole stack just let me know):
>
> THREAD 1:
>
> ntdll.dll!_NtWaitForSingleObject@12() Unknown
> ntdll.dll!_RtlpWaitOnCriticalSection@8() Unknown
> ntdll.dll!_RtlEnterCriticalSection@4() Unknown
> gdal201.dll!CPLAcquireMutex(_CPLMutex * hMutexIn, double dfWaitInSeconds) Line 806 C++
> gdal201.dll!GDALDataset::EnterReadWrite(GDALRWFlag eRWFlag) Line 6102 C++
> gdal201.dll!GDALRasterBand::EnterReadWrite(GDALRWFlag eRWFlag) Line 5290 C++
> gdal201.dll!GDALRasterBlock::Write() Line 742 C++
> gdal201.dll!GDALRasterBlock::Internalize() Line 917 C++
> gdal201.dll!GDALRasterBand::GetLockedBlockRef(int nXBlockOff, int nYBlockOff, int bJustInitialize) Line 1126 C++
> Test.exe!RasterBandPixelAccess::SetValueAtPixel<short>(const int & pX, const int & pY, const short & value) Line 180 C++
>
> THREAD 2:
>
> ntdll.dll!_NtWaitForSingleObject@12() Unknown
> KernelBase.dll!_WaitForSingleObjectEx@12() Unknown
> kernel32.dll!_WaitForSingleObjectExImplementation@12() Unknown
> kernel32.dll!_WaitForSingleObject@8() Unknown
> gdal201.dll!CPLCondWait(_CPLCond * hCond, _CPLMutex * hClientMutex) Line 937 C++
> gdal201.dll!GDALAbstractBandBlockCache::WaitKeepAliveCounter() Line 134 C++
> gdal201.dll!GDALArrayBandBlockCache::FlushCache() Line 312 C++
> gdal201.dll!GDALRasterBand::FlushCache() Line 865 C++
> gdal201.dll!GDALDataset::FlushCache() Line 386 C++
> gdal201.dll!GDALPamDataset::FlushCache() Line 159 C++
> gdal201.dll!GTiffDataset::Finalize() Line 6180 C++
> gdal201.dll!GTiffDataset::~GTiffDataset() Line 6135 C++
> gdal201.dll!GTiffDataset::`scalar deleting destructor'(unsigned int) C++
> gdal201.dll!GDALClose(void * hDS) Line 2998 C++
> Test.exe!main::__l2::<lambda>(std::basic_string<char,std::char_traits<char>,std::allocator<char> > sourcefilePath, std::basic_string<char,std::char_traits<char>,std::allocator<char> > targetFilePath, int threadID) Line 66 C++
>
> From: Andrew Bell [mailto:[email protected]]
> Sent: 26 September, 2016 16:06
> To: Francisco Javier Calzado <[email protected]>
> Cc: [email protected]
> Subject: Re: [gdal-dev] Multithread deadlock
>
> Deadlocks are usually easy to debug if you can get a traceback when
> deadlocked. If you can attach with gdb (or run in the debugger) and
> reproduce and post the stack at the time ('where' from gdb), it should
> be no problem to fix. Trying to reproduce on different hardware can
> be difficult.
>
> On Mon, Sep 26, 2016 at 9:33 AM, Francisco Javier Calzado <[email protected]> wrote:
>
> Hi guys,
>
> I am experiencing a deadlock with just 2 threads in a single reader &
> multiple writer scenario. That is, threads read from the same input
> file (using different handles) and then write different output files.
> The deadlock occurs when the block cache gets filled. The situation is
> the following:
>
> - T1 and T2 read datasets D1 and D2, both pointing to the same
>   input raster (GTiff).
>
> - Block cache gets filled.
>
> - T1 tries to lock one block in the cache to write data. But the cache
>   is full, so it tries to free dirty blocks from T2 (as seen in the
>   Internalize() method). For that purpose, it apparently requires a
>   mutex from D2.
>
> - However, T2 is in a state where it must wait for thread T1 to finish
>   working with T2's blocks. In this state, T2 holds a mutex acquired
>   from D2.
>
> At least, that is what seems to be happening based on the source code.
> Maybe I'm wrong; I don't have a full picture of how GDAL works
> internally. The thing is that I can reproduce this issue with the
> following test code and dataset:
> https://drive.google.com/file/d/0B-OCl1FjBi0YSkU3RUozZjc5SnM/view?usp=sharing
>
> Oddly enough, ticket #6163 is supposed to fix this, but it's failing
> in my case. I am working with GDAL 2.1.0 under VS2015 (x32, Debug)
> compilation.
>
> Even, what do you think?
>
> Thanks!
> Javier C.
>
> _______________________________________________
> gdal-dev mailing list
> [email protected]
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> Andrew Bell
> [email protected]

--
Spatialys - Geospatial professional services
http://www.spatialys.com
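
Even's diagnosis above (a recursive mutex released more times than it had been acquired) can be illustrated with a small standalone sketch. This is not GDAL's CPLMutex code and not the fix from ticket 6661; CountedRecursiveMutex is a hypothetical helper built on standard C++11 only. It tracks the owning thread and the acquisition depth, so an unbalanced release is rejected instead of reaching the underlying mutex, where it would be undefined behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of GDAL): a recursive mutex that refuses
// to be released by a thread that does not currently hold it.
class CountedRecursiveMutex {
public:
    void acquire() {
        mutex_.lock();
        owner_.store(std::this_thread::get_id());
        ++depth_;  // only touched while the mutex is held
    }

    // Returns false and does nothing if the calling thread does not hold
    // the mutex, i.e. if this release has no matching acquire.
    bool release() {
        if (owner_.load() != std::this_thread::get_id())
            return false;
        // From here on we know this thread holds the mutex.
        if (--depth_ == 0)
            owner_.store(std::thread::id());  // reset to "no owner"
        mutex_.unlock();
        return true;
    }

    // Number of times the calling thread currently holds the mutex.
    int depth() const {
        return owner_.load() == std::this_thread::get_id() ? depth_ : 0;
    }

private:
    std::recursive_mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int depth_ = 0;
};
```

With this wrapper, two acquire() calls followed by three release() calls return true, true, false: the extra release is reported to the caller and the mutex state stays consistent. Calling unlock() directly on a std::recursive_mutex that the thread does not own is undefined behaviour, which is the class of problem described in the thread.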
