Our emails crossed. I am indeed testing with the latest flatbuffers now too.
Agreed on the rest.

On Fri, Feb 23, 2024 at 3:42 PM Even Rouault <even.roua...@spatialys.com> wrote:

> Simon,
>
> did you try updating to the latest
> https://github.com/google/flatbuffers/releases to see if that would solve
> the issue? If that worked, that would be the best way forward...
>
> Otherwise, if the issue persists with the latest flatbuffers release, an
> (admittedly rather tedious) option would be to do a git bisect on the
> flatbuffers code to identify the culprit commit. With some luck, the root
> cause might be obvious if a single culprit commit can be exhibited
> (perhaps some subtle C++ undefined behaviour is triggered? It is also a
> bit mysterious that it hits only for static builds), or otherwise raise it
> with the upstream flatbuffers project to ask for their expertise.
>
> Even
>
> Le 23/02/2024 à 23:54, Simon Eves via gdal-dev a écrit :
>
> I was able to create a fork of 3.7.3 with just the *flatbuffers*
> replaced with the pre-3.6.x version (2.0.0).
>
> This seemed to require only changes to the version asserts and adding an
> *align* parameter to *Table::VerifyField()* to match the newer API.
>
> https://github.com/heavyai/gdal/tree/simon.eves/release/3.7/downgrade_to_flatbuffers_2.0.0
>
> Our system works correctly and passes all GDAL I/O tests with that
> version. Obviously this isn't an ideal solution, but it is otherwise a
> release blocker for us.
>
> I would still very much like to discuss the original problem more
> deeply, and hopefully come up with a better solution.
>
> Yours hopefully,
>
> Simon
>
> On Thu, Feb 22, 2024 at 10:22 PM Simon Eves <simon.e...@heavy.ai> wrote:
>
>> Thank you, Robert, for the RR tip. I shall try it.
>>
>> I have new findings to report, however.
>>
>> First of all, I confirmed that a build against GDAL 3.4.1 (the version
>> we were on before) still works. I also confirmed that builds against
>> 3.7.3 and 3.8.4 still failed even with no additional library
>> dependencies (just sqlite3 and proj), in case it was a side effect of us
>> also adding more of those. I then tried 3.5.3 with the CMake build (same
>> config as we use for 3.7.3), and that worked. I then tried 3.6.4 (again,
>> same CMake config), and that failed. These were all from bundles.
>>
>> I then started delving through the GDAL repo itself. I found the common
>> root commit of 3.5.3 and 3.6.4, and all the commits in the
>> *ogr/ogrsf_frmts/flatgeobuf* sub-project between that one and the final
>> commit of each. For 3.5.3, there were only two. I built and tested both,
>> and they were fine. I then tried the very first one that was new in the
>> 3.6.4 chain (not in the history of 3.5.3), which was actually a bulk
>> update to the *flatbuffers* sub-library, committed by Bjorn Harrtell on
>> May 8, 2022 (SHA f7d8876). That one had the issue. I then tried the
>> immediately preceding commit (an unrelated docs change), and that one
>> was fine.
>>
>> My current hypothesis, therefore, is that the *flatbuffers* update
>> introduced the issue, or at least the susceptibility to it.
>>
>> I still cannot explain why it only occurs in an all-static build, and I
>> am even less able to explain why it only occurs in our full system and
>> not in the simple test program that makes the very same sequence of GDAL
>> API calls against the very same static lib build, but I repeated the
>> build tests of the commits on either side, and of a few other random
>> ones a bit further away in each direction, and the results were
>> consistent.
>> Again, it happens with both GCC 11 and Clang 14 builds, Debug or
>> Release.
>>
>> I will continue tomorrow to look at the actual changes to *flatbuffers*
>> in that update, although they are quite significant. Certainly the
>> *vector_downward* class, which is directly involved, was a new file in
>> that update (although on inspection of that file's history in the
>> *google/flatbuffers* repo, it seems it was just split out of another
>> header).
>>
>> Bjorn, I don't mean to call you out directly, but I am CC'ing you to
>> ensure you see this, as you appear to be a significant contributor to
>> the *flatbuffers* project itself. Any insight you may have would be very
>> welcome. I am of course happy to describe my debugging findings in more
>> detail, privately if you wish, rather than spamming the list.
>>
>> Simon
>>
>> On Tue, Feb 20, 2024 at 1:49 PM Robert Coup <robert.c...@koordinates.com>
>> wrote:
>>
>>> Hi,
>>>
>>> On Tue, 20 Feb 2024 at 21:44, Robert Coup <robert.c...@koordinates.com>
>>> wrote:
>>>
>>>> Hi Simon,
>>>>
>>>> On Tue, 20 Feb 2024 at 21:11, Simon Eves <simon.e...@heavy.ai> wrote:
>>>>
>>>>> Here's the stack trace for the original assert. Something is
>>>>> stepping on scratch_ to make it 0x1000000000 instead of null, which
>>>>> is what it starts out as when the flatbuffer object is created, but
>>>>> by the time it gets to allocating memory, it's broken.
>>>>
>>>> What happens if you set a watchpoint in gdb when the flatbuffer is
>>>> created?
>>>>
>>>>     watch -l myfb->scratch
>>>>     or watch *0x1234c0ffee
>>>>
>>> Or I've also had success with Mozilla's rr: https://rr-project.org/ —
>>> you can run to a point where scratch is wrong, set a watchpoint on it,
>>> and then run the program backwards to find out what touched it.
>>>
>>> Rob :)
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
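Since I am about to try rr, for the record I read Robert's suggestion as roughly the following session for our case (the binary name is a placeholder, and the exact watch expression will depend on where the offending FlatBufferBuilder lives, which I don't know yet):

    rr record ./our_repro          # hypothetical binary that reproduces the assert
    rr replay                      # replays that exact run under a gdb session

    (rr) continue                  # run forward to the point where the assert fires
    (rr) watch -l scratch_         # if stopped inside a vector_downward method where it
                                   # is in scope; otherwise watch the raw address, as
                                   # Robert suggested
    (rr) reverse-continue          # run backwards to the write that last changed it
    (rr) backtrace

If the first stop turns out to be a legitimate write, reverse-continue can simply be repeated until the null-to-garbage transition shows up.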
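And if the latest flatbuffers release still shows the problem, I read Even's bisect suggestion as something along these lines, run in a google/flatbuffers checkout that gets re-vendored into the GDAL tree at each step (the bad ref and the rebuild script are assumptions on my part, not anything I have run):

    git bisect start
    git bisect good v2.0.0               # the version our downgrade fork uses
    git bisect bad <flatbuffers ref vendored by GDAL commit f7d8876>
    git bisect run ./rebuild_and_test.sh # hypothetical script: re-vendor, rebuild the
                                         # static GDAL + app, exit non-zero on the assert
    git bisect reset

As I understand it, git bisect run treats any non-zero exit (other than 125, which means "skip") as a bad commit, so the script only needs to detect the assert reliably.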