Sorry for the self-reply, but I thought I'd note that if I tweak pysvnget thusly, the SEGFAULTs stop:

--- pysvnget        2020-09-29 09:34:07.918002584 -0400
+++ pysvnget.pools  2020-09-29 09:33:54.278153037 -0400
@@ -21,17 +21,17 @@
             yield chunk
         svn.core.svn_stream_close(self.stream)
 
-def get_generator(repos_path, peg_revision, path):
-    fs = svn.repos.fs(svn.repos.open(repos_path))
+def get_generator(repos_path, peg_revision, path, pool):
+    fs = svn.repos.fs(svn.repos.open(repos_path, pool))
     peg_revision = peg_revision or svn.fs.youngest_rev(fs)
     fsroot = svn.fs.revision_root(fs, peg_revision)
     return SvnContentProxy(fsroot, path).get_generator()
 #
 # --------------------------------------------------------------------------
-
+pool = svn.core.svn_pool_create()
 if len(sys.argv) < 3:
     sys.stderr.write("Usage: REPOS-PATH PATH-IN-REPOS [PEGREV]\n")
     sys.exit(1)
 peg_revision = len(sys.argv) > 3 and int(sys.argv[3]) or None
-generator = get_generator(sys.argv[1], peg_revision, sys.argv[2])
+generator = get_generator(sys.argv[1], peg_revision, sys.argv[2], pool)
 print(b''.join(generator).decode('utf-8'))

On Tue, Sep 29, 2020 at 9:26 AM C. Michael Pilato <cmpil...@red-bean.com> wrote:

> Hey, all. I'm wondering if I can get some extra eyes/brains on a
> particular usage of our Python bindings.
>
> The attached tarball contains a directory in which lives two files:
>
>    - run-test.sh - a shell script to drive the reproduction recipe
>    - pysvnget - a Python program that uses the bindings and a
>      generator-based wrapper of the FS's file content access APIs
>
> If you explode the tarball, cd into the resulting directory, and run the
> shell script, it should create a test repository and working copy within
> that sandbox and start a loop. The loop will...
>
>    1. add text (a datestamp) to a single file in the working copy,
>    2. ensure the file is under version control,
>    3. commit the file, then
>    4. try to dump the content of the file from the repository using the
>       Python program.
>
> The problem that I see when I do this is that after a few iterations of
> the loop, the Python program starts to SEGFAULT.
>
> I suspect there's some misinteraction with the APR pool subsystem at work
> here -- my Python program is (intentionally) taking advantage of the
> bindings' pool self-management logic. If I had to guess, I'd say that the
> delayed access to the FS via the generator is causing reads from memory
> that once lived in pools that have since been destroyed. Unfortunately, I
> don't think I ever really understood how that magic worked in the first
> place.
>
> While this is a simple scenario where "don't do that" might seem an easy
> enough response, what is represented by the Python program is
> much-distilled logic that is live in production in some of The Company
> Formerly Known As CollabNet's products. The generator approach exists to
> keep server-side memory use constant while allowing http-based reads of
> arbitrarily large versioned files. Moreover, the size and nature of the
> codebase is such that I'd really prefer NOT to start manually doing pool
> management (though as a last resort, it's not out of the cards).
>
> Anything stand out as obviously wrong with my code?
>
> -- Mike
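
For anyone reading without the tarball, here is a toy model of the failure mode suspected above. It is only a sketch and does not touch the real bindings: ToyPool, ToyStream, read_chunks and the two get_generator_* functions are invented names, the explicit destroy() call stands in for whatever cleanup the bindings' self-managed pools perform, and a RuntimeError stands in for the SEGFAULT. The point is just that a generator body does not run until it is iterated, so a pool torn down when the setup function returns is already gone by the first read.

```python
class ToyPool:
    """Hypothetical stand-in for an APR pool (not the real svn.core API)."""

    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


class ToyStream:
    """Hypothetical stream whose backing memory belongs to a ToyPool."""

    def __init__(self, data, pool):
        self._data = data
        self._pool = pool
        self._pos = 0

    def read(self, size):
        # The real bindings would read freed memory here; the toy raises.
        if not self._pool.alive:
            raise RuntimeError("read from a destroyed pool")
        chunk = self._data[self._pos:self._pos + size]
        self._pos += size
        return chunk


def read_chunks(stream, size=4):
    # Generator: nothing below runs until the caller starts iterating.
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def get_generator_scoped(data):
    # Models the original shape: the pool's lifetime ends with this call,
    # but the generator it returns is consumed later.
    pool = ToyPool()
    try:
        return read_chunks(ToyStream(data, pool))
    finally:
        pool.destroy()


def get_generator_with_pool(data, pool):
    # Models the patched shape: the caller owns the pool and keeps it
    # alive for as long as the generator is being consumed.
    return read_chunks(ToyStream(data, pool))


if __name__ == "__main__":
    pool = ToyPool()
    print(b"".join(get_generator_with_pool(b"hello world", pool)))
    try:
        b"".join(get_generator_scoped(b"hello world"))
    except RuntimeError as exc:
        print("scoped variant failed:", exc)
```

Under these assumptions the caller-owned pool variant reads the whole payload, while the scoped variant fails on its first read, which matches the behavior change seen with the patch.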