On Thu, May 15, 2025, 6:06 PM Eric Blake <ebl...@redhat.com> wrote:
> On Thu, May 15, 2025 at 07:20:48AM -0500, Eric Blake wrote:
> > On Thu, May 15, 2025 at 01:26:16AM -0400, Nikolaos Chatzikonstantinou wrote:
> > > > $ python
> > > > Python 3.13.3 (main, Apr 22 2025, 00:00:00) [GCC 15.0.1 20250418 (Red Hat 15.0.1-0)] on linux
> > > > Type "help", "copyright", "credits" or "license" for more information.
> > > > >>> import pygnuregex
> > > > >>> pygnuregex.compile(b"a")
> > > > Segmentation fault (core dumped)
> > > > (.venv)
> > >
> > > Someone else on a forum helped me debug this. He noticed that the
> > > Python pointer was different from the one received by the underlying C
> > > function (by printing the Python pointer and inspecting the C pointer
> > > with gdb). In fact it was truncated to 32 bits. We eventually
> > > brainstormed that it was because I had neglected to add the argument
> > > types for the C functions. I'm not sure why there is a difference in
> > > Python 3.10 (where I verified the crash in a VM) versus Python 3.11.
> >
> > I was testing with Python 3.13, not 3.10, but I can confirm that your
> > latest fix finally solves it.
>
> I can't help but wonder: Do you NEED to call into the C functions, or
> would it be possible to write pure python code that translates any m4
> (emacs-style) regex into a similar Python regex? For example, most
> characters translate straight over, ^ and $ in anchor positions
> translate to \A and \Z (better \z, but that is only available in
> Python 3.13 and newer), \( \| \) in m4 (outside of []) translate to (
> | ) in Python while bare ( | ) in m4 translate to \( \| \), and so
> forth. A quick google search found
> https://www.regexbuddy.com/convert.html as a non-free resource; but
> there may be other sites that can summarize how to translate between
> flavors without needing foreign function interfacing.
>

At the time it seemed to be the simplest solution plus it contributes a package to the python ecosystem.
I will add your suggestion to the features page for now, but a translator would be one more thing whose edge cases I'd have to worry about, which is why I didn't elect to do it that way. I'll take a closer look soon.

Regards,
Nikolaos Chatzikonstantinou
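
For reference, here is a minimal sketch of what the pure-Python translation suggested above might look like. It covers only the rules named in the thread (the \( \| \) swap, bare ( | ) becoming literals, and ^ / $ in anchor positions) plus a conservative copy of bracket expressions. The function name m4_to_python_regex is hypothetical, and treating "anchor position" as "start or end of pattern, or adjacent to \( \| \)" is an assumption rather than a verified reading of m4's matcher. Other Emacs-only constructs (\< \>, character classes such as [:alpha:], and so on) are passed through untranslated, which is exactly where the edge cases mentioned above would live.

```python
import re


def m4_to_python_regex(pattern: str) -> str:
    """Translate a small subset of m4 (Emacs-style) regex syntax to Python's."""
    out = []
    i, n = 0, len(pattern)
    anchor_ok = True  # True at pattern start and right after \( or \|
    while i < n:
        ch = pattern[i]
        next_anchor_ok = False
        if ch == "\\":
            if i + 1 >= n:
                raise ValueError("trailing backslash")
            nxt = pattern[i + 1]
            if nxt in "(|)":
                # \( \| \) are operators in m4; Python uses the bare forms.
                out.append(nxt)
                next_anchor_ok = nxt in "(|"
            else:
                # Assumption: every other escape is copied through unchanged.
                out.append(ch + nxt)
            i += 2
        elif ch in "(|){}":
            # Bare ( | ) { } are literals in m4 but special in Python.
            out.append("\\" + ch)
            i += 1
        elif ch == "^":
            out.append(r"\A" if anchor_ok else r"\^")
            i += 1
        elif ch == "$":
            at_end = i + 1 == n or pattern[i + 1:i + 3] in ("\\)", "\\|")
            out.append(r"\Z" if at_end else r"\$")
            i += 1
        elif ch == "[":
            # Find the closing ]; a ] right after [ or [^ is a literal.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            if j >= n:
                raise ValueError("unterminated bracket expression")
            body = pattern[i + 1:j]
            negate = body.startswith("^")
            members = body[1:] if negate else body
            # Backslash is literal inside [] in m4 but an escape in Python.
            escaped = "".join("\\" + c if c in "\\[]" else c for c in members)
            out.append("[" + ("^" if negate else "") + escaped + "]")
            i = j + 1
        else:
            out.append(ch)
            i += 1
        anchor_ok = next_anchor_ok
    return "".join(out)


if __name__ == "__main__":
    translated = m4_to_python_regex(r"\(foo\|bar\)+")
    print(translated)
    print(bool(re.fullmatch(translated, "foobar")))
```

The translated string can be handed straight to re.compile; a real replacement for the C binding would still need to be checked against GNU m4's actual behaviour for every construct the sketch skips.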