Bugs item #22909 was opened at 2008-11-20 19:47
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=22909&group_id=494
Category: General
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Bradley Buda (bradleybuda)
Assigned to: Nobody (None)
Summary: LibXML::XML::XPath::Object segfault (null pointer) on x86-64

Initial Comment:
This script results in a ruby "[BUG] Segmentation fault" on a 64-bit machine, but works on a 32-bit machine:

----
#!/usr/bin/ruby
require 'rubygems'
require 'libxml'

x = LibXML::XML::Parser.string("<root />").parse
x.find("/root")          # if you comment out this line, script will NOT segfault
x.find("/root").length   # segfault occurs here
----

valgrind and gdb agree on this stack trace:

#0  0x00002aaaaca47bc7 in ruby_xml_xpath_object_empty_q (self=46912524542400) at ruby_xml_xpath_object.c:174
#1  0x00002aaaaca47c59 in ruby_xml_xpath_object_length (self=46912524542400) at ruby_xml_xpath_object.c:242
#2  0x00002aaaaacff48f in ?? () from /usr/lib/libruby1.8.so.1.8
#3  0x00002aaaaacff7b8 in ?? () from /usr/lib/libruby1.8.so.1.8
#4  0x00002aaaaad055b7 in ?? () from /usr/lib/libruby1.8.so.1.8
#5  0x00002aaaaad0dbbb in ?? () from /usr/lib/libruby1.8.so.1.8
#6  0x00002aaaaad0dc05 in ruby_exec () from /usr/lib/libruby1.8.so.1.8
#7  0x00002aaaaad0dc30 in ruby_run () from /usr/lib/libruby1.8.so.1.8
#8  0x0000000000400883 in main ()

Unfortunately I don't know enough about the Ruby C API to understand what's going wrong here.

My environment (note that this is a Xen node on Amazon EC2):

$ uname -a
Linux ...compute-1.amazonaws.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 8.04.1
Release:        8.04
Codename:       hardy

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]

$ gem list --local libxml-ruby

*** LOCAL GEMS ***

libxml-ruby (0.9.2)

$ aptitude show libxml2-dev
Package: libxml2-dev
State: installed
Automatically installed: yes
Version: 2.6.31.dfsg-2ubuntu1.3
...
----------------------------------------------------------------------

>Comment By: Bradley Buda (bradleybuda)
Date: 2008-11-21 07:42

Message:
This is an AMD box.

I had a fair amount of time to play with this today, and I learned that there's definitely something weird going on with compiler optimizations. I had been compiling libxml-ruby's C extensions with -O2 (the default, set by rbconfig.rb on my Ubuntu box). I found that changing the compile flag to -O0 for JUST ruby_xml_xpath_object.c made the segfault go away, while setting -O1 for ruby_xml_xpath_object.c makes it reappear. I tried each of the individual -f... optimization flags, but none of them seemed to make a difference.

While debugging I also accidentally discovered a printf 'patch' that makes the bug go away at any optimization level (see attached). The printf statement must be preventing the compiler from performing the harmful optimization. Of course this isn't a real patch, but it might provide a clue for someone who knows more about these things than I do.

For now, I can work around the issue by compiling with -O0. I wish I could give a better explanation for why this is happening, but I'm getting pretty far over my head here :-). I also attached the generated assembler code at -O0 and -O1, just in case some compiler guru happens to stumble on this bug.

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-20 20:48

Message:
FYI, is this an Intel or AMD box?

As for those old threads, I think this is different (libxml-ruby's internal architecture is much, much different than it used to be).

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-20 20:44

Message:
Hey Bradley,

Sounds good.
From the bug, it looks like this is what is happening (ruby_xml_xpath_object_empty_q, line 169):

    /* Get the c object that the ruby object (self) is wrapping */
    Data_Get_Struct(self,xmlXPathObject,xpop);

    /* Looks like xpop is null here when it should not be, but how can
       that be? So verify xpop really is null */
    if (xpop->type != XPATH_NODESET)
      return Qnil;

To help you understand the code, doc.find does the following:

1. In Ruby, create a new XPathContext object.
2. In C, call find, which computes the XPath result.
3. Return the XPathObject instance wrapped by a Ruby object.

Since you do that twice, you have two XPathContext objects and two XPathObject results. What is weird is that the first one shouldn't matter, since you don't use it at all. It may or may not be freed, depending on the GC (actually it would be really surprising for it to be freed so quickly).

The scenarios I can think of (none of them that plausible):

* The first XPathObject is freed and somehow deletes its associated document (which of course shouldn't happen), making the second XPath object invalid.
* The second XPathObject is in fact the same as the first. Not sure how that could be, unless libxml is caching XPath return results (it's not, as far as I know).
* Freeing the first XPathObject somehow corrupts the second.

Or it's none of the above, but I think the only way to find out is to do some digging with gdb.

Thanks for your help.

----------------------------------------------------------------------

Comment By: Bradley Buda (bradleybuda)
Date: 2008-11-20 20:35

Message:
Yes, it's 100% consistent. I haven't gotten any further with LibXML, so I don't know if there are other test cases that would show similar results; I can try to put something together. Thanks for the pointer to ruby_xml_document.c, I can look at that code as a start. I know C (I'm a bit rusty); it's just the Ruby API that I don't know as well.
In my random Googling I found this (old) thread and patch:

http://rubyforge.org/pipermail/libxml-devel/2007-March/000288.html
http://rubyforge.org/pipermail/libxml-devel/attachments/20070309/a8c53f37/attachment.obj

Any guesses as to whether or not this could be in the same class of problems? I should have some time soon (maybe this weekend?) to dig deeper into the code and start to understand how allocation and garbage collection work. I'll update the bug with whatever I figure out.

Thanks for the quick reply.

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-20 20:01

Message:
Hi Bradley,

Boy, that's interesting. So it always happens, without fail? Do you see other things like that happening?

My best guess is that somehow the reference counting scheme used between XPath objects and documents is broken on 64-bit machines (it's in ruby_xml_document.c, the top 150 lines or so). I don't have any 64-bit machines set up here, so I'm not sure how to debug it. Can you recompile code on EC2? Are you a C hacker, and do you have time to work through this? Just trying to figure out how to proceed.

Thanks for the great bug report and stack trace, very helpful.

----------------------------------------------------------------------

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel