Well, I believe I've solved the problem. Hieu, I believe that the fix you describe is exactly what my friend did to fix his problem.
To fix mine, I started by recompiling with threading turned off, since I wasn't using it. That got me past the initial problem, but moses still died sometimes.

Following Ken's advice, I tracked down the error log. When moses died, it complained that it couldn't find GLIBCXX_3.4.9, GLIBCXX_3.4.10, or GLIBCXX_3.4.11 in /usr/lib64/libstdc++.so.6. I ssh'd into the machine where that job had run and died, and ran the following to check:

    strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX

Sure enough, the latest version provided there was GLIBCXX_3.4.8. But when I ran the same command on the machine where I'd compiled Moses, the later versions were available as well. That tipped me off that something was wrong.

I checked the OS versions, and it turns out that a handful of machines on our local grid are still running CentOS 5.5. I had compiled under Scientific Linux 6, and most of the machines on the grid have been upgraded to SL6.

So it appears that the problem was that I had compiled moses against a newer version of libstdc++. When a job ran on the older distro, which has an older libstdc++, it would die because it expected the newer GLIBCXX symbol versions to be there and they weren't.

So for the moment, my solution is to restrict my moses jobs to run only on the machines with the newer distro installed.

Also, for anyone looking at this later: FWIW, I'm running an older version of moses, not the current master. Not that that probably matters in this case.

Cheers,
Lane

On Tue, Jan 10, 2012 at 7:32 AM, Hieu Hoang <[email protected]> wrote:
> ah, if it translates everything THEN segfault, it's likely to be a
> double-delete in 1 of the destructors.
>
> your friend might have added this macro
> EXIT_RETURN
> which basically just avoids the destructors (Main.cpp line 501)
>
> however, it'll be good to know where it blows up and craft the destructors
> properly
>
>
> On Tue, Jan 10, 2012 at 7:17 PM, Lane Schwartz <[email protected]> wrote:
>>
>> No, I'm using plain text phrase tables and plain text language model
>> files.
>>
>> On Tue, Jan 10, 2012 at 6:48 AM, Hieu Hoang <[email protected]>
>> wrote:
>> > hey lane
>> >
>> > are you using binary kenlm files that was binarized previously?
>> >
>> > I think they're not compatible across gcc versions, until a recent
>> > change ken made. Due to some kinda #pragma memory alignment thingy
>> > apparently
>> >
>> > On Tue, Jan 10, 2012 at 4:06 AM, Lane Schwartz <[email protected]>
>> > wrote:
>> >>
>> >> After upgrading from CentOS 5.5 to Scientific Linux 6, I've
>> >> encountered some weird behavior.
>> >>
>> >> When I run moses, it successfully translates all of the sentences, but
>> >> then it (sometimes) segfaults. It doesn't segfault all the time,
>> >> though. One of the other guys in my office says he had this problem,
>> >> and figured out a simple fix for it, but unfortunately he doesn't
>> >> remember what the fix was.
>> >>
>> >> Has anyone else seen anything like this?
>> >>
>> >> Thanks,
>> >> Lane
>> >> _______________________________________________
>> >> Moses-support mailing list
>> >> [email protected]
>> >> http://mailman.mit.edu/mailman/listinfo/moses-support
>> >
>> >
>>
>>
>>
>> --
>> When a place gets crowded enough to require ID's, social collapse is not
>> far away. It is time to go elsewhere. The best thing about space travel
>> is that it made it possible to go elsewhere.
>> -- R.A. Heinlein, "Time Enough For Love"
>
>

--
When a place gets crowded enough to require ID's, social collapse is not far away. It is time to go elsewhere. The best thing about space travel is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"
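
P.S. For anyone who would rather script the libstdc++ check from my message than eyeball the strings output on each grid machine, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the version tags show up as plain GLIBCXX_x.y.z byte strings inside the library file, which is the same thing the strings | grep command relies on.

```python
import re

# Matches version tags such as GLIBCXX_3.4.11; names like GLIBCXX_FORCE_NEW
# are skipped because they have no numeric version after the underscore.
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def parse_version(text):
    """Turn '3.4.10' into (3, 4, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def glibcxx_versions(data):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {parse_version(m.group(1).decode()) for m in GLIBCXX_TAG.finditer(data)}
    return sorted(found)


def missing_versions(required, data):
    """Return the required version strings that the library bytes do not provide."""
    provided = set(glibcxx_versions(data))
    return [r for r in required if parse_version(r) not in provided]
```

To use it, read the library in binary mode on the machine in question, e.g. open("/usr/lib64/libstdc++.so.6", "rb").read(), and pass the bytes to missing_versions together with the versions from the error log (["3.4.9", "3.4.10", "3.4.11"] in my case). An empty list means that machine should be able to run the binary as far as libstdc++ is concerned.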
