Ha! Well I'm glad this was useful. Honestly I was surprised to see this bug again -- it was really obscure. Maybe as you say it only affects machines named hopper :)
From: Backeljauw Franky <[email protected]>
Reply-To: "[email protected]" <[email protected]>
List-Post: [email protected]
Date: Tuesday, October 28, 2014 at 8:15 AM
To: "[email protected]" <[email protected]>
Subject: Re: [easybuild] CMake-3.3.0-intel-2014b : problem building

> Hello again,
> Todd,
>
> We have found the following alert on IBM's website regarding GPFS:
>
> * http://www.gpfsusergroup.org/news/gpfs-3-5-announcments
> * http://www-01.ibm.com/support/docview.wss?uid=isg3T1021392
>
>> IBM has identified a problem with GPFS 3.5.0.20 and GPFS 4.1.0.2 where GPFS
>> may fail to correctly handle multiple vectors passed via the writev() system
>> call. When a {NULL, 0} is passed as the first vector, an EINVAL error may be
>> incorrectly returned. This would cause the user application to fail
>> unexpectedly when writev() is called to write to a GPFS file. User data are
>> not affected. The writev() call is most likely to have been automatically
>> generated by the library or compiler.
>
> Guess what: we are running GPFS 3.5.0.20 on our compute nodes :-( And it
> would also explain why we do not have it on another (non-compute) node, since
> that is still running GPFS 3.5.0.19... That's what they call bad luck!
>
> Thanks a lot for pointing us in the right direction. Now let's hope that we
> get a GPFS fix soon!
>
> -- Regards,
>
> Franky Backeljauw
>
>
>
> On 28 Oct 2014, at 13:46, Backeljauw Franky
> <[email protected]> wrote:
>
>> Hello Todd,
>>
>> You're spot on! Here the file testC(XX)Compiler.c was empty indeed. We are
>> building on one of our GPFS filesystems on a compute node which is running SL
>> 6.4. Previously, we managed to build on another node with RHEL 6.4, so we'll
>> be checking for differences between these two nodes.
>>
>> On SL 6.4, when building on /tmp, the build is fine, but the install fails
>> when it's copying
>> /tmp/easybuild/CMake/2.8.12.1/intel-2014a/cmake-2.8.12.1/Copyright.txt to its
>> destination (on the GPFS filesystem). When executing the copy ourselves, it's
>> working fine.
>>
>> I'm puzzled... Do you know whether they solved the issue and how?
>>
>> P.s.: The name of our new cluster is Hopper as well, yet we doubt whether
>> this has anything to do with it ;-)
>>
>> -- Many thanks,
>>
>> Franky
>>
>>
>>
>> On 28 Oct 2014, at 11:08, Todd Gamblin <[email protected]> wrote:
>>
>>> I've seen this error on machines where the filesystem was having issues,
>>> specifically on the home filesystem on NERSC's hopper machine.
>>>
>>> The problem was that try_compile was generating an empty testCCompiler.c --
>>> have you looked at the size of this file in your build output?
>>>
>>> The root cause of the problem was that ostream::operator<< was generating
>>> calls to writev with more than 1023 bytes, but for whatever reason the
>>> filesystem was failing to execute that call. Running strace on CMake showed
>>> the buggy call -- I'm attaching the C++ and CMake reproducers I made for
>>> NERSC's system. You can run the cmake one with cmake -P, and you'll need to
>>> compile the C++ one. Strace them and see if you can see the problem I'm
>>> describing. Or try building in a different filesystem and see if the
>>> problem persists.
>>>
>>> You may actually have a different problem, but the output you're reporting
>>> looks awfully familiar to me. I don't think it's EasyBuild's fault.
>>>
>>> -Todd
>>>
>>>
>>>
>>> Here's a more detailed description of the problem that I sent to the PETSc
>>> developers a few weeks ago:
>>>
>>> Hi guys,
>>>
>>> After playing around with this, the problem is the C++ implementation on
>>> hopper. Something is screwy with fstream: it can't write chunks larger
>>> than 1023 bytes.
>>> Attached are two reproducers for your support request.
>>> One reproduces the problem in CMake; one reproduces it in C++.
>>>
>>> If you dig in the CMakeFiles directory, you'll see that the C file used to
>>> do the compiler identification is actually 0 bytes, which is what gets you
>>> the undefined reference to main in your error log, causing the compiler
>>> test to fail:
>>>
>>>     $ ls -l CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c
>>>     -rw-r--r-- 1 tgamblin tgamblin 0 Sep 30 21:34 CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c
>>>
>>>
>>> I figured something was wrong with the CMake after the OS upgrade Mark
>>> mentioned, and I tried to build a fresh one, but that gave the same error
>>> *in the bootstrap script*; I couldn't even build cmake.
>>>
>>> If you dig around to the spot in CMake where it generates the files to use
>>> for compiler testing, you find that CMake reads a file into a variable,
>>> filters it, and writes it out. Printing the variable using message() pre-
>>> and post-filtering works fine. But writing the variable still gets you a
>>> 0 byte file.
>>>
>>> Making a simple reproducer that writes a small string succeeds. If,
>>> however, you make a string of 1024 bytes or more, you get zero-byte
>>> output. Test that by running my file:
>>>
>>>     cmake -P test.cmake
>>>
>>> If you strace that, you notice this:
>>>
>>>     $ strace cmake -P test.cmake
>>>     [ snip ]
>>>     writev(3, [{NULL, 0}, {"01234567890123456789012345678901"..., 1024}], 2) = -1 EINVAL (Invalid argument)
>>>
>>> Smaller messages use a write() and not a writev(), so they succeed. But
>>> that's not cmake's fault. ofstream does that. If you run the attached
>>> C++ program, which uses ofstream to write a 1024-byte string, it fails
>>> too. Take one character off the string and it works.
>>>
>>> So, something is botched with Hopper's C++ libs, or maybe with writev. I
>>> imagine that more than CMake is broken, at least on the front-end nodes.
>>> I don't know of many programs that write large chunks to ofstreams, so
>>> maybe all is not lost.
>>>
>>> Still makes me suspicious of the hopper machine.
>>>
>>> -Todd
>>>
>>>
>>>
>>> From: Backeljauw Franky <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Tuesday, October 28, 2014 at 2:56 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: [easybuild] CMake-3.3.0-intel-2014b : problem building
>>>
>>>> Hello all,
>>>>
>>>> We have great difficulty in installing CMake on RHEL 6.4. We have tried
>>>> both with CMake-2.8.12.1-intel-2014a.eb and CMake-3.3.0-intel-2014b.eb
>>>> which is included in EasyBuild 1.15.2. The same problem occurs with the
>>>> foss-toolchains as well as with (e.g.) CMake-2.8.12.1-GCC-4.8.2.eb.
>>>>
>>>> We get the following errors:
>>>>
>>>>> -- The C compiler identification is unknown
>>>>> CMake Error at Modules/CMakeDetermineCCompiler.cmake:170 (configure_file):
>>>>>   configure_file Problem configuring file
>>>>> Call Stack (most recent call first):
>>>>>   CMakeLists.txt:16 (project)
>>>>>
>>>>>
>>>>> -- The CXX compiler identification is unknown
>>>>> CMake Error at Modules/CMakeDetermineCXXCompiler.cmake:168 (configure_file):
>>>>>   configure_file Problem configuring file
>>>>> Call Stack (most recent call first):
>>>>>   CMakeLists.txt:16 (project)
>>>>>
>>>>>
>>>>> -- Check for working C compiler:
>>>>> /apps/antwerpen/ivybridge/sl6/icc/2013.5.192-GCC-4.8.3/bin/intel64/icc
>>>>> CMake Error at Modules/CMakeTestCCompiler.cmake:47 (try_compile):
>>>>>   Unknown extension ".c" for file
>>>>>
>>>>>
>>>>> /apps/antwerpen/easybuild/build/CMake/3.0.0/intel-2014b/cmake-3.0.0/CMakeFiles/CMakeTmp/testCCompiler.c
>>>>>
>>>>>   try_compile() works only for enabled languages. Currently these are:
>>>>>
>>>>>     C CXX
>>>>>
>>>> I have included the full log for the CMake-3.0.0 build (cmake.log).
>>>>
>>>> I hope someone can help us out here.
>>>>
>>>> -- Many thanks for your reply,
>>>>
>>>> Franky Backeljauw
>>> <test.cmake><test.C>
>>
>
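
The test.C and test.cmake attachments are not preserved in this archive. For readers who want to probe their own filesystem, here is a minimal sketch of what a C++ reproducer along the lines described in the thread might look like. It is not Todd's original file: the function names and the idea of returning byte counts are assumptions made for illustration. The first helper issues a writev() whose first vector is {NULL, 0}, matching the strace output quoted in the thread; the second writes the same amount of data through std::ofstream and reports the resulting file size.

```cpp
// Hypothetical reproducer sketch; names and structure are illustrative only.
#include <sys/uio.h>   // writev, struct iovec
#include <fcntl.h>     // open, O_* flags
#include <unistd.h>    // close, ssize_t
#include <cerrno>
#include <cstddef>
#include <fstream>
#include <string>

// Call writev(fd, [{NULL, 0}, {payload, n}], 2) on a freshly created file.
// Returns the number of bytes written, or -1 with errno set. According to
// the IBM notice quoted in the thread, affected GPFS releases may return -1
// with errno == EINVAL here.
ssize_t writev_null_first(const std::string& path, std::size_t n) {
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    std::string payload(n, 'x');
    struct iovec iov[2];
    iov[0].iov_base = nullptr;        // empty first vector, as in the strace output
    iov[0].iov_len  = 0;
    iov[1].iov_base = &payload[0];
    iov[1].iov_len  = payload.size();

    ssize_t written = writev(fd, iov, 2);
    int saved = errno;                // close() may overwrite errno
    close(fd);
    errno = saved;
    return written;
}

// Write n bytes through std::ofstream in a single operator<< call, then
// return the size of the file on disk (or -1 if it cannot be reopened).
// On the systems discussed in the thread, 1024 bytes produced an empty file
// while 1023 bytes worked.
long ofstream_write_size(const std::string& path, std::size_t n) {
    {
        std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
        out << std::string(n, 'x');
    }                                 // stream is flushed and closed here
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long>(in.tellg());
}
```

To use it, call both helpers from a small main() with a path on the filesystem under suspicion and sizes of 1023 and 1024, then compare the returned values with the requested sizes. On a healthy filesystem both helpers should return the requested size; running the program under strace shows whether the C++ library is actually issuing writev() for the larger write.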

