Ha!  Well I'm glad this was useful.  Honestly I was surprised to see this
bug again -- it was really obscure.  Maybe as you say it only affects
machines named hopper :)

From:  Backeljauw Franky <[email protected]>
Reply-To:  "[email protected]" <[email protected]>
List-Post: [email protected]
Date:  Tuesday, October 28, 2014 at 8:15 AM
To:  "[email protected]" <[email protected]>
Subject:  Re: [easybuild] CMake-3.3.0-intel-2014b : problem building

> Hello again,
> Todd,
> 
> We have found the following alert on IBM's website regarding GPFS:
> 
> * http://www.gpfsusergroup.org/news/gpfs-3-5-announcments
> * http://www-01.ibm.com/support/docview.wss?uid=isg3T1021392
> 
>> IBM has identified a problem with GPFS 3.5.0.20 and GPFS 4.1.0.2 where GPFS
>> may fail to correctly handle multiple vectors passed via the writev() system
>> call. When a {NULL, 0} is passed as the first vector, an EINVAL error may be
>> incorrectly returned. This would cause the user application to fail
>> unexpectedly when writev() is called to write to a GPFS file. User data are
>> not affected. The writev() call is most likely to have been automatically
>> generated by the library or compiler.
> 
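The call pattern in IBM's alert can be probed directly. This is only a minimal sketch, assuming a POSIX system: the function name and the probe path are made up for illustration, and it is not one of the attached reproducers. It issues one writev() whose first vector is {NULL, 0} and reports either the byte count or the negated errno.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: write `size` bytes to `path` with a single writev()
// whose first vector is {NULL, 0}. Returns the number of bytes written,
// or -errno if open() or writev() fails.
long writev_with_empty_first_vector(const std::string& path, std::size_t size) {
    assert(!path.empty());
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -errno;

    std::string payload(size, '0');
    struct iovec iov[2];
    iov[0].iov_base = NULL;                              // empty first vector
    iov[0].iov_len = 0;
    iov[1].iov_base = const_cast<char*>(payload.data()); // the real data
    iov[1].iov_len = payload.size();

    ssize_t n = writev(fd, iov, 2);
    long result = (n < 0) ? -errno : static_cast<long>(n); // save errno before close()
    close(fd);
    return result;
}
```

Pointing the path at a file on the suspect filesystem and at one on /tmp should show the difference: a healthy filesystem returns the full byte count, while the affected GPFS levels would be expected to return -EINVAL.
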
> Guess what -- we are running GPFS 3.5.0.20 on our compute nodes :-(  And it
> would also explain why we do not have it on another (non-compute) node, since
> that is still running GPFS 3.5.0.19... That's what they call bad luck!
> 
> Thanks a lot for pointing us in the right direction. Now let's hope that we
> get a GPFS fix soon!
> 
> -- Regards,
> 
> Franky Backeljauw
> 
> 
> 
> On 28 Oct 2014, at 13:46, Backeljauw Franky
> <[email protected]> wrote:
> 
>> Hello Todd, 
>> 
>> You're spot on! Here the file testC(XX)Compiler.c was empty indeed. We are
>> building on one of our GPFS filesystems on a compute node which is running SL
>> 6.4. Previously, we managed to build on another node with RHEL 6.4, so we'll
>> be checking for differences between these two nodes.
>> 
>> On SL 6.4, when building on /tmp, the build is fine, but the install fails
>> when it's copying
>> /tmp/easybuild/CMake/2.8.12.1/intel-2014a/cmake-2.8.12.1/Copyright.txt to its
>> destination (on the GPFS filesystem). When executing the copy ourselves, it's
>> working fine.
>> 
>> I'm puzzled... Do you know whether they solved the issue and how?
>> 
>> P.s.: The name of our new cluster is Hopper as well, yet we doubt whether
>> this has anything to do with it ;-)
>> 
>> -- Many thanks,
>> 
>> Franky
>> 
>> 
>> 
>> On 28 Oct 2014, at 11:08, Todd Gamblin <[email protected]> wrote:
>> 
>>> I've seen this error on machines where the filesystem was having issues,
>>> specifically on the home filesystem on NERSC's hopper machine.
>>> 
>>> The problem was that try_compile was generating an empty testCCompiler.c --
>>> have you looked at the size of this file in your build output?
>>> 
>>> The root cause of the problem was that ostream::operator<< was generating
>>> calls to writev with more than 1023 bytes, but for whatever reason the
>>> filesystem was failing to execute that call.  Running strace on CMake showed
>>> the buggy call -- I'm attaching the C++ and CMake reproducers I made for
>>> NERSC's system.  You can run the CMake one with cmake -P, and you'll need to
>>> compile the C++ one.  Strace them and see if you can see the problem I'm
>>> describing.  Or try building in a different filesystem and see if the
>>> problem persists.
>>> 
>>> You may actually have a different problem, but the output you're reporting
>>> looks awfully familiar to me.  I don't think it's EasyBuild's fault.
>>> 
>>> -Todd
>>> 
>>> 
>>> 
>>> Here's a more detailed description of the problem that I sent to the PETSc
>>> developers a few weeks ago:
>>> 
>>> Hi guys,
>>> 
>>> After playing around with this, the problem is the C++ implementation on
>>> hopper. Something is screwy with fstream -- it can't write chunks larger
>>> than 1023 bytes.  Attached are two reproducers for your support request.
>>> One reproduces the problem in CMake; one reproduces it in C++.
>>> 
>>> If you dig in the CMakeFiles directory, you'll see that the C file used to
>>> do the compiler identification is actually 0 bytes, which is what gets you
>>> the undefined reference to main in your error log, causing the compiler
>>> test to fail:
>>> 
>>> $ l CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c
>>> -rw-r--r-- 1 tgamblin tgamblin 0 Sep 30 21:34
>>> CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c
>>> 
>>> 
>>> I figured something was wrong with the CMake after the OS upgrade Mark
>>> mentioned, and I tried to build a fresh one, but that gave the same error
>>> *in the bootstrap script* -- I couldn't even build cmake.
>>> 
>>> If you dig around to the spot in CMake where it generates the files to use
>>> for compiler testing, you find that CMake reads a file into a variable,
>>> filters it, and writes it out.  Printing the variable using message() pre-
>>> and post-filtering works fine.  But writing the variable still gets you a
>>> 0 byte file.
>>> 
>>> A simple reproducer that writes a small string succeeds.  If, however,
>>> you make the string 1024 bytes or longer, you get zero-byte output.  Test
>>> that by running my file:
>>> 
>>> cmake -P test.cmake
>>> 
>>> If you strace that, you notice this:
>>> 
>>> $ strace cmake -P test.cmake
>>> [ ... snip ... ]
>>> writev(3, [{NULL, 0}, {"01234567890123456789012345678901"..., 1024}], 2)
>>> = -1 EINVAL (Invalid argument)
>>> 
>>> Smaller messages use a write() and not a writev(), so they succeed.  But
>>> that's not cmake's fault.  ofstream does that.  If you run the attached
>>> C++ program, which uses ofstream to write a 1024-byte string, it fails
>>> too.  Take one character off the string and it works.
>>> 
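The ofstream behaviour described above can be checked with a few lines of C++. This is a sketch in the spirit of the attached test.C, not the attachment itself: the function name and the probe path are hypothetical. It writes a string of a given length through std::ofstream, closes the stream, and then reports how many bytes actually ended up in the file.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: write `size` bytes to `path` through std::ofstream and
// return the size of the resulting file, or -1 if it cannot be opened again.
long write_and_measure(const std::string& path, std::size_t size) {
    assert(!path.empty());
    {
        std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
        out << std::string(size, 'x');
    }   // leaving the scope closes the stream, which flushes it

    // Reopen positioned at the end; tellg() then gives the file size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long>(in.tellg());
}
```

On a healthy filesystem both write_and_measure(path, 1023) and write_and_measure(path, 1024) should return the requested size. On an affected one, the 1024-byte case would be expected to come back as 0, matching the empty files CMake produced.
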
>>> So, something is botched with Hopper's C++ libs, or maybe with writev.  I
>>> imagine that more than CMake is broken, at least on the front-end nodes.
>>> I don't know of many programs that write large chunks to ofstreams, so
>>> maybe all is not lost.
>>> 
>>> Still makes me suspicious of the hopper machine.
>>> 
>>> -Todd
>>> 
>>> 
>>> 
>>> From: Backeljauw Franky <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Tuesday, October 28, 2014 at 2:56 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: [easybuild] CMake-3.3.0-intel-2014b : problem building
>>> 
>>>> Hello all, 
>>>> 
>>>> We have great difficulty in installing CMake on RHEL 6.4. We have tried
>>>> both with CMake-2.8.12.1-intel-2014a.eb and CMake-3.0.0-intel-2014b.eb
>>>> which is included in EasyBuild 1.15.2. The same problem occurs with the
>>>> foss-toolchains as well as with (e.g.) CMake-2.8.12.1-GCC-4.8.2.eb.
>>>> 
>>>> We get the following errors:
>>>> 
>>>>> -- The C compiler identification is unknown
>>>>> CMake Error at Modules/CMakeDetermineCCompiler.cmake:170 (configure_file):
>>>>>   configure_file Problem configuring file
>>>>> Call Stack (most recent call first):
>>>>>   CMakeLists.txt:16 (project)
>>>>> 
>>>>> 
>>>>> -- The CXX compiler identification is unknown
>>>>> CMake Error at Modules/CMakeDetermineCXXCompiler.cmake:168
>>>>> (configure_file):
>>>>>   configure_file Problem configuring file
>>>>> Call Stack (most recent call first):
>>>>>   CMakeLists.txt:16 (project)
>>>>> 
>>>>> 
>>>>> -- Check for working C compiler:
>>>>> /apps/antwerpen/ivybridge/sl6/icc/2013.5.192-GCC-4.8.3/bin/intel64/icc
>>>>> CMake Error at Modules/CMakeTestCCompiler.cmake:47 (try_compile):
>>>>>   Unknown extension ".c" for file
>>>>> 
>>>>>     /apps/antwerpen/easybuild/build/CMake/3.0.0/intel-2014b/cmake-3.0.0/CMakeFiles/CMakeTmp/testCCompiler.c
>>>>> 
>>>>>   try_compile() works only for enabled languages.  Currently these are:
>>>>> 
>>>>>     C CXX
>>>>> 
>>>> I have included the full log for the CMake-3.0.0 build (cmake.log).
>>>> 
>>>> I hope someone can help us out here.
>>>> 
>>>> -- Many thanks for your reply,
>>>> 
>>>> Franky Backeljauw
>>> <test.cmake><test.C>
>> 
> 

