On Fri, 18 May 2012, Kirk, Benjamin (JSC-EG311) wrote:

> Damn - this sounds subtle, and good catch. I'll have a look but it
> may be mid week before I can do anything useful.

So here's the bug:

This appears to be the *same* libHilbert bug that was hitting us a
couple years ago.  As best as I can tell, the explanation here is that
I'm utterly incompetent.

It's been a few years, but my best guess at what happened years ago
was that I tested upgrading libHilbert to 64-bit FixBitVec types, it
corrected the problem, I reenabled libHilbert by default and committed
configure* to trunk, and I *forgot* to commit the 32-bit->64-bit
change to trunk.  So my tests suggested things were fixed, this bug
crops up so rarely that nobody else noticed it, and it just got left
unfixed.


So do we just reenable the change now?  Well, I'm feeling more
paranoid and did more testing, and here's the problems:

If we continue using double in our get_hilbert_coords() helper
function, but we switch to 64-bit Hilbert indices, then floating point
arithmetic is simply too fragile to reliably give us correct results.
I'll upload some test cases if people are curious about the details,
but basically I can get different coordinates (and in some cases
grossly wrong coordinates or more duplicated coordinates) by switching
compilers.  Unless we want to risk solution output being corrupted by
a bad compiler or solution input being corrupted by a changed
compiler, this isn't an option.

Long double seems to give reliable results, but that's on a platform
with 80-bit long doubles - the C/C++ standards allow "long double" ==
"double", and IIRC on many platforms that's all you get.  Likewise
128-bit FP formats are pretty non-portable, so if we want reliable
64-bit Hilbert indices we're going to need to use MPFR or some other
extended-precision software implementation.

But here's the real kicker: even with long double, changing from
32-bit to 64-bit libHilbert renders old solution files *incompatible*
with new ones.  In theory we ought to just be getting a finer-grained
approximation to the same space-filling curve; in practice I'm seeing
regression test failures.


So I see three solutions here.

(A) we augment the 32-bit libHilbert sorting with some subsorting that
distinguishes duplicate indices, via something a simple as a tacked-on
lexicographical order sort.  This would keep compatibility with old
files and require no API changes.

(B) we switch to 64-bit libHilbert bitvectors plus MPFR floating point
in get_hilbert_coords().  This is a bit more efficient and a lot less
ugly than (A), but it probably breaks all our current solution files.
It would also fail for new solutions in the case of a mesh that's
ridiculously h-refined (e.g. 64 levels) at the origin point.  That's
something we don't even support now, and it's (literally) a corner
case, but I've seen such a mesh used with another code for an EM
problem with a ridiculously strong singularity.

(C) we stop trying to enforce *any* ordering on our solution files, and
we just assume that the solution file and corresponding mesh file were
ordered consistently.  This might be trickier to implement (Do we just
hold off on *any* initial renumbering of any mesh?  We'll need to fix
some MeshCommunications stuff to handle a ParallelMesh with
discontiguous DofObjects) but I think it's the right way to go in the
long run.  We'd want to keep writing space-filling-curve sorted files
(using either method (A) or (B)) by default, to improve the initial
partitioning in N-to-M restarts, but this way we wouldn't have to do
any sorting at input time and we'd have total compatibility with old
libMesh files, regardless of whether libHilbert was 32-bit, 64-bit, or
completely disabled.

I'm obviously leaning toward implementing (A) right away, then adding
(C) if I ever find the time, but other opinions would be welcome.
---
Roy

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel

Reply via email to