Re: [Libmesh-devel] libMesh/DistributedMesh Work

Roy Stogner Wed, 02 May 2018 13:25:26 -0700


I hope you don't mind, but I'm going to Cc: this to libmesh-devel.
These are good questions, and my opinions do not match those of all
the other developers and it might be unfair not to let them chime in.


On Wed, 2 May 2018, Tuzun, Robert wrote:

Greetings.  I'm still in the middle of my hiatus, but there was one
question I've been meaning to ask.  I notice, in several places, the
use of vector<bool> in the code.  Way back in 2001, Scott Meyers
(Effective STL, Item 18) recommended against the use of vector<bool>
and recommended using deque<bool> or bitset instead.


"As an STL container, there are really only two things wrong with
vector<bool>.  First, it's not an STL container.  Second, it doesn't
hold bools.  Other than that, there's not much to object to." - Scott
Meyers.  I seriously love his writing.  It takes something special to
make programming language minutia so entertaining.

This is indeed often annoying, because the inability to take real
references to vector entries or to get a pointer to a underlying array
means we often can't handle vector<bool> with the same generic code as
we use to handle vector<literally_anything_else>.  I assume you first
encountered vector<bool> in our parallel_implementation.h overloads,
written to handle just that problem?

However, the recommended alternatives are even worse in many cases.
If a code is always fine using deque<bool>, if using EIGHT TIMES as
much memory never fazes the author, then Meyers' hundreds of C++
recommendations should be ignored in favor of "Item 1: Use Java
Instead" (or maybe Python or Go these days?).  Conversely, if a code
is always happy with bitset, and never needs an array that can be
resized, then what are you writing, autotranslated Fortran 77?

Personally, I'm mostly with Howard Hinnant:
https://isocpp.org/blog/2012/11/on-vectorbool

What's abominable about vector<bool> is the name, because that's
basically a lie about supporting the whole std::vector interface, but
the data structure itself is worthwhile and often useful.  And I'm
even sympathetic to the name!  Once you create a few well-chosen
overloads (like the parallel_implementation.h stuff) then the rest of
your generic code can *mostly* support vector<bool> as if it were a
real std::vector.  And when you miss a needed specialization, the
resulting breakage is a compile-time failure, so it's a hassle rather
than a disaster.

I don't know if this advice still holds, and my search on the Web
yields conflicting answers (just do a Google search on vector<bool>
and you'll see what I mean).  Is there reason for concern?


Well, despite everything I said above: yes, there is.


First, to Meyers' objections I'd also have added "Third, it requires
bit-fiddling for every access", which means if you have a vector small
enough to *not* worry about memory savings, maybe you ought to be
using vector<unsigned char> instead.  Grepping through the code for
short logically-boolean vectors, I see that we're doing the
unsigned-char-really-holding-bool trick in dof_map.C, and
mesh_communication_global_indices.C... but we're using small
vector<bool> in about a dozen places?

I'm not going to change those without benchmarking first, because
"unsigned char is faster due to less bit fiddling" vs "bool is faster
due to fitting more stuff in cache" is the sort of system-dependent
question I can't answer a priori, and in the absence of data I think
vector<bool> is more self-documenting for something that's logically
boolean.  If anyone wants to try to prove me wrong in the libMesh
context, I'd say try System::boundary_project_solution for the
benchmark.  That has a few vector<bool> inside loops over elements,
so it looks like the easiest place to expose performance differences.


Second and more seriously, in libMesh whenever find ourselves in a
situation where tactically vector<bool> is definitely the right
choice, it may mean that strategically we've made a huge mistake!

Our classic use case for vector<bool> are vectors indexed over nodes
or over elements, where we need to support millions or billions of
entries and so using 8 times as much space is clearly a bad idea.

*However*, some of that code was a reasonable design only because it
dates back to ReplicatedMesh-only days, where we couldn't get away
from having data stored for every element on every processor.  These
days, a DistributedMesh user should be able to avoid ever requiring
O(N_elements) memory on a single MPI rank, which means that
applications should never allocate such a long vector!  So the library
should never be using one, at least on a non-debug-mode
non-ReplicatedMesh-only code path.

So let's see...

Some of these uses are OK.  E.g. the MeshTools assertion code is all
debug-mode-only, where readability is much more important than
efficiency.

Some are more dubious.  E.g.  UnstructuredMesh::all_first_order() only
gets used in practice for writing serialized viz output of an error
estimate, so making it perfectly scalable would require improving
multiple other parts of that code path first, meaning it's suboptimal
but not high priority.

Some of these uses are probably outweighed by other distributed mesh
incompatibility issues.  At first glance I'd guess
MeshTools::Modification::Smooth may not even work on a distributed
mesh, but we've just never tried to use it that way.  Looks like
LaplaceMeshSmoother is distributed-mesh-aware, so maybe everyone who
needs smoothing post-distribution has just been using that?  I don't
think we have test coverage of it though so I wouldn't trust it.

But a couple of these uses are just way past due to be fixed!  We're
still relying on big serialized vectors (not just of bool; ErrorVector
is a straight-up vector<float> under the hood!) in our error
estimation and our mesh refinement flagging code.

A separate question: in looking at parallel_implementation.h, I note
several references to Communicator::get() (e.g., line 298).  Can you
tell me where this is defined?  I could not find it anywhere.  For
example, the command grep Communicator::get -r * from the main
directory did not yield the answer.


And it probably doesn't help that our online Doxygen isn't generated
from master/HEAD!

http://libmesh.github.io/doxygen/classlibMesh_1_1Parallel_1_1Communicator.html#a4b6bd36c38eb7de350fcd56510306ced

is still useful, though, even though refactoring has moved that method
to include/parallel/communicator.h, it's still the same shim accessor
method.  We often leave any definition that short in the class
declaration itself, where it doesn't need the ClassName:: prefix.
---
Roy

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel

Re: [Libmesh-devel] libMesh/DistributedMesh Work

Reply via email to