I seriously feel like I'm going crazy here.
I commented out that "if" statement in TreeNode::find_element() but it
still isn't helping. What's happening is that I have a particular setup
that is _unstable_, i.e. I can run the same code with the same inputs and
every once in a while it fails. Also: this only happens in parallel (2
MPI processes). By "every once in a while" I mean I have to run the code
4-5 _hundred_ times before it fails...
Here's the deal: PointLocator (retrieved from the Mesh object) is NOT
acting the same way every time. I put a print statement inside the "if
(!allowed_subdomains..." statement in TreeNode::find_element(). I also
printed out the point I'm searching for and whether or not the PL found the
point (and what element it found).
Here's what it looks like when it works:
Point: (x,y,z)=( 0.3, 0.5, 0)
searching: 0
searching: 1
searching: 2
searching: 3
searching: 4
searching: 5
searching: 6
searching: 7
searching: 8
searching: 9
searching: 10
searching: 11
searching: 12
searching: 13
searching: 14
searching: 15
searching: 16
searching: 17
searching: 18
searching: 19
searching: 20
searching: 21
searching: 22
searching: 23
searching: 24
searching: 25
searching: 26
searching: 27
searching: 28
searching: 29
searching: 30
searching: 31
searching: 32
searching: 33
searching: 34
searching: 35
searching: 36
searching: 37
searching: 38
searching: 39
searching: 40
searching: 41
searching: 42
found elem: 42
and when it fails it looks like this:
Point: (x,y,z)=( 0.3, 0.5, 0)
searching: 2
searching: 50
searching: 51
searching: 52
0 didn't find it!
As you can see - it definitely took a different (weird) path down the
"bins"... and then for some reason just didn't search anything else.
The PL is not completely broken though because it is able to find other
points in the same run like so:
Point: (x,y,z)=( 0.8, 0.5, 0)
searching: 2
searching: 50
searching: 51
searching: 52
searching: 53
searching: 54
searching: 55
searching: 56
searching: 9
searching: 10
searching: 1
searching: 4
searching: 21
searching: 22
searching: 23
searching: 24
searching: 14
searching: 15
searching: 16
searching: 3
searching: 17
searching: 18
searching: 19
searching: 20
searching: 5
searching: 6
searching: 7
searching: 8
searching: 33
searching: 34
searching: 35
searching: 36
searching: 25
searching: 26
searching: 27
searching: 28
searching: 29
searching: 30
searching: 31
searching: 32
searching: 37
searching: 38
searching: 39
searching: 40
searching: 41
searching: 42
searching: 43
searching: 44
found elem! 47
Anyone can run this same problem pretty easily (if you have MOOSE just
update it and rebuild moose_test)
You can see a failed run here:
https://www.moosebuild.org/view_result/10982
You can run the test over and over in a loop, the way I am, like so:
while ./run_tests --re=line_value_sampler.test -p 2 ; do :; done
That will stop once it fails.
I've been able to get it to fail on both OSX and Linux with both GCC and
Clang... so there is a real issue here... but I seriously can't figure it
out...
(Also: Valgrind doesn't show anything)
Any ideas?
Derek
On Fri Jan 09 2015 at 3:26:37 PM Derek Gaston <fried...@gmail.com> wrote:
> I'm investigating an issue with PointLocators... and so I was digging
> through the logic in TreeNode... and came across some weirdness.
>
> There appear to be some logic mismatches between TreeNode::find_element()
> and TreeNode::find_element_in_children()
>
> I can see what the _intention_ was - but I don't think the code actually
> does what the comments say it is doing.
>
> The problem is that find_element() duplicates some of the checks already
> being done in find_element_in_children()... specifically the bounding box
> checks. Because of this... even though the comments in
> find_element_in_children() ultimately say that every element in the mesh is
> searched... that is NOT true!
>
> Here's what's actually happening in find_element_in_children():
>
> 1. Active children whose bounding box contains the point are searched.
> 2. If that fails then all active children are _tried_
>    - HOWEVER: The bounding box check is _repeated_ in find_element()...
>      meaning that the elements won't be searched because the bounding box
>      check will fail (otherwise these children would have been searched in
>      step 1)
>    - THEREFORE: Even though find_element_in_children() _tries_ to do an
>      exhaustive search... it really just loops over all other Tree nodes
>      and bails out at the bounding box check in find_element()... NEVER
>      actually testing an element!
> 3. Returns NULL
>
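The steps above can be reproduced with a toy model. This is only a sketch under stated assumptions: Elem, Node, make_children() and the recheck_box flag are hypothetical 1D stand-ins written for illustration, not the libMesh classes or their real signatures. It is meant to show that when the leaf search repeats the bounding box test, the "exhaustive" second pass never tests a single element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-ins for illustration only (not libMesh types).
struct Elem {
  int id;
  double lo, hi;  // true extent of the element
  bool contains_point(double p) const { return lo <= p && p <= hi; }
};

struct Node {
  double box_lo, box_hi;    // bounding box of this tree node
  std::vector<Elem> elems;  // elements stored in this leaf node
  bool bounds_point(double p) const { return box_lo <= p && p <= box_hi; }
};

// Leaf search. With recheck_box == true the bounding box test guards the
// element loop, which is the behaviour described for find_element().
const Elem* find_element(const Node& n, double p, bool recheck_box) {
  if (recheck_box && !n.bounds_point(p)) return nullptr;
  for (const Elem& e : n.elems)
    if (e.contains_point(p)) return &e;
  return nullptr;
}

// Two-pass search over the children, mirroring steps 1-3.
const Elem* find_element_in_children(const std::vector<Node>& children,
                                     double p, bool recheck_box) {
  std::vector<bool> searched(children.size(), false);

  // Step 1: children whose bounding box contains the point.
  for (std::size_t i = 0; i < children.size(); ++i)
    if (children[i].bounds_point(p)) {
      searched[i] = true;
      if (const Elem* e = find_element(children[i], p, recheck_box)) return e;
    }

  // Step 2: "try" every remaining child. With recheck_box == true each of
  // these calls returns at the box test, so no element is ever examined.
  for (std::size_t i = 0; i < children.size(); ++i)
    if (!searched[i])
      if (const Elem* e = find_element(children[i], p, recheck_box)) return e;

  // Step 3: nothing found.
  return nullptr;
}

// Two leaves; the first box is a hair smaller than the element inside it,
// standing in for a floating point tolerance problem.
std::vector<Node> make_children() {
  return {Node{0.0, 0.3 - 1e-12, {Elem{42, 0.0, 0.3}}},
          Node{0.5, 1.0, {Elem{47, 0.5, 1.0}}}};
}
```

In this model, searching for p = 0.3 with recheck_box = true returns a null pointer even though element 42 contains the point; with recheck_box = false the second pass reaches the element and finds it.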
> What this means is that if the bounding boxes have floating point
> tolerance issues (which they do)... you can end up with a situation where a
> point "slips through the cracks" between the bounding box checks and the
> Elem::contains_point() checks...
>
> My proposal to fix this? Well... I think the logic could be changed quite
> a bit in find_element_in_children()... it is trying to do too much (it
> should just recurse instead of doing checks... but I understand that it's
> trying to speed things up by recursing favorably into children whose
> bounding boxes contain the point first)
>
> BUT - a simple fix that doesn't change too much logic is simply to remove
> the "if (this->bounds_point(p) || this->contains_ifems)" line from
> find_element(). If you made it into find_element() and that node is active
> then it means we really _do_ want to search the elements in that node...
> regardless of what the bounding box says!
>
> Finally: I think that something that could speed things up and make things
> more robust is to use a "fuzzy" bounding box. There's no reason why they
> have to be so rigid. We should use floating point fuzzy comparisons to see
> if we lie in the bounding boxes. The bounding boxes are simply there to
> speed up getting down to a set of elements that's in the right area... and
> having floating point tolerance issues keep us from traversing down into
> the right set of elements doesn't make sense...
>
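A minimal sketch of what a "fuzzy" bounding box test could look like, next to the strict one. Everything here is an assumption made for illustration: Point3, Box, both function names and the default absolute tolerance of 1e-10 are hypothetical, not the libMesh API, and a real implementation might prefer a tolerance scaled to the size of the box.

```cpp
#include <array>

// Hypothetical stand-ins for illustration only (not libMesh types).
using Point3 = std::array<double, 3>;

struct Box {
  Point3 min;  // lower corner
  Point3 max;  // upper corner
};

// Strict test: a point that misses the box by rounding error is rejected.
bool bounds_point_strict(const Box& b, const Point3& p) {
  for (int d = 0; d < 3; ++d)
    if (p[d] < b.min[d] || p[d] > b.max[d]) return false;
  return true;
}

// Fuzzy test: the box is padded by tol in every direction. The value of
// tol is an assumed absolute tolerance chosen for this sketch.
bool bounds_point_fuzzy(const Box& b, const Point3& p, double tol = 1e-10) {
  for (int d = 0; d < 3; ++d)
    if (p[d] < b.min[d] - tol || p[d] > b.max[d] + tol) return false;
  return true;
}
```

The padding only makes the coarse filter more permissive. The exact Elem::contains_point() test still decides whether the point is really in an element, so the cost of a slightly larger box is a few extra element checks rather than a wrong answer.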
> Let me know if I've missed something...
>
> Derek
>
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel