Re: [Gmsh] "Re: GMSH parsing is very slow"

Christophe Geuzaine Thu, 31 Jan 2019 08:01:41 -0800


> On 31 Jan 2019, at 15:48, Al Sc <al.sc.g...@gmail.com> wrote:
> 
> Dear Christoph,
> 
> to be honest, a remark deeply hidden somewhere in the documentation wouldn't 
> have helped me.
> That's because my approach typically is not to read the documentation like a 
> book (i.e. from front to back) but rather to skip forwards to the paragraphs 
> I currently need for the implementation.
> I believe most users work in a similar way.


Looking at the doc the explanation was indeed already there... deeply hidden in 
the FAQ: http://gmsh.info/dev/doc/texinfo/gmsh.html#Geometry-module-questions 
;-)

I've patched the documentation so that the information is now right there in 
the definition of each command. See e.g. 
http://gmsh.info/dev/doc/texinfo/gmsh.html#Miscellaneous-mesh-commands


> My personal suggestion would be that you "detect" in your 
> backwards-compatible gmsh parser the event where -something-like- more than 
> 1000 CAD-to-topology synchronizations (or whatever you call it) are 
> implicitly done. And if this happens, just _FAIL_. This slightly breaks your 
> compatibility, but only in corner cases where performance is too bad for 
> practical purposes anyway. A user may want to override this behavior using a 
> special command line option like "-cad-sync-limit infinity" - in case he 
> thinks otherwise.
> 

The future is the API!

Christophe


> Best regards
> A. S.
> 
> Am Do., 31. Jan. 2019 um 15:39 Uhr schrieb Christophe Geuzaine 
> <cgeuza...@uliege.be>:
> 
> 
> > On 31 Jan 2019, at 15:19, Al Sc <al.sc.g...@gmail.com> wrote:
> > 
> > Dear Christophe,
> > 
> > I tried out your gmsh-file with "Point{100+1:100+N} In Volume{1};" and it's 
> > indeed much faster!
> > That indeed seems to be the solution I needed -- thanks a lot!
> > 
> > As I don't really know much about the internal functionality of gmsh, it 
> > would have been almost impossible for me to come up with that solution on 
> > my own.
> > From your documentation it is clear that the input to gmsh is more than 
> > just a geometry specification, and it's rather kind of a script which 
> > generates the geometry.
> > However, given only the documentation, a naive user like me will conclude 
> > that you do not *need* any advanced scripting functionality to generate a 
> > basic model. E.g. that you can restrict yourself to "just using the 
> > geometry-specification subset of the language" without any performance 
> > penalty. Hence, a naive user concludes that:
> > " Point{100+1:100+N} In Volume{1};"
> > and
> > " Point{100} In Volume{1}; ...
> >  ... Point{10100} In Volume{1};"
> > are, in fact, equivalent (not only in semantics, but also in "parser 
> > performance").
> 
> They are equivalent. The difference is *when* you call them.
> 
> > 
> > I'd suggest that you try to implement the model parsing in a way that the 
> > "topology structure" is not immediately re-computed, but only invalidated.
> > Because, in my opinion, the observed performance-drop of gmsh with repeated 
> > "Point In Volume" statements is still highly counter-intuitive, even to 
> > users more advanced than me.
> > 
> 
> Indeed. This is all made clear when you use the API, where the distinction 
> between CAD operations and topological model operations is made explicit. 
> 
> For example, if using the Python API you try to do
> 
> for i in range(N):
>   gmsh.model.occ.addPoint(...)
>   gmsh.model.mesh.embed(...)
> 
> you will get an error, stating that the point does not exist in the model - 
> since no call to gmsh.model.occ.synchronize() has been made after the point 
> was added to the CAD. To make it work you would then do
> 
> for i in range(N):
>   gmsh.model.occ.addPoint(...)
>   gmsh.model.occ.synchronize(...)
>   gmsh.model.mesh.embed(...)
> 
> When N is large you would rapidly conclude that calling "synchronize" that 
> many times will kill your performance (the API doc makes this explicit).
> 
> The .geo parser strives to be automatic and backward compatible with very old 
> versions of Gmsh. So each "Point ... In ..." command thus checks if the CAD 
> has been modified - and if it has, triggers a synchronization automatically. 
> We could add a note in the documentation about this, by stating explicitly 
> which operations will trigger a sync. I'm adding this to our TODO list.
> 
> Christophe
> 
> 
> 
> 
> > Best regards,
> > A.S.
> > 
> > Am Do., 31. Jan. 2019 um 14:53 Uhr schrieb Christophe Geuzaine 
> > <cgeuza...@uliege.be>:
> > 
> > This was precisely the point of my example: if you embed the point after 
> > each point is created, you force a rebuild of the topology of the model. So 
> > the efficient script would be
> > 
> > SetFactory("OpenCASCADE");
> > Box(1) = {0,0,0, 1,1,1};
> > N=10000;
> > For i In {1:N}
> >   Point(100+i) = {0.25 + 5e-5*i, 0.1,0.1};
> > EndFor
> > Point{100+1:100+N} In Volume{1};
> > 
> > Christophe
> > 
> > 
> > > On 31 Jan 2019, at 14:48, Al Sc <al.sc.g...@gmail.com> wrote:
> > > 
> > > Dear Christophe,
> > > 
> > > my example files are scientific data, however originate from processing 
> > > certain proprietary 3D models that were shared with me under certain 
> > > restrictions. Therefore, it's difficult to share my original file with 
> > > you.
> > > However, I was indeed able to reproduce the issue using only a slight 
> > > modification of your example file!
> > > 
> > > Compare  the following two files:
> > > test1.geo:
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%%
> > > SetFactory("OpenCASCADE");
> > > Box(1) = {0,0,0, 1,1,1};
> > > For i In {1:1000}
> > >   Point(100+i) = {0.25 + 1e-4*i, 0.1,0.1};
> > >   Point{100+i} In Volume{1};
> > > EndFor
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%% 
> > > 
> > > test2.geo:
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%% 
> > > SetFactory("OpenCASCADE");
> > > Box(1) = {0,0,0, 1,1,1};
> > > For i In {1:10000}
> > >   Point(100+i) = {0.25 + 1e-5*i, 0.1,0.1};
> > >   Point{100+i} In Volume{1};
> > > EndFor
> > > %%% %%% %%% %%% %%% %%% %%% %%% %%% %%% 
> > > 
> > > The first file contains 1000 internal pts, and the second one contains 10 
> > > times as much.
> > > Now run:
> > >   $ time gmsh -1 test1.geo
> > >     -> Takes 0.88 sec
> > >   $ time gmsh -1 test2.geo 
> > >     -> Takes 75 sec 
> > > This is a nearly 100 (!!!) times increase in parsing run time, when the 
> > > model size is increased by only a factor of 10.
> > > This nicely illustrates what I meant by "quadratic growth of the parsing 
> > > run time as a function of the number of internal pts."
> > > As a consequence of this "quadratic growth", the run time for larger 
> > > models quickly becomes enormous.
> > > 
> > > Best regards
> > > 
> > > A.S.
> > > 
> > > Am Do., 31. Jan. 2019 um 14:31 Uhr schrieb Christophe Geuzaine 
> > > <cgeuza...@uliege.be>:
> > > 
> > > Can you send a test file?
> > > 
> > > I tried this, i.e. adding 1000 embedded points in a volume, and it is 
> > > quite fast (< 2 seconds):
> > > 
> > > SetFactory("OpenCASCADE");
> > > Box(1) = {0,0,0, 1,1,1};
> > > For i In {1:1000}
> > >   Point(100+i) = {0.25 + 5e-4*i, 0.1,0.1};
> > >   Point{100+i} In Volume{1};
> > > EndFor
> > > 
> > > Maybe you modify the model after each insertion of an embedded point, 
> > > which would force a rebuild of the topological model?
> > > 
> > > Christophe
> > > 
> > > 
> > > > On 31 Jan 2019, at 13:04, Al Sc <al.sc.g...@gmail.com> wrote:
> > > > 
> > > > Dear Sir or Madam,
> > > > 
> > > > some more details on the previously-reported case of extremely slow 
> > > > parsing of a gmsh file:
> > > > The file contains 34795 points -- of which 23041 points are "Point In 
> > > > Volume" (e.g. embedded_vertices of a GRegion?). Moreover, it contains 
> > > > 4998 "Plane Surface"s and one single "Volume". Almost all plane 
> > > > surfaces are triangles and quads -- except for <10 facets, which each 
> > > > have a very high vertex count (typ. 7000).
> > > > 
> > > > I found out that if I remove all "Point In Volume" objects, then the 
> > > > parsing takes only 20 seconds. However, if all the "Point In Volume" 
> > > > objects are not removed, parsing takes around 3000 seconds. This led me 
> > > > to the hypothesis that there is some huge inefficiency with parsing 
> > > > gmsh files with "Point In Volume" (probably quadratic run time in 
> > > > N(PointInVolume)??).
> > > > 
> > > > Furthermore, I analyzed the runs using "perf" (linux utility).
> > > > Run 1 (about 3000 seconds):
> > > >   $ time perf record ./gmsh-3.0.6-Linux64/bin/gmsh -1 input_orig.geo
> > > > Run 2 (about 20 seconds):
> > > >   $ time perf record ./gmsh-3.0.6-Linux64/bin/gmsh -1 
> > > > input_no_point_in_volume.geo
> > > > In Run 1, "perf report" yields the following top-scoring functions:
> > > >   11.57%  gmsh     gmsh                 [.] GModel::getEdgeByTag
> > > >    7.73%  gmsh     gmsh                 [.] GEO_Internals::synchronize
> > > >    6.80%  gmsh     libc-2.12.so         [.] _int_free
> > > >    6.31%  gmsh     gmsh                 [.] GModel::getVertexByTag
> > > >    4.91%  gmsh     libc-2.12.so         [.] malloc
> > > >    4.53%  gmsh     gmsh                 [.] InterpolateCurve
> > > >    4.40%  gmsh     libstdc++.so.6.0.22  [.] std::_Rb_tree_increment
> > > >    3.88%  gmsh     libc-2.12.so         [.] memcpy
> > > >    3.69%  gmsh     gmsh                 [.] List_Nbr
> > > >    3.35%  gmsh     libc-2.12.so         [.] _int_malloc
> > > >    3.24%  gmsh     gmsh                 [.] GEntity::GEntity
> > > >    3.13%  gmsh     gmsh                 [.] avl_lookup
> > > >    2.63%  gmsh     gmsh                 [.] gmshFace::resetNativePtr
> > > >    2.53%  gmsh     gmsh                 [.] GEdge::addFace
> > > >    2.52%  gmsh     gmsh                 [.] CompareVertex
> > > >    1.97%  gmsh     gmsh                 [.] std::_List_base<GEdge*, 
> > > > std::allocator<GEdge*>
> > > >    1.88%  gmsh     gmsh                 [.] GModel::getFaceByTag
> > > >    1.76%  gmsh     gmsh                 [.] Tree_Action
> > > >    1.74%  gmsh     gmsh                 [.] gmshEdge::degenerate
> > > >    1.63%  gmsh     gmsh                 [.] CompareCurve
> > > >    1.53%  gmsh     gmsh                 [.] std::_List_base<int, 
> > > > std::allocator<int> >::_M
> > > >    1.38%  gmsh     gmsh                 [.] GModel::deletePhysicalGroups
> > > >    1.23%  gmsh     gmsh                 [.] 
> > > > std::_List_base<GEdgeSigned, std::allocator<GE
> > > >    1.10%  gmsh     gmsh                 [.] GVertex::addEdge
> > > >    0.85%  gmsh     gmsh                 [.] List_Read
> > > >    0.75%  gmsh     gmsh                 [.] GEdge::getBeginVertex
> > > >    0.73%  gmsh     gmsh                 [.] GFace::computeMeanPlane
> > > >    0.73%  gmsh     libc-2.12.so         [.] free
> > > >    0.67%  gmsh     gmsh                 [.] CTX::instance
> > > >    0.65%  gmsh     gmsh                 [.] 
> > > > gmshEdge::resetMeshAttributes
> > > > 
> > > > This suggests that maybe GModel::getEdgeByTag is eventually called a 
> > > > quadratic number of times in the number of PointInVolume-objects -- and 
> > > > this causes the drastic slow-down.
> > > > 
> > > > Could you please investigate this? Thanks a lot!
> > > > (As already mentioned, this issue also occurs with gmsh-4.x.x)
> > > > 
> > > > Best regards
> > > > A. S.
> > > > _______________________________________________
> > > > gmsh mailing list
> > > > gmsh@onelab.info
> > > > http://onelab.info/mailman/listinfo/gmsh
> > > 
> > > — 
> > > Prof. Christophe Geuzaine
> > > University of Liege, Electrical Engineering and Computer Science 
> > > http://www.montefiore.ulg.ac.be/~geuzaine
> > > 
> > > 
> > > 
> > 
> > — 
> > Prof. Christophe Geuzaine
> > University of Liege, Electrical Engineering and Computer Science 
> > http://www.montefiore.ulg.ac.be/~geuzaine
> > 
> > 
> > 
> 
> — 
> Prof. Christophe Geuzaine
> University of Liege, Electrical Engineering and Computer Science 
> http://www.montefiore.ulg.ac.be/~geuzaine
> 
> 
> 

— 
Prof. Christophe Geuzaine
University of Liege, Electrical Engineering and Computer Science 
http://www.montefiore.ulg.ac.be/~geuzaine




_______________________________________________
gmsh mailing list
gmsh@onelab.info
http://onelab.info/mailman/listinfo/gmsh

Re: [Gmsh] "Re: GMSH parsing is very slow"

Reply via email to