> -----Original Message-----
> From: Milian Wolff [mailto:m...@milianw.de]
> Sent: Saturday, January 23, 2016 15:41
> To: cmake-developers@cmake.org
> Cc: James Johnston
> Subject: Re: [cmake-developers] CMake daemon for user tools
>
> You are aware that modern std::string is SSO'ed? I'm running on such a
> system.
> Another reason why you should not reinvent the wheel and keep relying on
> the STL wherever possible.
>
> <snip>
>
> Qt has such a class, it's called QVarLengthArray, and I've also been able to
> apply it in multiple occasions to good effect. That said, when you look at

Yeah, but std::string is platform dependent, and the size of its SSO
buffer is also platform dependent. Maybe it tends to be optimal for
CMake. Then again, maybe a larger buffer is needed. I don't know. The
flexible option would be something that works exactly like
QVarLengthArray, since different variables might have different optimal
sizes.

Some sample small string sizes for gcc/clang/VC++:
http://stackoverflow.com/a/28003328/562766

Note that none of them are large enough to store an absolute path, which
is maybe common (???) in CMake. Also there's a fair bit of variation; if
CMake wants consistent performance in a section of code across
compilers, it would need a way to explicitly specify the small string
size. For example, some are large enough to store typical target names,
and some maybe are not.

There is also boost::container::small_vector in addition to
QVarLengthArray:
http://www.boost.org/doc/libs/1_60_0/doc/html/boost/container/small_vector.html

> Just run cmake (or the daemon) through a profiler and check the results.
> Doing so for the daemon (built with RelWithDebInfo) on the LLVM build dir
> and recording it with `perf --call-graph lbr` I get these hotspots when looking
> at the results with `perf report -g graph --no-children`:
>
> +    8.67%  cmake  cmake         [.] cmGlobalGenerator::FindGeneratorTargetImpl
> +    4.21%  cmake  libc-2.22.so  [.] _int_malloc
> +    2.67%  cmake  cmake         [.] cmCommandArgument_yylex
> +    2.09%  cmake  libc-2.22.so  [.] _int_free
> +    2.06%  cmake  libc-2.22.so  [.] __memcmp_sse4_1
> +    1.84%  cmake  libc-2.22.so  [.] malloc
>
> This already shows you that you can gain a lot by reducing the number of
> allocations done. Heaptrack is a good tool for that.

Next question would be: who is calling malloc? Or rather, what % of
callers are std::string, std::vector, other STL classes vs custom data
structures?

Next question would be: what is the size of those mallocs, for each
caller?
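
For std::string alone, one rough way to approach the size question
without a profiler might be to plug a counting allocator into
std::basic_string and log what it requests. This is only a sketch under
stated assumptions: CountingAllocator, CountedString, allocation_log and
allocations_for are names made up for illustration, and it measures a
standalone string type, not CMake's actual call sites.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Byte size of every allocation requested through CountingAllocator.
inline std::vector<std::size_t>& allocation_log() {
  static std::vector<std::size_t> log;
  return log;
}

// Hypothetical allocator: records each request, then defers to std::allocator.
template <typename T>
struct CountingAllocator {
  using value_type = T;

  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    allocation_log().push_back(n * sizeof(T));
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>().deallocate(p, n);
  }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return false;
}

using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Number of heap allocations needed to construct a string holding `text`.
std::size_t allocations_for(const char* text) {
  assert(text != nullptr);
  const std::size_t before = allocation_log().size();
  CountedString s(text);
  (void)s;  // only the allocator's log matters here
  return allocation_log().size() - before;
}
```

With an SSO implementation I would expect allocations_for("foo") to
report no allocation and a typical absolute path to report one, but I
have not measured this on any of the compilers mentioned above.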

(Sorry, I don't currently have an environment set up with a profiler to
test this myself.)

> Similarly, someone should
> investigate cmGlobalGenerator::FindGeneratorTargetImpl. That does a lot of
> string comparisons to find targets from my quick glance, so indeed could be
> sped up with a smarter string class.
>
> But potentially you could also get a much quicker lookup by storing a hash
> map of target name to cmGeneratorTarget.

Indeed; there has got to be a way to reduce the complexity of that
function in terms of the number of targets compared, if not the
low-level string comparison itself as well. For example, if target names
are short-ish, the string class has a large enough SSO buffer, and the
underlying string class makes use of vector CPU instructions for
comparison, there is probably very little left to gain at the string
level without such a hash map. (On the other hand, if some of the
previous assumptions are not true on some common CMake platforms....)

> Seems like there's more than enough areas one could (and should) optimize
> CMake.

Indeed. Another idea, probably unrelated to the string allocations
issue, that came to mind: what if link-time code
generation/optimization is turned on? IIRC this is not default in CMake.
Maybe CMake is sufficiently well-organized (e.g. small function
implementations moved to header files) such that what needs to be
inlined across translation units is already being inlined. Then again,
maybe it's not. I've seen other projects rely on this feature to keep
clean organization by keeping implementations in .CPP files without
sacrificing performance, and when you turn off LTCG performance takes a
major hit...

Also IIRC there are still a few optimizations that are turned off when
CMake is built with RelWithDebInfo instead of Release. I forget the
exact specifics at the moment, but e.g. on Visual C++ when you ask it to
turn on debug symbols, it will change the default values of some
optimization flags.
So a cursory examination of the flags wouldn't reveal all cases.

However, one of my bigger performance gripes, being a primarily Windows
developer right now, is the process creation overhead, especially during
configuration. I think that is completely dominating over any CMake code
being run internally. It would be nice if that could be parallelized on
my 6-core hyper-threaded CPU, but doing so properly probably isn't so
easy...

Best regards,

James Johnston
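
P.S. To make the hash-map suggestion for FindGeneratorTargetImpl a bit
more concrete, here is a minimal sketch. It assumes the current lookup
is roughly a linear scan that compares target names, as the quoted
description suggests; I have not checked the actual implementation.
GeneratorTarget, TargetIndex and find_linear are hypothetical stand-ins
for illustration, not CMake's real classes or functions.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for cmGeneratorTarget; the real class is much larger.
struct GeneratorTarget {
  std::string name;
};

// Assumed baseline: linear scan, one string comparison per target visited.
const GeneratorTarget* find_linear(const std::vector<GeneratorTarget>& targets,
                                   const std::string& name) {
  for (const GeneratorTarget& t : targets) {
    if (t.name == name) {
      return &t;
    }
  }
  return nullptr;
}

// Hypothetical index from target name to target: built once, queried often.
// It stores pointers into `targets`, so that vector must outlive the index
// and must not be resized while the index is in use.
class TargetIndex {
public:
  explicit TargetIndex(const std::vector<GeneratorTarget>& targets) {
    index_.reserve(targets.size());
    for (const GeneratorTarget& t : targets) {
      assert(!t.name.empty());  // assumption: every target has a name
      index_.emplace(t.name, &t);  // keeps the first target on duplicate names
    }
  }

  // Average O(1) lookup; returns nullptr when no target has that name.
  const GeneratorTarget* find(const std::string& name) const {
    const auto it = index_.find(name);
    return it == index_.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::string, const GeneratorTarget*> index_;
};
```

The index trades one hash computation per lookup for the repeated string
comparisons of the scan. Whether that wins in practice depends on the
number of targets and how often the set of targets changes, so it would
need to be profiled.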