Re: [PATCH] PR libstdc++/71044 optimize std::filesystem::path construction
On 13/12/18 20:34 +, Jonathan Wakely wrote: Construction and modification of paths is now done more efficiently, by splitting the input into a stack-based buffer of string_view objects instead of a dynamically-allocated vector containing strings. Once the final size is known only a single allocation is needed to reserve space for it. The append and concat operations no longer require constructing temporary path objects, nor re-parsing the entire native pathname. This patch fixes a regression in the append/concat code, introduced by the changes to avoid constructing an intermediate path. The problem wasn't caught by existing tests, so I've improved the tests. Tested x86_64-linux, committed to trunk. commit 7583088c09b5dd5d33a98abd1f61b0ccff961d6e Author: Jonathan Wakely Date: Mon Dec 17 21:42:48 2018 + PR libstdc++/71044 fix off-by-one errors introduced recently The recent changes to append/concat directly from strings (without constructing paths) introduced regressions where one of the components could be omitted from the iteration sequence in the result. PR libstdc++/71044 * src/filesystem/std-path.cc (path::_M_append): Fix off-by-one error that caused a component to be lost from the iteration sequence. (path::_M_concat): Likewise. * testsuite/27_io/filesystem/path/append/source.cc: Test appending long strings. * testsuite/27_io/filesystem/path/concat/strings.cc: Test concatenating long strings. * testsuite/27_io/filesystem/path/construct/string_view.cc: Test construction from long string. diff --git a/libstdc++-v3/src/filesystem/std-path.cc b/libstdc++-v3/src/filesystem/std-path.cc index d2520492c03..b5ddbdad149 100644 --- a/libstdc++-v3/src/filesystem/std-path.cc +++ b/libstdc++-v3/src/filesystem/std-path.cc @@ -781,10 +781,11 @@ path::_M_append(basic_string_view s) ::new(output++) _Cmpt(c.str, c.type, parser.offset(c)); ++_M_cmpts._M_impl->_M_size; } - for (auto c = parser.next(); c.valid(); c = parser.next()) + while (cmpt.valid()) { - ::new(output++) _Cmpt(c.str, c.type, parser.offset(c)); + ::new(output++) _Cmpt(cmpt.str, cmpt.type, parser.offset(cmpt)); ++_M_cmpts._M_impl->_M_size; + cmpt = parser.next(); } if (s.back() == '/') @@ -1139,7 +1140,7 @@ path::_M_concat(basic_string_view s) } #endif } - else if (orig_filenamelen == 0) + else if (orig_filenamelen == 0 && extra.empty()) { // Replace empty filename at end of original path. std::prev(output)->_M_pathname = it->str; diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc index e440ca921c7..21ae6be3d97 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc @@ -137,6 +137,22 @@ test05() s = second->native(); p3 /= s; VERIFY( p3.string() == "0/123456789/a/123456789" ); + } + +void +test06() +{ + const std::string s0 = "a/b/c"; + path p = s0; + std::string s; + for (int i = 0; i < 10; ++i) +s += "0/1/2/3/4/5/6/7/8/9/"; + // append a long string with many components + test(p, s.c_str()); + + // Same again but with a trailing slash on the left operand: + path p2 = s0 + '/'; + test(p2, s.c_str()); } int @@ -147,4 +163,5 @@ main() test03(); test04(); test05(); + test06(); } diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc index eea9b6dc69b..316f7c3d7bd 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc @@ -24,8 +24,10 @@ #include #include #include +#include using std::filesystem::path; +using __gnu_test::compare_paths; void test01() @@ -60,7 +62,7 @@ test01() void test02() { - std::basic_string_view s; + std::basic_string_view s, expected; path p = "0/1/2/3/4/5/6"; // The string_view aliases the path's internal string: @@ -68,25 +70,54 @@ test02() // Append that string_view, which must work correctly even though the // internal string will be reallocated during the operation: p += s; - VERIFY( p.string() == "0/1/2/3/4/5/60/1/2/3/4/5/6" ); + compare_paths(p, "0/1/2/3/4/5/60/1/2/3/4/5/6"); // Same again with a trailing slash: path p2 = "0/1/2/3/4/5/"; s = p2.native(); p2 += s; - VERIFY( p2.string() == "0/1/2/3/4/5/0/1/2/3/4/5/" ); + compare_paths(p2, "0/1/2/3/4/5/0/1/2/3/4/5/"); // And aliasing one of the components of the path: path p3 = "0/123456789"; path::iterator second = std::next(p3.begin()); s = second->native(); p3 += s; - VERIFY( p3.string() == "0/123456789123456789" ); + compare_paths(p3,
Re: [PATCH] PR libstdc++/71044 optimize std::filesystem::path construction
On 12/12/18 17:22 +, Jonathan Wakely wrote: This new implementation has a smaller footprint than the previous implementation, due to replacing std::vector<_Cmpt> with a custom pimpl type that only needs a single pointer. The _M_type enumeration is also combined with the pimpl type, by using a tagged pointer, reducing sizeof(path) further still. Construction and modification of paths is now done more efficiently, by splitting the input into a stack-based buffer of string_view objects instead of a dynamically-allocated vector containing strings. Once the final size is known only a single allocation is needed to reserve space for it. The append and concat operations no longer require constructing temporary path objects, nor re-parsing the entire native pathname. This results in algorithmic improvements to path construction, and working with large paths is much faster. PR libstdc++/71044 * include/bits/fs_path.h (path::path(path&&)): Add noexcept when appropriate. Move _M_cmpts instead of reparsing the native pathname. (path::operator=(const path&)): Do not define as defaulted. (path::operator/=, path::append): Call _M_append. (path::concat): Call _M_concat. (path::path(string_type, _Type): Change type of first parameter to basic_string_view. (path::_M_append(string_type), path::_M_concat(string_type)): New member functions. (path::_S_is_dir_sep): Replace with non-member is_dir_sep. (path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir) (path::_M_add_filename): Remove. (path::_M_type()): New member function to replace _M_type data member. (path::_List): Define new struct type instead of using std::vector. (path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of first parameter to basic_string_view. (path::operator+=(const path&)): Do not define inline. (path::operator+=(const string_type&)): Call _M_concat. (path::operator+=(const value_type*)): Likewise. (path::operator+=(value_type)): Likewise. (path::operator+=(basic_string_view)): Likewise. (path::operator/=(const path&)): Do not define inline. (path::_M_append(path)): Remove. * python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that understands the new path::_List type. * src/filesystem/std-path.cc (is_dir_sep): New function to replace path::_S_is_dir_sep. (path::_Parser): New helper class to parse strings as paths. (path::_List::_Impl): Define container type for path components. (path::_List): Define members. (path::operator=(const path&)): Define explicitly, to provide the strong exception safety guarantee. (path::operator/=(const path&)): Implement manually by processing each component of the argument, rather than using _M_split_cmpts to parse the entire string again. (path::_M_append(string_type)): Likewise. (path::operator+=(const path&)): Likewise. (path::_M_concat(string_type)): Likewise. (path::remove_filename()): Perform trim directly instead of calling _M_trim(). (path::_M_split_cmpts()): Rewrite in terms of _Parser class. (path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir) (path::_M_add_filename): Remove. Tested x86_64-linux. I intend to commit this to trunk tomorrow. There are some other performance improvements that could be made to the std::filesystem implementation, but this is the main thing I wanted to do for PR 71044. Here's an improved patch which I'm commmitting to trunk. The previous patch added _M_append(string_type) and _M_concat(string_type), which had the advantage that the argument owned its contents and so would not be invalidated by modifications to the strings contained in *this. The new version of the patch uses string_view arguments for those functions. This avoids allocating a string when appending a NTBS or an existing string_view, at the expense of some additional complexity to ensure the contents of the string_view are not used after they could become invalidated (e.g. if they refer to one of the strings contained in *this which gets altered by the append/concat operation). This also fixes some bugs in the MinGW-specific code, which I hadn't tested yesterday. All 27_io/filesystem/path/* tests pass on x86_64-w64-mingw32 now. This puts us in a position where the std::filesystem::path class can be moved from libstdc++fs.a to the main libstdc++.so and enabled unconditionally. I'll do that in the next few days. (The directory iterators and filesystem "operations" like copy_file will still need to be conditionally defined, depending on whether the target supports the necessary OS APIs.) Tested x86_64-linux, committed to trunk. commit 355eb6608a70864f05c3003ac8a1713b64e0d802 Author: redi Date: Thu Dec 13 11:01:03 2018 +
[PATCH] PR libstdc++/71044 optimize std::filesystem::path construction
This new implementation has a smaller footprint than the previous implementation, due to replacing std::vector<_Cmpt> with a custom pimpl type that only needs a single pointer. The _M_type enumeration is also combined with the pimpl type, by using a tagged pointer, reducing sizeof(path) further still. Construction and modification of paths is now done more efficiently, by splitting the input into a stack-based buffer of string_view objects instead of a dynamically-allocated vector containing strings. Once the final size is known only a single allocation is needed to reserve space for it. The append and concat operations no longer require constructing temporary path objects, nor re-parsing the entire native pathname. This results in algorithmic improvements to path construction, and working with large paths is much faster. PR libstdc++/71044 * include/bits/fs_path.h (path::path(path&&)): Add noexcept when appropriate. Move _M_cmpts instead of reparsing the native pathname. (path::operator=(const path&)): Do not define as defaulted. (path::operator/=, path::append): Call _M_append. (path::concat): Call _M_concat. (path::path(string_type, _Type): Change type of first parameter to basic_string_view. (path::_M_append(string_type), path::_M_concat(string_type)): New member functions. (path::_S_is_dir_sep): Replace with non-member is_dir_sep. (path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir) (path::_M_add_filename): Remove. (path::_M_type()): New member function to replace _M_type data member. (path::_List): Define new struct type instead of using std::vector. (path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of first parameter to basic_string_view. (path::operator+=(const path&)): Do not define inline. (path::operator+=(const string_type&)): Call _M_concat. (path::operator+=(const value_type*)): Likewise. (path::operator+=(value_type)): Likewise. (path::operator+=(basic_string_view)): Likewise. (path::operator/=(const path&)): Do not define inline. (path::_M_append(path)): Remove. * python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that understands the new path::_List type. * src/filesystem/std-path.cc (is_dir_sep): New function to replace path::_S_is_dir_sep. (path::_Parser): New helper class to parse strings as paths. (path::_List::_Impl): Define container type for path components. (path::_List): Define members. (path::operator=(const path&)): Define explicitly, to provide the strong exception safety guarantee. (path::operator/=(const path&)): Implement manually by processing each component of the argument, rather than using _M_split_cmpts to parse the entire string again. (path::_M_append(string_type)): Likewise. (path::operator+=(const path&)): Likewise. (path::_M_concat(string_type)): Likewise. (path::remove_filename()): Perform trim directly instead of calling _M_trim(). (path::_M_split_cmpts()): Rewrite in terms of _Parser class. (path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir) (path::_M_add_filename): Remove. Tested x86_64-linux. I intend to commit this to trunk tomorrow. There are some other performance improvements that could be made to the std::filesystem implementation, but this is the main thing I wanted to do for PR 71044. commit 921bb5ac834156bc01e72a0049e27e39ed47c7ca Author: Jonathan Wakely Date: Wed Dec 12 16:27:11 2018 + PR libstdc++/71044 optimize std::filesystem::path construction This new implementation has a smaller footprint than the previous implementation, due to replacing std::vector<_Cmpt> with a custom pimpl type that only needs a single pointer. The _M_type enumeration is also combined with the pimpl type, by using a tagged pointer, reducing sizeof(path) further still. Construction and modification of paths is now done more efficiently, by splitting the input into a stack-based buffer of string_view objects instead of a dynamically-allocated vector containing strings. Once the final size is known only a single allocation is needed to reserve space for it. The append and concat operations no longer require constructing temporary path objects, nor re-parsing the entire native pathname. This results in algorithmic improvements to path construction, and working with large paths is much faster. PR libstdc++/71044 * include/bits/fs_path.h (path::path(path&&)): Add noexcept when appropriate. Move _M_cmpts instead of reparsing the native pathname. (path::operator=(const path&)): Do not define as defaulted. (path::operator/=, path::append): Call