Re: [PATCH] PR libstdc++/71044 optimize std::filesystem::path construction

2018-12-17 Thread Jonathan Wakely

On 13/12/18 20:34 +, Jonathan Wakely wrote:

   Construction and modification of paths is now done more efficiently, by
   splitting the input into a stack-based buffer of string_view objects
   instead of a dynamically-allocated vector containing strings. Once the
   final size is known only a single allocation is needed to reserve space
   for it.  The append and concat operations no longer require constructing
   temporary path objects, nor re-parsing the entire native pathname.


This patch fixes a regression in the append/concat code, introduced by
the changes to avoid constructing  an intermediate path. The problem
wasn't caught by existing tests, so I've improved the tests.

Tested x86_64-linux, committed to trunk.


commit 7583088c09b5dd5d33a98abd1f61b0ccff961d6e
Author: Jonathan Wakely 
Date:   Mon Dec 17 21:42:48 2018 +

PR libstdc++/71044 fix off-by-one errors introduced recently

The recent changes to append/concat directly from strings (without
constructing paths) introduced regressions where one of the components
could be omitted from the iteration sequence in the result.

PR libstdc++/71044
* src/filesystem/std-path.cc (path::_M_append): Fix off-by-one error
that caused a component to be lost from the iteration sequence.
(path::_M_concat): Likewise.
* testsuite/27_io/filesystem/path/append/source.cc: Test appending
long strings.
* testsuite/27_io/filesystem/path/concat/strings.cc: Test
concatenating long strings.
* testsuite/27_io/filesystem/path/construct/string_view.cc: Test
construction from long string.

diff --git a/libstdc++-v3/src/filesystem/std-path.cc b/libstdc++-v3/src/filesystem/std-path.cc
index d2520492c03..b5ddbdad149 100644
--- a/libstdc++-v3/src/filesystem/std-path.cc
+++ b/libstdc++-v3/src/filesystem/std-path.cc
@@ -781,10 +781,11 @@ path::_M_append(basic_string_view s)
 	  ::new(output++) _Cmpt(c.str, c.type, parser.offset(c));
 	  ++_M_cmpts._M_impl->_M_size;
 	}
-	  for (auto c = parser.next(); c.valid(); c = parser.next())
+	  while (cmpt.valid())
 	{
-	  ::new(output++) _Cmpt(c.str, c.type, parser.offset(c));
+	  ::new(output++) _Cmpt(cmpt.str, cmpt.type, parser.offset(cmpt));
 	  ++_M_cmpts._M_impl->_M_size;
+	  cmpt = parser.next();
 	}
 
 	  if (s.back() == '/')
@@ -1139,7 +1140,7 @@ path::_M_concat(basic_string_view s)
 	}
 #endif
 	}
-  else if (orig_filenamelen == 0)
+  else if (orig_filenamelen == 0 && extra.empty())
 	{
 	  // Replace empty filename at end of original path.
 	  std::prev(output)->_M_pathname = it->str;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
index e440ca921c7..21ae6be3d97 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
@@ -137,6 +137,22 @@ test05()
   s = second->native();
   p3 /= s;
   VERIFY( p3.string() == "0/123456789/a/123456789" );
+  }
+
+void
+test06()
+{
+  const std::string s0 = "a/b/c";
+  path p = s0;
+  std::string s;
+  for (int i = 0; i < 10; ++i)
+s += "0/1/2/3/4/5/6/7/8/9/";
+  // append a long string with many components
+  test(p, s.c_str());
+
+  // Same again but with a trailing slash on the left operand:
+  path p2 = s0 + '/';
+  test(p2, s.c_str());
 }
 
 int
@@ -147,4 +163,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
index eea9b6dc69b..316f7c3d7bd 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
@@ -24,8 +24,10 @@
 #include 
 #include 
 #include 
+#include 
 
 using std::filesystem::path;
+using __gnu_test::compare_paths;
 
 void
 test01()
@@ -60,7 +62,7 @@ test01()
 void
 test02()
 {
-  std::basic_string_view s;
+  std::basic_string_view s, expected;
 
   path p = "0/1/2/3/4/5/6";
   // The string_view aliases the path's internal string:
@@ -68,25 +70,54 @@ test02()
   // Append that string_view, which must work correctly even though the
   // internal string will be reallocated during the operation:
   p += s;
-  VERIFY( p.string() == "0/1/2/3/4/5/60/1/2/3/4/5/6" );
+  compare_paths(p, "0/1/2/3/4/5/60/1/2/3/4/5/6");
 
   // Same again with a trailing slash:
   path p2 = "0/1/2/3/4/5/";
   s = p2.native();
   p2 += s;
-  VERIFY( p2.string() == "0/1/2/3/4/5/0/1/2/3/4/5/" );
+  compare_paths(p2, "0/1/2/3/4/5/0/1/2/3/4/5/");
 
   // And aliasing one of the components of the path:
   path p3 = "0/123456789";
   path::iterator second = std::next(p3.begin());
   s = second->native();
   p3 += s;
-  VERIFY( p3.string() == "0/123456789123456789" );
+  compare_paths(p3, 

Re: [PATCH] PR libstdc++/71044 optimize std::filesystem::path construction

2018-12-13 Thread Jonathan Wakely

On 12/12/18 17:22 +, Jonathan Wakely wrote:

This new implementation has a smaller footprint than the previous
implementation, due to replacing std::vector<_Cmpt> with a custom pimpl
type that only needs a single pointer. The _M_type enumeration is also
combined with the pimpl type, by using a tagged pointer, reducing
sizeof(path) further still.

Construction and modification of paths is now done more efficiently, by
splitting the input into a stack-based buffer of string_view objects
instead of a dynamically-allocated vector containing strings. Once the
final size is known only a single allocation is needed to reserve space
for it. The append and concat operations no longer require constructing
temporary path objects, nor re-parsing the entire native pathname.

This results in algorithmic improvements to path construction, and
working with large paths is much faster.

PR libstdc++/71044
* include/bits/fs_path.h (path::path(path&&)): Add noexcept when
appropriate. Move _M_cmpts instead of reparsing the native pathname.
(path::operator=(const path&)): Do not define as defaulted.
(path::operator/=, path::append): Call _M_append.
(path::concat): Call _M_concat.
(path::path(string_type, _Type): Change type of first parameter to
basic_string_view.
(path::_M_append(string_type), path::_M_concat(string_type)): New
member functions.
(path::_S_is_dir_sep): Replace with non-member is_dir_sep.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.
(path::_M_type()): New member function to replace _M_type data member.
(path::_List): Define new struct type instead of using std::vector.
(path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of
first parameter to basic_string_view.
(path::operator+=(const path&)): Do not define inline.
(path::operator+=(const string_type&)): Call _M_concat.
(path::operator+=(const value_type*)): Likewise.
(path::operator+=(value_type)): Likewise.
(path::operator+=(basic_string_view)): Likewise.
(path::operator/=(const path&)): Do not define inline.
(path::_M_append(path)): Remove.
* python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that
understands the new path::_List type.
* src/filesystem/std-path.cc (is_dir_sep): New function to replace
path::_S_is_dir_sep.
(path::_Parser): New helper class to parse strings as paths.
(path::_List::_Impl): Define container type for path components.
(path::_List): Define members.
(path::operator=(const path&)): Define explicitly, to provide the
strong exception safety guarantee.
(path::operator/=(const path&)): Implement manually by processing
each component of the argument, rather than using _M_split_cmpts
to parse the entire string again.
(path::_M_append(string_type)): Likewise.
(path::operator+=(const path&)): Likewise.
(path::_M_concat(string_type)): Likewise.
(path::remove_filename()): Perform trim directly instead of calling
_M_trim().
(path::_M_split_cmpts()): Rewrite in terms of _Parser class.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.

Tested x86_64-linux. I intend to commit this to trunk tomorrow.

There are some other performance improvements that could be made to
the std::filesystem implementation, but this is the main thing I
wanted to do for PR 71044.


Here's an improved patch which I'm commmitting to trunk. The previous
patch added _M_append(string_type) and _M_concat(string_type), which
had the advantage that the argument owned its contents and so would
not be invalidated by modifications to the strings contained in *this.
The new version of the patch uses string_view arguments for those
functions. This avoids allocating a string when appending a NTBS or
an existing string_view, at the expense of some additional complexity
to ensure the contents of the string_view are not used after they
could become invalidated (e.g. if they refer to one of the strings
contained in *this which gets altered by the append/concat operation).

This also fixes some bugs in the MinGW-specific code, which I hadn't
tested yesterday. All 27_io/filesystem/path/* tests pass on
x86_64-w64-mingw32 now.

This puts us in a position where the std::filesystem::path class can
be moved from libstdc++fs.a to the main libstdc++.so and enabled
unconditionally. I'll do that in the next few days. (The directory
iterators and filesystem "operations" like copy_file will still need
to be conditionally defined, depending on whether the target supports
the necessary OS APIs.)

Tested x86_64-linux, committed to trunk.

commit 355eb6608a70864f05c3003ac8a1713b64e0d802
Author: redi 
Date:   Thu Dec 13 11:01:03 2018 +


[PATCH] PR libstdc++/71044 optimize std::filesystem::path construction

2018-12-12 Thread Jonathan Wakely

This new implementation has a smaller footprint than the previous
implementation, due to replacing std::vector<_Cmpt> with a custom pimpl
type that only needs a single pointer. The _M_type enumeration is also
combined with the pimpl type, by using a tagged pointer, reducing
sizeof(path) further still.

Construction and modification of paths is now done more efficiently, by
splitting the input into a stack-based buffer of string_view objects
instead of a dynamically-allocated vector containing strings. Once the
final size is known only a single allocation is needed to reserve space
for it. The append and concat operations no longer require constructing
temporary path objects, nor re-parsing the entire native pathname.

This results in algorithmic improvements to path construction, and
working with large paths is much faster.

PR libstdc++/71044
* include/bits/fs_path.h (path::path(path&&)): Add noexcept when
appropriate. Move _M_cmpts instead of reparsing the native pathname.
(path::operator=(const path&)): Do not define as defaulted.
(path::operator/=, path::append): Call _M_append.
(path::concat): Call _M_concat.
(path::path(string_type, _Type): Change type of first parameter to
basic_string_view.
(path::_M_append(string_type), path::_M_concat(string_type)): New
member functions.
(path::_S_is_dir_sep): Replace with non-member is_dir_sep.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.
(path::_M_type()): New member function to replace _M_type data member.
(path::_List): Define new struct type instead of using std::vector.
(path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of
first parameter to basic_string_view.
(path::operator+=(const path&)): Do not define inline.
(path::operator+=(const string_type&)): Call _M_concat.
(path::operator+=(const value_type*)): Likewise.
(path::operator+=(value_type)): Likewise.
(path::operator+=(basic_string_view)): Likewise.
(path::operator/=(const path&)): Do not define inline.
(path::_M_append(path)): Remove.
* python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that
understands the new path::_List type.
* src/filesystem/std-path.cc (is_dir_sep): New function to replace
path::_S_is_dir_sep.
(path::_Parser): New helper class to parse strings as paths.
(path::_List::_Impl): Define container type for path components.
(path::_List): Define members.
(path::operator=(const path&)): Define explicitly, to provide the
strong exception safety guarantee.
(path::operator/=(const path&)): Implement manually by processing
each component of the argument, rather than using _M_split_cmpts
to parse the entire string again.
(path::_M_append(string_type)): Likewise.
(path::operator+=(const path&)): Likewise.
(path::_M_concat(string_type)): Likewise.
(path::remove_filename()): Perform trim directly instead of calling
_M_trim().
(path::_M_split_cmpts()): Rewrite in terms of _Parser class.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.

Tested x86_64-linux. I intend to commit this to trunk tomorrow.

There are some other performance improvements that could be made to
the std::filesystem implementation, but this is the main thing I
wanted to do for PR 71044.


commit 921bb5ac834156bc01e72a0049e27e39ed47c7ca
Author: Jonathan Wakely 
Date:   Wed Dec 12 16:27:11 2018 +

PR libstdc++/71044 optimize std::filesystem::path construction

This new implementation has a smaller footprint than the previous
implementation, due to replacing std::vector<_Cmpt> with a custom pimpl
type that only needs a single pointer. The _M_type enumeration is also
combined with the pimpl type, by using a tagged pointer, reducing
sizeof(path) further still.

Construction and modification of paths is now done more efficiently, by
splitting the input into a stack-based buffer of string_view objects
instead of a dynamically-allocated vector containing strings. Once the
final size is known only a single allocation is needed to reserve space
for it. The append and concat operations no longer require constructing
temporary path objects, nor re-parsing the entire native pathname.

This results in algorithmic improvements to path construction, and
working with large paths is much faster.

PR libstdc++/71044
* include/bits/fs_path.h (path::path(path&&)): Add noexcept when
appropriate. Move _M_cmpts instead of reparsing the native pathname.
(path::operator=(const path&)): Do not define as defaulted.
(path::operator/=, path::append): Call