[Lldb-commits] [lldb] [lldb] Fix issues handling ANSI codes and Unicode in option help (PR #183314)

David Spickett via lldb-commits Wed, 25 Feb 2026 07:28:05 -0800

https://github.com/DavidSpickett created 
https://github.com/llvm/llvm-project/pull/183314


Fixes #177570, and a bunch of FIXMEs for other tests known to be incorrect.

To do this, I have adapted code from the existing ansi::TrimAndPad. At first I
tried a wrapper function, but there are a few things we need to handle that
cannot be done with a simple wrapper.

We must only split at word boundaries. This requires knowing whether the last 
adjustment, which may be the final adjustment, was made at, or just before, a 
word boundary.

It must also check for single words wider than the requested width (though a
wrapper could do this).
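
To make the word boundary rule concrete, here is a minimal sketch for plain
ASCII text only. TrimAsciiAtWordBoundary is a hypothetical name used for
illustration, not the function added by this patch. It assumes one column per
byte and no ANSI codes, which is exactly the simplification the real function
cannot make.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from the patch): trim plain ASCII text to at most
// `width` characters, cutting only at a word boundary.
std::string TrimAsciiAtWordBoundary(std::string_view text, size_t width) {
  const std::string_view whitespace = " \t\n";

  // Strip leading and trailing whitespace.
  size_t first = text.find_first_not_of(whitespace);
  if (first == std::string_view::npos)
    return "";
  size_t last = text.find_last_not_of(whitespace);
  text = text.substr(first, last - first + 1);

  // Everything fits, nothing to trim.
  if (text.size() <= width)
    return std::string(text);

  // The cut is at a word boundary if the first dropped character is
  // whitespace. text.size() > width, so text[width] is in range.
  std::string_view kept = text.substr(0, width);
  if (whitespace.find(text[width]) == std::string_view::npos) {
    // We cut in the middle of a word. Back up to the previous whitespace.
    size_t prev = kept.find_last_of(whitespace);
    if (prev == std::string_view::npos) {
      // The first word is wider than `width`. Return it whole rather than
      // splitting it.
      return std::string(text.substr(0, text.find_first_of(whitespace)));
    }
    kept = kept.substr(0, prev);
  }

  // Drop any whitespace left on the end of the kept text.
  size_t end = kept.find_last_not_of(whitespace);
  return end == std::string_view::npos ? ""
                                       : std::string(kept.substr(0, end + 1));
}
```

With "The quick brown fox jumped." this returns "The quick" for a width of 9
and "The" for a width of 8. A single word such as "abcdefghij" is returned
whole even for a width of 2.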

For this reason, the new TrimAtWordBoundary has more special case checks and a
more complex inner loop. The core, though, is the same split into left, ANSI
escape code, and right that TrimAndPad uses.

It is that splitting that implements the "bias" we need to print correctly
formatted characters. When there is a preceding ANSI code, it must be included
in the printed range, and the same goes for a following code. TrimAndPad
already handled this, and I've copied that logic over.
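
The left, escape, right split can be sketched as follows. This is a
simplified illustration, not the lldb implementation: SplitAtNextEscape is a
hypothetical stand-in for FindNextAnsiSequence that only recognises
"ESC [ digits ; m" sequences, and TrimVisible assumes ASCII text where every
byte is one column wide.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// Plain text before the next escape code, the code itself, and the rest.
struct Split {
  std::string_view left, escape, right;
};

// Hypothetical stand-in for FindNextAnsiSequence (assumption: only
// "ESC [ <digits and ;> m" sequences count as escape codes).
Split SplitAtNextEscape(std::string_view text) {
  size_t start = text.find("\x1B[");
  while (start != std::string_view::npos) {
    size_t end = start + 2;
    while (end < text.size() &&
           ((text[end] >= '0' && text[end] <= '9') || text[end] == ';'))
      ++end;
    if (end < text.size() && text[end] == 'm')
      return {text.substr(0, start), text.substr(start, end - start + 1),
              text.substr(end + 1)};
    start = text.find("\x1B[", start + 1);
  }
  // No escape code found, everything is plain text.
  return {text, {}, {}};
}

// Keep at most `width` visible characters. Escape codes never count towards
// the width, and a code that directly follows fully kept text is kept too.
std::string TrimVisible(std::string_view text, size_t width) {
  std::string result;
  size_t visible = 0;
  while (!text.empty() && visible < width) {
    auto [left, escape, right] = SplitAtNextEscape(text);
    text = right;
    size_t take = std::min(left.size(), width - visible);
    result.append(left.substr(0, take));
    visible += take;
    // Only keep the code if all of the text before it was kept.
    if (take == left.size())
      result.append(escape);
  }
  return result;
}
```

For the input "\x1B[0mab\x1B[0m cd" and a width of 2, the result is
"\x1B[0mab\x1B[0m": the code before "ab" and the code after it both end up in
the output, even though neither adds to the visible width.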

TrimAndPad also used Unicode aware functions, which fixes the known issues with 
Unicode (though no command option actually uses Unicode at the moment).

This PR replaces PR #181860, where I tried to implement all this using a 
strategy that used "visible indexes" to decide where to cut the lines, and then 
converted those into "actual indexes" to know what to print.
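
As a rough illustration of that earlier strategy (this is not code from
PR #181860), mapping a visible index to an actual index could look like the
sketch below. Both helper names are hypothetical, text is assumed to be ASCII,
and only "ESC [ digits ; m" sequences are treated as escape codes.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: length of the escape code starting at `pos`, or 0 if
// there is none (assumption: only "ESC [ <digits and ;> m" is recognised).
size_t EscapeLengthAt(std::string_view text, size_t pos) {
  if (text.substr(pos, 2) != "\x1B[")
    return 0;
  size_t end = pos + 2;
  while (end < text.size() &&
         ((text[end] >= '0' && text[end] <= '9') || text[end] == ';'))
    ++end;
  if (end < text.size() && text[end] == 'm')
    return end - pos + 1;
  return 0;
}

// Hypothetical helper: byte offset of the visible character number
// `visible_index`, or text.size() if there are not that many.
size_t VisibleToActualIndex(std::string_view text, size_t visible_index) {
  size_t actual = 0;
  size_t visible = 0;
  while (actual < text.size()) {
    // Escape codes take up bytes but no visible columns.
    if (size_t length = EscapeLengthAt(text, actual)) {
      actual += length;
      continue;
    }
    if (visible == visible_index)
      return actual;
    ++visible;
    ++actual;
  }
  return text.size();
}
```

For "\x1B[4mS\x1B[0met", visible index 0 maps to actual index 4 and visible
index 1 maps to actual index 9. Printing from actual index 4 would drop the
code that comes before the "S", which is the kind of adjustment the "bias"
has to make.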

This worked for most cases, but adding the "bias" was very complex. The
preceding codes were quite easy to do, but the following codes proved to be too
complex.

I also had the feeling I was reinventing TrimAndPad, and though it didn't turn
out to be that simple, it wasn't far off.

As the majority of the work is now done in TrimAtWordBoundary, I have reused 
some existing OutputWordWrappedLines tests for the new function.

From 7ad21f2c3a85526084189bcc7204b9a6f9d79dde Mon Sep 17 00:00:00 2001
From: David Spickett <[email protected]>
Date: Tue, 17 Feb 2026 16:04:45 +0000
Subject: [PATCH] [lldb] Fix issues handling ANSI codes and Unicode in option
 help

Fixes #177570, and a bunch of FIXMEs for other tests known to
be incorrect.

To do this, I have adapted code from the existing ansi::TrimAndPad.
At first I tried a wrapper function, but there are a few things
we need to handle that cannot be done with a simple wrapper.

We must only split at word boundaries. This requires knowing
whether the last adjustment, which may be the final adjustment,
was made at, or just before, a word boundary.

It must also check for single words wider than the requested
width (though a wrapper could do this).

For this reason, the new TrimAtWordBoundary has more special
case checks and a more complex inner loop. The core, though,
is the same split into left, ANSI escape code, and right that
TrimAndPad uses.

It is that splitting that implements the "bias" we need to
print correctly formatted characters. When there is a preceding
ANSI code, it must be included in the printed range, and the same
goes for a following code. TrimAndPad already handled this, and
I've copied that logic over.

TrimAndPad also used Unicode aware functions, which fixes
the known issues with Unicode (though no command option actually
uses Unicode at the moment).

This PR replaces PR #181860, where I tried to implement all this using
a strategy that used "visible indexes" to decide where to cut
the lines, and then converted those into "actual indexes" to
know what to print.

This worked for most cases, but adding the "bias" was very complex.
The preceding codes were quite easy to do, but the following codes
proved to be too complex.

I also had the feeling I was reinventing TrimAndPad, and
though it didn't turn out to be that simple, it wasn't far off.

As the majority of the work is now done in TrimAtWordBoundary,
I have reused some existing OutputWordWrappedLines tests for the
new function.
---
 lldb/include/lldb/Utility/AnsiTerminal.h    | 190 ++++++++++++++------
 lldb/test/API/commands/help/TestHelp.py     |  20 ++-
 lldb/unittests/Utility/AnsiTerminalTest.cpp | 177 ++++++++++++++----
 3 files changed, 290 insertions(+), 97 deletions(-)

diff --git a/lldb/include/lldb/Utility/AnsiTerminal.h b/lldb/include/lldb/Utility/AnsiTerminal.h
index 153602cc08b09..bae9fd98570f9 100644
--- a/lldb/include/lldb/Utility/AnsiTerminal.h
+++ b/lldb/include/lldb/Utility/AnsiTerminal.h
@@ -98,6 +98,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Locale.h"
+#include "llvm/Support/Unicode.h"
 
 #include "lldb/Utility/Stream.h"
 
@@ -240,6 +241,121 @@ inline std::string StripAnsiTerminalCodes(llvm::StringRef str) {
   return stripped;
 }
 
+inline size_t ColumnWidth(llvm::StringRef str) {
+  std::string stripped = ansi::StripAnsiTerminalCodes(str);
+  return llvm::sys::locale::columnWidth(stripped);
+}
+
+/// Trim the given string to the given visible length, at a word boundary.
+/// Visible length means its width when rendered to the terminal.
+/// The string can include ANSI codes and Unicode.
+///
+/// For a single word string, that word is returned in its entirety regardless
+/// of its visible length.
+///
+/// This function is similar to TrimAndPad, except that it must split on a word
+/// boundary. So there are some notable differences:
+/// * Has a special case for single words that exceed desired visible
+///   length.
+/// * Must track whether the most recent modification was on a word boundary
+///   or not.
+/// * If the trimming finishes without the result ending on a word boundary,
+///   it must find the nearest boundary to that trim point by trimming more.
+inline std::string TrimAtWordBoundary(llvm::StringRef str,
+                                      size_t visible_length) {
+  str = str.trim();
+  if (str.empty())
+    return str.str();
+
+  auto first_whitespace = str.find_first_of(" \t\n");
+  // No whitespace means a single word, which we cannot split.
+  if (first_whitespace == llvm::StringRef::npos)
+    return str.str();
+
+  // If the first word of a multi-word string is too wide, return that whole
+  // word only.
+  auto to_first_word_boundary = str.substr(0, first_whitespace);
+  // We use ansi::ColumnWidth here because it can handle ANSI and Unicode.
+  if (static_cast<size_t>(ansi::ColumnWidth(to_first_word_boundary)) >
+      visible_length)
+    return to_first_word_boundary.str();
+
+  std::string result;
+  result.reserve(visible_length);
+  // When there is Unicode or ANSI codes, the visible length will not equal
+  // result.size(), so we track it separately.
+  size_t result_visible_length = 0;
+
+  // The loop below makes many adjustments, and we never know which will be the
+  // last. This tracks whether the most recent adjustment put us at a word
+  // boundary and is checked after the main loop.
+  bool at_word_boundary = false;
+
+  // Trim the string to the given visible length.
+  while (!str.empty() && result_visible_length < visible_length) {
+    auto [left, escape, right] = FindNextAnsiSequence(str);
+    str = right;
+
+    // We know that left does not include ANSI codes. Compute its visible length
+    // and if it fits, append it together with the invisible escape code.
+    size_t column_width = llvm::sys::locale::columnWidth(left);
+    if (result_visible_length + column_width <= visible_length) {
+      result.append(left).append(escape);
+      result_visible_length += column_width;
+      at_word_boundary = right.empty() || std::isspace(right[0]);
+
+      continue;
+    }
+
+    // The string might contain Unicode which means it's not safe to truncate.
+    // Repeatedly trim the string until it is valid Unicode and fits.
+    llvm::StringRef trimmed = left;
+
+    // A word break can happen at the character we trim to, or the one we
+    // trimmed before that (we are going backwards, so before in the loop is
+    // after in the string).
+
+    // A word break can happen at the point we trim, or just beyond that point.
+    // In other words: at the current back of trimmed, or what was the back last
+    // time around. following_char records the character popped in the previous
+    // loop iteration.
+    std::optional<char> following_char = std::nullopt;
+    while (!trimmed.empty()) {
+      int trimmed_width = llvm::sys::locale::columnWidth(trimmed);
+      if (
+          // If we have a partial Unicode character, keep trimming.
+          trimmed_width !=
+              llvm::sys::unicode::ColumnWidthErrors::ErrorInvalidUTF8 &&
+          // If the trimmed string fits in the column limit, stop trimming.
+          (result_visible_length + static_cast<size_t>(trimmed_width) <=
+           visible_length)) {
+        result.append(trimmed);
+        result_visible_length += trimmed_width;
+        at_word_boundary = std::isspace(trimmed.back()) ||
+                           (following_char && std::isspace(*following_char));
+
+        break;
+      }
+
+      following_char = trimmed.back();
+      trimmed = trimmed.drop_back();
+    }
+  }
+
+  if (!at_word_boundary) {
+    // Walk backwards to find a word boundary.
+    auto last_whitespace = result.find_last_of(" \t\n");
+    if (last_whitespace != std::string::npos)
+      result = result.substr(0, last_whitespace);
+  }
+
+  // We may have split on whitespace that was the first of a word boundary, or
+  // somewhere in a run of whitespace. Trim the trailing spaces. This must be
+  // done here instead of in the loop because in the loop we may still be
+  // accumulating the result string.
+  return llvm::StringRef(result).rtrim().str();
+}
+
 inline std::string TrimAndPad(llvm::StringRef str, size_t visible_length,
                               char padding = ' ') {
   std::string result;
@@ -284,74 +400,38 @@ inline std::string TrimAndPad(llvm::StringRef str, size_t visible_length,
   return result;
 }
 
-inline size_t ColumnWidth(llvm::StringRef str) {
-  std::string stripped = ansi::StripAnsiTerminalCodes(str);
-  return llvm::sys::locale::columnWidth(stripped);
-}
-
 // Output text that may contain ANSI codes, word wrapped (wrapped at whitespace)
 // to the given stream. The indent level of the stream is counted towards the
 // output line length.
-// FIXME: This contains several bugs and does not handle unicode.
+// FIXME: This does not handle unicode correctly.
+// FIXME: If an ANSI code is applied to multiple words and those words are split
+//        across lines, the code will apply to the indentation as well as the
+//        text.
 inline void OutputWordWrappedLines(Stream &strm, llvm::StringRef text,
                                    uint32_t output_max_columns) {
   // We will indent using the stream, so leading whitespace is not significant.
   text = text.ltrim();
-  if (text.size() == 0)
+  if (text.empty())
     return;
 
-  const size_t visible_length = ansi::ColumnWidth(text);
+  // 1 column border on the right side.
+  const uint32_t max_text_width =
+      output_max_columns - strm.GetIndentLevel() - 1;
+  bool first_line = true;
 
-  // Will it all fit on one line, or is it a single word that we must not break?
-  if (static_cast<uint32_t>(visible_length + strm.GetIndentLevel()) <
-          output_max_columns ||
-      text.find_first_of(" \t\n") == llvm::StringRef::npos) {
-    // Output it as a single line.
-    strm.Indent(text);
-    strm.EOL();
-    return;
-  }
-
-  // We need to break it up into multiple lines. We can do this based on the
-  // formatted text because we know that:
-  // * We only break lines on whitespace, therefore we will not break in the
-  //   middle of a Unicode character or escape code.
-  // * Escape codes are so far not applied to multiple words, so there is no
-  //   risk of breaking up a phrase and the escape code being incorrectly
-  //   applied to the indent too.
-
-  const int max_text_width = output_max_columns - strm.GetIndentLevel() - 1;
-  int start = 0;
-  int end = start;
-  const int final_end = visible_length;
-
-  while (end < final_end) {
-    // Don't start the 'text' on a space, since we're already outputting the
-    // indentation.
-    while ((start < final_end) && (text[start] == ' '))
-      start++;
-
-    end = start + max_text_width;
-    if (end > final_end)
-      end = final_end;
-
-    if (end != final_end) {
-      // If we're not at the end of the text, make sure we break the line on
-      // white space.
-      while (end > start && text[end] != ' ' && text[end] != '\t' &&
-             text[end] != '\n')
-        end--;
-    }
+  while (!text.empty()) {
+    std::string split = TrimAtWordBoundary(text, max_text_width);
 
-    const int sub_len = end - start;
-    if (start != 0)
+    llvm::StringRef split_ref(split);
+    split_ref = split_ref.rtrim();
+    if (!first_line)
       strm.EOL();
-    strm.Indent();
-    assert(start < final_end);
-    assert(start + sub_len <= final_end);
-    strm << text.substr(start, sub_len);
-    start = end + 1;
+    first_line = false;
+    strm.Indent(split_ref);
+
+    text = text.drop_front(split.size()).ltrim();
   }
+
   strm.EOL();
 }
 
diff --git a/lldb/test/API/commands/help/TestHelp.py b/lldb/test/API/commands/help/TestHelp.py
index 15cb07b1f32e6..cb6e9473c1047 100644
--- a/lldb/test/API/commands/help/TestHelp.py
+++ b/lldb/test/API/commands/help/TestHelp.py
@@ -327,8 +327,6 @@ def test_help_option_description_terminal_width_with_ansi(self):
         ANSI codes acccording to the terminal width."""
         self.runCmd("settings set use-color on")
 
-        # FIXME: lldb crashes when the width is exactly 135 - https://github.com/llvm/llvm-project/issues/177570
-
         # Should fit on one line.
         self.runCmd("settings set term-width 138")
         self.expect(
@@ -340,18 +338,28 @@ def test_help_option_description_terminal_width_with_ansi(self):
             ],
         )
 
-        # Must be printed on two lines.
-        # FIXME: Second line is truncated - https://github.com/llvm/llvm-project/issues/177570
         self.runCmd("settings set term-width 100")
         self.expect(
             "help breakpoint set",
             matching=True,
             patterns=[
-                r"\s+\x1b\[4mS\x1b\[0met the breakpoint only in this shared library.  Can repeat this option\n"
-                r"\s+multiple times to specify multiple shared li\n"
+                r"\s+\x1b\[4mS\x1b\[0met the breakpoint only in this shared library.  Can repeat this option multiple times\n"
+                r"\s+to specify multiple shared libraries.\n"
+                r"\s+to specify multiple shared libraries.\n"
             ],
         )
 
+        # If we do not account for the difference between the visible character's
+        # position and that character's real position into the string with the invisible
+        # ANSI codes, we will crash in various ways. Writing tests for each width
+        # would require duplicating the line splitting algorithm here. So instead,
+        # we will try to provoke crashes if any exist, checking that the start
+        # and end of the output is shown.
+        for width in range(70, 150):
+            self.runCmd(f"settings set term-width {width}")
+            self.expect(
+                "help breakpoint set", substrs=["\x1b[4mS\x1b[0met the", "libraries."]
+            )
+
     @no_debug_info_test
     def test_help_shows_optional_short_options(self):
         """Test that optional short options are printed and that they are in
diff --git a/lldb/unittests/Utility/AnsiTerminalTest.cpp b/lldb/unittests/Utility/AnsiTerminalTest.cpp
index 28fa32461ad5f..7c052dfd9055c 100644
--- a/lldb/unittests/Utility/AnsiTerminalTest.cpp
+++ b/lldb/unittests/Utility/AnsiTerminalTest.cpp
@@ -118,6 +118,122 @@ TEST(AnsiTerminal, TrimAndPad) {
   EXPECT_EQ("12❤️45", ansi::TrimAndPad("12❤️45❤️", 5));
 }
 
+TEST(AnsiTerminal, TrimAtWordBoundary) {
+  // Nothing in, nothing out.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("", 0), "");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("", 1), "");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("", 1), "");
+
+  // All whitespace, return nothing.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("    ", 1), "");
+
+  // Leading and trailing whitespace are removed.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("     ab     ", 0), "ab");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("     ab     ", 5), "ab");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("    🦊🦊     ", 0), "🦊🦊");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("    🦊🦊     ", 5), "🦊🦊");
+
+  // When it is a single word, we ignore the max columns and return the word.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("abc", 0), "abc");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("abc", 1), "abc");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("abc", 2), "abc");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("abc", 3), "abc");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("abc", 4), "abc");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("abcdefghij", 2), "abcdefghij");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("🦊🦊", 0), "🦊🦊");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("🦊🦊", 4), "🦊🦊");
+
+  // If it fits, return the entire word.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("abc", 5), "abc");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("🦊🦊", 5), "🦊🦊");
+
+  // ANSI codes do not add to width.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("\x1B[0m", 0), "\x1B[0m");
+  // Preceding ANSI codes are included.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("\x1B[0mab cd", 2), "\x1B[0mab");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("\x1B[0m🦊  🐱", 2), "\x1B[0m🦊");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("🦊\x1B[0m\x1B[0m🐱 🐈", 4),
+            "🦊\x1B[0m\x1B[0m🐱");
+  // Following ANSI codes are included.
+  EXPECT_EQ(ansi::TrimAtWordBoundary("\x1B[0mab\x1B[0m cd", 2),
+            "\x1B[0mab\x1B[0m");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("\x1B[0mab\x1B[0m", 4),
+            "\x1B[0mab\x1B[0m");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("\x1B[0m🦊\x1B[0m 🐱", 2),
+            "\x1B[0m🦊\x1B[0m");
+  EXPECT_EQ(ansi::TrimAtWordBoundary("\x1B[0m🦊\x1B[0m", 4),
+            "\x1B[0m🦊\x1B[0m");
+
+  // When multiple words fit, include as many as we can while still ending on
+  // a word boundary.
+  const char *fox_ascii = "The quick brown fox jumped.";
+  // Can't fit one word, just returns first word.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 0), "The");
+  // Exactly 3 is required for one word.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 3), "The");
+  // Exactly 9 is required to fit 2 words.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 9), "The quick");
+  // So anything less than 9 is just one word.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 8), "The");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 4), "The");
+  // 3 words is exactly 15.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 15), "The quick brown");
+  // Anything less is 2 words.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 14), "The quick");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, 10), "The quick");
+  // The whole string.
+  size_t fox_ascii_len = strlen(fox_ascii);
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, fox_ascii_len), fox_ascii);
+  // Anything less and we remove the last word.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_ascii, fox_ascii_len - 1),
+            "The quick brown fox");
+
+  // Width calculation is Unicode aware and a run of Unicode is a word just
+  // like a run of ASCII is.
+  // Note that these emoji avoid any compound emoji where there are
+  // non-printable modifiers. This is because llvm::sys::locale::columnWidth
+  // returns -1 for these non-printable adjustment characters. At this time,
+  // TrimAtWordBoundary simply cannot handle them well.
+  const char *fox_unicode = "🦊 💨🟤 🔼";
+  // Emoji have width 2, so this "word" would not fit so we just return it.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_unicode, 0), "🦊");
+  // It does fit width 2.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_unicode, 2), "🦊");
+  // Need 7 to fit 2 words.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_unicode, 7), "🦊 💨🟤");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_unicode, 6), "🦊");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_unicode, 4), "🦊");
+  // The entire string.
+  size_t fox_unicode_len = llvm::sys::locale::columnWidth(fox_unicode);
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_unicode, fox_unicode_len),
+            "🦊 💨🟤 🔼");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_unicode, fox_unicode_len - 1),
+            "🦊 💨🟤");
+
+  const char *fox_everything =
+      "The \x1B[0mquick\x1B[0m 💨\x1B[0m brown \x1B[0m🟤 fox🦊 🔼jumped.";
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, 0), "The");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, 3), "The");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, 6), "The");
+  // Exactly 9 to fit two words.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, 9),
+            "The \x1B[0mquick\x1B[0m");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, 10),
+            "The \x1B[0mquick\x1B[0m");
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, 11),
+            "The \x1B[0mquick\x1B[0m");
+  // <space><2 wide emoji> adds 3 more to get to 12.
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, 12),
+            "The \x1B[0mquick\x1B[0m 💨\x1B[0m");
+  // The entire string. We use the ansi:: width function here because it strips
+  // ANSI codes that llvm::sys::locale's function cannot cope with.
+  size_t fox_everything_len = ansi::ColumnWidth(fox_everything);
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, fox_everything_len),
+            fox_everything);
+  EXPECT_EQ(ansi::TrimAtWordBoundary(fox_everything, fox_everything_len - 1),
+            "The \x1B[0mquick\x1B[0m 💨\x1B[0m brown \x1B[0m🟤 fox🦊");
+}
+
 static void TestLines(const std::string &input, int indent,
                       uint32_t output_max_columns,
                       const llvm::StringRef &expected) {
@@ -128,55 +244,44 @@ static void TestLines(const std::string &input, int indent,
 }
 
 TEST(AnsiTerminal, OutputWordWrappedLines) {
-  TestLines("", 0, 0, "");
-  TestLines("", 0, 1, "");
-  TestLines("", 2, 1, "");
+  // Nothing in, nothing out. No newline, no indent.
+  TestLines("", 0, 5, "");
+  TestLines("", 5, 5, "");
 
-  // When it is a single word, we ignore the max columns and do not split it.
+  // A single line will have a newline on the end.
   TestLines("abc", 0, 1, "abc\n");
-  TestLines("abc", 0, 2, "abc\n");
-  TestLines("abc", 0, 3, "abc\n");
-  TestLines("abc", 0, 4, "abc\n");
-  TestLines("abc", 1, 5, " abc\n");
   TestLines("abc", 2, 5, "  abc\n");
+  TestLines("🦊🦊", 0, 0, "🦊🦊\n");
+  TestLines("🦊🦊", 0, 2, "🦊🦊\n");
+
+  // If the indent uses up all the columns, print the word on the same line
+  // anyway. This prevents us outputting indent only lines forever.
+  TestLines("abcdefghij", 4, 2, "    abcdefghij\n");
 
   // Leading whitespace is ignored because we're going to indent using the
   // stream.
+  TestLines("       ", 3, 10, "");
   TestLines("  abc", 0, 4, "abc\n");
   TestLines("        abc", 2, 6, "  abc\n");
 
+  // Multiple lines. Each one ends with a newline.
   TestLines("abc def", 0, 4, "abc\ndef\n");
   TestLines("abc def", 0, 5, "abc\ndef\n");
-  // Length is 6, 7 required. Has to split at whitespace.
-  TestLines("abc def", 0, 6, "abc\ndef\n");
-  // FIXME: This should split after abc, and not print
-  // more whitespace on the end of the line or the start
-  // of the new one. Resulting in "abc\ndef\n".
-  TestLines("abc           def", 0, 6, "abc  \ndef\n");
+  // Indent applied to each line.
+  TestLines("abc def", 2, 4, "  abc\n  def\n");
+  // First word is wider than a whole line, do not split that word.
+  TestLines("aabbcc ddee", 0, 5, "aabbcc\nddee\n");
 
   const char *fox_str = "The quick brown fox.";
   TestLines(fox_str, 0, 30, "The quick brown fox.\n");
   TestLines(fox_str, 5, 30, "     The quick brown fox.\n");
-  TestLines(fox_str, 0, 15, "The quick\nbrown fox.\n");
-  // FIXME: Trim the spaces off of the end of the first line.
-  TestLines("The quick       brown fox.", 0, 15,
-            "The quick     \nbrown fox.\n");
-
-  // As ANSI codes do not add to visible length, the results
-  // should be the same as the plain text verison.
-  const char *fox_str_ansi = "\x1B[4mT\x1B[0mhe quick brown fox.";
-  TestLines(fox_str_ansi, 0, 30, "\x1B[4mT\x1B[0mhe quick brown fox.\n");
-  TestLines(fox_str_ansi, 5, 30, "     \x1B[4mT\x1B[0mhe quick brown fox.\n");
-  // FIXME: Account for ANSI codes not contributing to visible length.
-  TestLines(fox_str_ansi, 0, 15, "\x1B[4mT\x1B[0mhe\nquick br\n");
-
-  const std::string fox_str_emoji = "🦊 The quick brown fox. 🦊";
-  TestLines(fox_str_emoji, 0, 30, "🦊 The quick brown fox. 🦊\n");
-  // FIXME: This crashes when max columns is exactly 31.
-  // TestLines(fox_str_emoji, 5, 31, "     🦊 The quick brown fox. 🦊\n");
-  TestLines(fox_str_emoji, 5, 32, "     🦊 The quick brown fox. 🦊\n");
-  // FIXME: Final fox is missing.
-  TestLines(fox_str_emoji, 0, 15, "🦊 The quick\nbrown fox. \n");
-  // FIXME: should not split the middle of an emoji.
-  TestLines("🦊🦊🦊 🦊🦊", 0, 5, "\n\n\n\n\n\n\n\x8A\xF0\x9F\xA6\n");
+  TestLines(fox_str, 2, 15, "  The quick\n  brown fox.\n");
+  // Must remove the spaces from the end of the first line.
+  TestLines("The quick       brown fox.", 0, 15, "The quick\nbrown fox.\n");
+
+  // FIXME: ANSI codes applied to > 1 word end up applying to all those words
+  // and the indent if those words are split up. We should use cursor
+  // positioning to do the indentation instead.
+  TestLines("\x1B[4mabc def\x1B[0m ghi", 2, 6,
+            "  \x1B[4mabc\n  def\x1B[0m\n  ghi\n");
 }

_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
