https://github.com/DavidSpickett created 
https://github.com/llvm/llvm-project/pull/150011

Fixes https://github.com/llvm/llvm-project/issues/135707

Follow up to https://github.com/llvm/llvm-project/pull/148836
which fixed some of this issue but not all of it. 

Our Value/ValueObject system does not store the endian directly
in the values. Instead it assumes that the endian of the result
of a cast can be assumed to be the target's endian, or the host
but only as a fallback. It assumes the place it is copying from
is also that endian.

This breaks down when you have register values. These are always
host endian and continue to be when cast. Casting them to big 
endian when on a little endian host breaks certain calls like
GetValueAsUnsigned.

To fix this, check the context of the value. If it has a register
context, always treat it as host endian and make the result host
endian.

I had an alternative where I passed an "is_register" flag into
all calls to this, but it felt like a layering violation and changed
many more lines.

This solution isn't much more robust, but it works for all the test
cases I know of. Perhaps you can create a register value without
a RegisterInfo backing it, but I don't know of a way myself.

For testing, I had to add a minimal program file for each arch
so that there is a type system to support the casting. This is
generated from YAML since we only need the machine and endian
to be set.

>From 8bd469e89162bb08ec1b643227564906a548cc47 Mon Sep 17 00:00:00 2001
From: David Spickett <david.spick...@linaro.org>
Date: Mon, 14 Jul 2025 14:24:14 +0000
Subject: [PATCH 1/2] [lldb] Convert registers values into target endian for
 expressions

Fixes https://github.com/llvm/llvm-project/issues/135707

Where it was reported that reading the PC using "register read" had
different results to an expression "$pc".

This was happening because registers are treated in lldb as pure
"values" that don't really have an endian. We have to store them
somewhere on the host of course, so the endian becomes host endian.

When you want to use a register as a value in an expression you're
pretending that it's a variable in memory. In target memory.
Therefore we must convert the register value to that endian
before use.

The test I have added is based on the one used for XML register
flags. Where I fake an AArch64 little endian and an s390x big
endian target. I set up the data in such a way the pc value
should print the same for both, either with register read or
an expression.

I considered just adding a live process test that checks the
two are the same but with on one doing cross endian testing,
I doubt it would have ever caught this bug.

Simulating this means most of the time, little endian hosts
will test little to little and little to big. In the minority
of cases with a big endian host, they'll check the reverse.
Covering all the combinations.
---
 lldb/source/Expression/Materializer.cpp       | 25 +++---
 .../TestRegisterExpressionEndian.py           | 86 +++++++++++++++++++
 2 files changed, 97 insertions(+), 14 deletions(-)
 create mode 100644 
lldb/test/API/commands/expression/TestRegisterExpressionEndian.py

diff --git a/lldb/source/Expression/Materializer.cpp 
b/lldb/source/Expression/Materializer.cpp
index 17ea1596806d0..8fc3df1360824 100644
--- a/lldb/source/Expression/Materializer.cpp
+++ b/lldb/source/Expression/Materializer.cpp
@@ -1376,29 +1376,26 @@ class EntityRegister : public Materializer::Entity {
       return;
     }
 
-    DataExtractor register_data;
-
-    if (!reg_value.GetData(register_data)) {
-      err = Status::FromErrorStringWithFormat(
-          "couldn't get the data for register %s", m_register_info.name);
-      return;
-    }
-
-    if (register_data.GetByteSize() != m_register_info.byte_size) {
+    if (reg_value.GetByteSize() != m_register_info.byte_size) {
       err = Status::FromErrorStringWithFormat(
           "data for register %s had size %llu but we expected %llu",
-          m_register_info.name, (unsigned long 
long)register_data.GetByteSize(),
+          m_register_info.name, (unsigned long long)reg_value.GetByteSize(),
           (unsigned long long)m_register_info.byte_size);
       return;
     }
 
-    m_register_contents = std::make_shared<DataBufferHeap>(
-        register_data.GetDataStart(), register_data.GetByteSize());
+    lldb_private::DataBufferHeap buf(reg_value.GetByteSize(), 0);
+    reg_value.GetAsMemoryData(m_register_info, buf.GetBytes(),
+                              buf.GetByteSize(), map.GetByteOrder(), err);
+    if (!err.Success())
+      return;
+
+    m_register_contents = std::make_shared<DataBufferHeap>(buf);
 
     Status write_error;
 
-    map.WriteMemory(load_addr, register_data.GetDataStart(),
-                    register_data.GetByteSize(), write_error);
+    map.WriteMemory(load_addr, buf.GetBytes(), reg_value.GetByteSize(),
+                    write_error);
 
     if (!write_error.Success()) {
       err = Status::FromErrorStringWithFormat(
diff --git a/lldb/test/API/commands/expression/TestRegisterExpressionEndian.py 
b/lldb/test/API/commands/expression/TestRegisterExpressionEndian.py
new file mode 100644
index 0000000000000..66e38df3a9696
--- /dev/null
+++ b/lldb/test/API/commands/expression/TestRegisterExpressionEndian.py
@@ -0,0 +1,86 @@
+""" Check that registers written to memory for expression evaluation are
+    written using the target's endian not the host's.
+"""
+
+from enum import Enum
+from textwrap import dedent
+import lldb
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test.decorators import *
+from lldbsuite.test.gdbclientutils import *
+from lldbsuite.test.lldbgdbclient import GDBRemoteTestBase
+
+
+class Endian(Enum):
+    BIG = 0
+    LITTLE = 1
+
+
+class Responder(MockGDBServerResponder):
+    def __init__(self, doc, endian):
+        super().__init__()
+        self.target_xml = doc
+        self.endian = endian
+
+    def qXferRead(self, obj, annex, offset, length):
+        if annex == "target.xml":
+            return self.target_xml, False
+        return (None,)
+
+    def readRegister(self, regnum):
+        return "E01"
+
+    def readRegisters(self):
+        # 64 bit pc value.
+        data = ["00", "00", "00", "00", "00", "00", "12", "34"]
+        if self.endian == Endian.LITTLE:
+            data.reverse()
+        return "".join(data)
+
+
+class TestXMLRegisterFlags(GDBRemoteTestBase):
+    def do_endian_test(self, endian):
+        architecture, pc_reg_name = {
+            Endian.BIG: ("s390x", "pswa"),
+            Endian.LITTLE: ("aarch64", "pc"),
+        }[endian]
+
+        self.server.responder = Responder(
+            dedent(
+                f"""\
+            <?xml version="1.0"?>
+              <target version="1.0">
+                <architecture>{architecture}</architecture>
+                <feature>
+                  <reg name="{pc_reg_name}" bitsize="64"/>
+                </feature>
+            </target>"""
+            ),
+            endian,
+        )
+        target = self.dbg.CreateTarget("")
+        process = self.connect(target)
+        lldbutil.expect_state_changes(
+            self, self.dbg.GetListener(), process, [lldb.eStateStopped]
+        )
+
+        # If expressions convert register values into target endian, the
+        # result of register read and expr should be the same.
+        pc_value = "0x0000000000001234"
+        self.expect(
+            "register read pc",
+            substrs=[pc_value],
+        )
+        self.expect("expr --format hex -- $pc", substrs=[pc_value])
+
+    @skipIfXmlSupportMissing
+    @skipIfRemote
+    def test_little_endian_target(self):
+        self.do_endian_test(Endian.LITTLE)
+
+    @skipIfXmlSupportMissing
+    @skipIfRemote
+    # Unlike AArch64, we do need the backend present for this test to work.
+    @skipIfLLVMTargetMissing("SystemZ")
+    def test_big_endian_target(self):
+        self.do_endian_test(Endian.BIG)

>From 680949f753d12e0efc0a4bfb98cb4aebf4486b57 Mon Sep 17 00:00:00 2001
From: David Spickett <david.spick...@linaro.org>
Date: Thu, 17 Jul 2025 13:40:11 +0000
Subject: [PATCH 2/2] [lldb] account for registers being host endian when
 casting values

Fixes https://github.com/llvm/llvm-project/issues/135707

Follow up to https://github.com/llvm/llvm-project/pull/148836
which fixed some of this issue but not all of it.

Our Value/ValueObject system does not store the endian directly
in the values. Instead it assumes that the endian of the result
of a cast can be assumed to be the target's endian, or the host
but only as a fallback. It assumes the place it is copying from
is also that endian.

This breaks down when you have register values. These are always
host endian and continue to be when cast. Casting them to big
endian when on a little endian host breaks certain calls like
GetValueAsUnsigned (oddly, others did still work, but I couldn't
figure out why).

To fix this, check the context of the value. If it has a register
context, always treat it as host endian and make the result host
endian.

I had an alternative where I passed an "is_register" flag into
all calls to this, but it felt like a layering violation and changed
many more lines.

This solution isn't much more robust, but it works for all the test
cases I know of. Perhaps you can create a register value without
a RegisterInfo backing it, but I don't know of a way myself.

For testing, I had to add a minimal program file for each arch
so that there is a type system to support the casting. This is
generated from YAML since we only need the machine and endian
to be set.
---
 lldb/source/Core/Value.cpp                    |  5 +-
 .../TestRegisterExpressionEndian.py           | 54 ++++++++++++++++---
 2 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/lldb/source/Core/Value.cpp b/lldb/source/Core/Value.cpp
index c91b3f852f986..ff37c87f5dd6a 100644
--- a/lldb/source/Core/Value.cpp
+++ b/lldb/source/Core/Value.cpp
@@ -496,7 +496,10 @@ Status Value::GetValueAsData(ExecutionContext *exe_ctx, 
DataExtractor &data,
     if (exe_ctx) {
       Target *target = exe_ctx->GetTargetPtr();
       if (target) {
-        data.SetByteOrder(target->GetArchitecture().GetByteOrder());
+        // Registers are always stored in host endian.
+        data.SetByteOrder(m_context_type == ContextType::RegisterInfo
+                              ? endian::InlHostByteOrder()
+                              : target->GetArchitecture().GetByteOrder());
         
data.SetAddressByteSize(target->GetArchitecture().GetAddressByteSize());
         break;
       }
diff --git a/lldb/test/API/commands/expression/TestRegisterExpressionEndian.py 
b/lldb/test/API/commands/expression/TestRegisterExpressionEndian.py
index 66e38df3a9696..d6de8731385b6 100644
--- a/lldb/test/API/commands/expression/TestRegisterExpressionEndian.py
+++ b/lldb/test/API/commands/expression/TestRegisterExpressionEndian.py
@@ -40,9 +40,15 @@ def readRegisters(self):
 
 class TestXMLRegisterFlags(GDBRemoteTestBase):
     def do_endian_test(self, endian):
-        architecture, pc_reg_name = {
-            Endian.BIG: ("s390x", "pswa"),
-            Endian.LITTLE: ("aarch64", "pc"),
+        architecture, pc_reg_name, yaml_file, data, machine = {
+            Endian.BIG: ("s390x", "pswa", "s390x.yaml", "ELFDATA2MSB", 
"EM_S390"),
+            Endian.LITTLE: (
+                "aarch64",
+                "pc",
+                "aarch64.yaml",
+                "ELFDATA2LSB",
+                "EM_AARCH64",
+            ),
         }[endian]
 
         self.server.responder = Responder(
@@ -58,14 +64,35 @@ def do_endian_test(self, endian):
             ),
             endian,
         )
-        target = self.dbg.CreateTarget("")
+
+        # We need to have a program file, so that we have a full type system,
+        # so that we can do the casts later.
+        obj_path = self.getBuildArtifact("main.o")
+        yaml_path = self.getBuildArtifact(yaml_file)
+        with open(yaml_path, "w") as f:
+            f.write(
+                dedent(
+                    f"""\
+                --- !ELF
+                FileHeader:
+                  Class:    ELFCLASS64
+                  Data:     {data}
+                  Type:     ET_REL
+                  Machine:  {machine}
+                ...
+                """
+                )
+            )
+        self.yaml2obj(yaml_path, obj_path)
+        target = self.dbg.CreateTarget(obj_path)
+
         process = self.connect(target)
         lldbutil.expect_state_changes(
             self, self.dbg.GetListener(), process, [lldb.eStateStopped]
         )
 
         # If expressions convert register values into target endian, the
-        # result of register read and expr should be the same.
+        # result of register read, expr and casts should be the same.
         pc_value = "0x0000000000001234"
         self.expect(
             "register read pc",
@@ -73,14 +100,29 @@ def do_endian_test(self, endian):
         )
         self.expect("expr --format hex -- $pc", substrs=[pc_value])
 
+        pc = (
+            process.thread[0]
+            .frame[0]
+            .GetRegisters()
+            .GetValueAtIndex(0)
+            .GetChildMemberWithName("pc")
+        )
+        ull = target.FindTypes("unsigned long long").GetTypeAtIndex(0)
+        pc_ull = pc.Cast(ull)
+
+        self.assertEqual(pc.GetValue(), pc_ull.GetValue())
+        self.assertEqual(pc.GetValueAsAddress(), pc_ull.GetValueAsAddress())
+        self.assertEqual(pc.GetValueAsSigned(), pc_ull.GetValueAsSigned())
+        self.assertEqual(pc.GetValueAsUnsigned(), pc_ull.GetValueAsUnsigned())
+
     @skipIfXmlSupportMissing
     @skipIfRemote
+    @skipIfLLVMTargetMissing("AArch64")
     def test_little_endian_target(self):
         self.do_endian_test(Endian.LITTLE)
 
     @skipIfXmlSupportMissing
     @skipIfRemote
-    # Unlike AArch64, we do need the backend present for this test to work.
     @skipIfLLVMTargetMissing("SystemZ")
     def test_big_endian_target(self):
         self.do_endian_test(Endian.BIG)

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to