On Tue, Oct 15, 2019 at 9:58 PM Luis Machado <[email protected]> wrote:
>
> Hi,
>
> I'd like to get some feedback from the compiler's side before
> implementing a fix for this line numbering problem. I also want to make
> sure I fix it in the right tool.
>
> This is related to this bug report in GDB's bugzilla:
> https://sourceware.org/bugzilla/show_bug.cgi?id=21221
>
> It deals with the cases where we have loops with empty bodies, empty
> headers (for loops) or that simply were written in a single line. This
> causes GCC to not emit line transitions in one way or another. As a
> consequence, GDB won't see the line transition and will continuously
> attempt to step/next until it sees one.
>
> To end users it appears GDB is stuck in a particular loop, and most of
> them hit Ctrl-C to interrupt it. In reality GDB is making
> progress in the loop, but it will only stop once it goes out of the
> loop, where it will see a line transition.
>
> For the sake of reducing the scope of the problem, I'll assume the loops
> are written across multiple lines and that we're interested in -O0
> debugging. Higher optimization levels would probably reshape the loop or
> reduce it to a single instruction in some cases.
>
> Take for example the case of BZ #21221...
>
>    int main (void)
>    {
>      while (1)
>        {
>  5       for (unsigned int i = 0U; i < 0xFFFFFU; i++)
>  6         {
>  7           ;
>  8         }
>        }
>    }
>
> GCC generates the following code:
>
> 0x00000000000005fa <+0>: push %rbp
> 0x00000000000005fb <+1>: mov %rsp,%rbp
> 0x00000000000005fe <+4>: movl $0x0,-0x4(%rbp)
> 0x0000000000000605 <+11>: jmp 0x60b <main+17>
> 0x0000000000000607 <+13>: addl $0x1,-0x4(%rbp)
> 0x000000000000060b <+17>: cmpl $0xffffe,-0x4(%rbp)
> 0x0000000000000612 <+24>: jbe 0x607 <main+13>
> 0x0000000000000614 <+26>: jmp 0x5fe <main+4>
>
> And the line table looks like this:
>
> Line Number Statements:
> [0x00000047] Extended opcode 2: set Address to 0x5fa
> [0x00000052] Special opcode 6: advance Address by 0 to 0x5fa and
> Line by 1 to 2
> [0x00000053] Special opcode 64: advance Address by 4 to 0x5fe and
> Line by 3 to 5
> [0x00000054] Extended opcode 4: set Discriminator to 3
> [0x00000058] Set is_stmt to 0
> [0x00000059] Special opcode 131: advance Address by 9 to 0x607 and
> Line by 0 to 5
> [0x0000005a] Extended opcode 4: set Discriminator to 1
> [0x0000005e] Special opcode 61: advance Address by 4 to 0x60b and
> Line by 0 to 5
> [0x0000005f] Special opcode 131: advance Address by 9 to 0x614 and
> Line by 0 to 5
> [0x00000060] Advance PC by 2 to 0x616
> [0x00000062] Extended opcode 1: End of Sequence
>
> GCC doesn't generate any code or line number transitions for the empty
> loop body, therefore GDB keeps cycling inside this loop, in line 5.
>
> Clang, on the other hand, seems to be a bit smarter about this and will
> generate a dummy jump to help the debugger.
>
> Here's Clang's code:
>
> 0x00000000004004a0 <+0>: push %rbp
> 0x00000000004004a1 <+1>: mov %rsp,%rbp
> 0x00000000004004a4 <+4>: movl $0x0,-0x4(%rbp)
> 0x00000000004004ab <+11>: movl $0x0,-0x8(%rbp)
> 0x00000000004004b2 <+18>: cmpl $0xfffff,-0x8(%rbp)
> 0x00000000004004b9 <+25>: jae 0x4004d2 <main+50>
> X 0x00000000004004bf <+31>: jmpq 0x4004c4 <main+36>
> X 0x00000000004004c4 <+36>: mov -0x8(%rbp),%eax
> 0x00000000004004c7 <+39>: add $0x1,%eax
> 0x00000000004004ca <+42>: mov %eax,-0x8(%rbp)
> 0x00000000004004cd <+45>: jmpq 0x4004b2 <main+18>
> 0x00000000004004d2 <+50>: jmpq 0x4004ab <main+11>
>
> X marks the spot where a dummy jump was inserted to aid the debugger.
> The line table looks like this:
>
> Line Number Statements:
> [0x00000070] Extended opcode 2: set Address to 0x4004a0
> [0x0000007b] Special opcode 6: advance Address by 0 to 0x4004a0 and
> Line by 1 to 2
> [0x0000007c] Set column to 23
> [0x0000007e] Set prologue_end to true
> [0x0000007f] Special opcode 162: advance Address by 11 to 0x4004ab
> and Line by 3 to 5
> [0x00000080] Set column to 33
> [0x00000082] Set is_stmt to 0
> [0x00000083] Special opcode 103: advance Address by 7 to 0x4004b2
> and Line by 0 to 5
> [0x00000084] Set column to 5
> [0x00000086] Set is_stmt to 1
> [0x00000087] Special opcode 103: advance Address by 7 to 0x4004b9
> and Line by 0 to 5
> X [0x00000088] Special opcode 92: advance Address by 6 to 0x4004bf and
> Line by 3 to 8
> X [0x00000089] Set column to 46
> [0x0000008b] Special opcode 72: advance Address by 5 to 0x4004c4 and
> Line by -3 to 5
> [0x0000008c] Set column to 5
> [0x0000008e] Set is_stmt to 0
> [0x0000008f] Special opcode 131: advance Address by 9 to 0x4004cd
> and Line by 0 to 5
> [0x00000090] Set column to 3
> [0x00000092] Set is_stmt to 1
> [0x00000093] Special opcode 73: advance Address by 5 to 0x4004d2 and
> Line by -2 to 3
> [0x00000094] Advance PC by 5 to 0x4004d7
> [0x00000096] Extended opcode 1: End of Sequence
>
> Again, X marks the spot where we tell the debugger there is a line
> transition (from line 5 to line 8), and so step/next execution should end.
>
> I'm inclined to say we should fix this in GCC in a similar way. GDB
> relies on the line table information since it can't correctly tell when
> we have transitioned to a new source line by looking just at the
> instruction stream.
>
> My idea is to create a dummy jump (doing this at the GIMPLE level sounds
> more appropriate) with the source location of the last line of the loop
> body (in this case line number 8). That would trigger the creation of a
> new line table entry, making GDB happy.
>
> Is there a better way to force the compiler to output such a line table
> transition without having to resort to a dummy jump? Is there a safer
> way to add such transitions without worrying about the optimizer getting
> rid of them later on? Should we even worry about preserving such
> information for higher optimization levels?
>
> I'll also need a way to store the source location of the last line of
> the loop body, since closing braces and friends are ignored by GCC for
> code generation purposes. We just consume those tokens without second
> thought.
>
> There are other interesting variations, like the following:
>
>    int main(void)
>    {
>      int var = 0;
>
>      for (;;)
>        {
>  7       var++;
>  8     }
>
>      return 0;
>    }
>
> In the case above, the debugger gets stuck in line 7. With the proposed
> solution it would transition to line 8 and then return to line 7.
>
> Another case is this one:
>
>    int main (void)
>    {
>      while (1)
>        {
>  5       for (unsigned int i = 0U; i < 0xFFFFFU; i++)
>  6         ;
>        }
>    }
>
> Similarly, GDB gets stuck in line 5. With the proposed fix, it would
> transition to line 6 before returning to line 5.
>
> Feedback would be greatly appreciated.

I think adding an extra jump is undesirable. Instead, if you disregard
the single-source-line case, there's always the jump and the label we jump
to, which might/should get different source locations. Like in one of the
above cases:

main ()
{
  int D.1803;
  [t.c:2:1] {
    int var;
    [t.c:3:5] var = 0;
    <D.1801>:
    [t.c:7:8] var = var + 1;
    [t.c:7:8] goto <D.1801>;
    [t.c:10:8] D.1803 = 0;
    [t.c:10:8] return D.1803;

seen at GIMPLE. Of course we lose the label once we build the CFG,
but we retain a goto-locus which we could then put back on the
jump statement. For this case we currently get

.L2:
        .loc 1 7 0 discriminator 1
        addl    $1, -4(%rbp)
        jmp     .L2

and we could do

.L2:
        .loc 1 7 0 discriminator 1
        addl    $1, -4(%rbp)
        .loc 1 5 0
        jmp     .L2

thus assign the "destination" location to the jump instruction?
The first question is of course what happens with the edge's
goto_locus at the moment and why we get the code we get.
The above solution might also be a bit odd since for the loop
entry we'd first see line 7 and only after that line 5. But fixing
that would mean we have to output an extra instruction
(where I'd choose a nop instead of some random extra jump).
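
To illustrate the effect on stepping, here is a rough toy model in C. It
is only a sketch, not GDB's actual algorithm: it ignores is_stmt,
discriminators and step ranges, and simply keeps executing instructions
until the line changes. The names insn, current_table, proposed_table and
step_line are made up for this example, and the three-instruction program
(var = 0 on line 3, the addl on line 7, the jmp) is a simplification of
the code above.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction of a toy program: the source line its line-table row
   assigns to it, and the index of the instruction that executes next. */
struct insn {
    int line;
    size_t next;
};

/* var = 0 (line 3); .L2: addl (line 7); jmp .L2 (line 7), as emitted today. */
const struct insn current_table[] = { {3, 1}, {7, 2}, {7, 1} };

/* Same instructions, but the jmp carries the loop header's line (5). */
const struct insn proposed_table[] = { {3, 1}, {7, 2}, {5, 1} };

/* Simplified stand-in for a debugger "step": execute instructions until
   one maps to a different line than the starting one. Returns the new
   line, or -1 if no transition shows up within max_insns instructions
   (the "GDB looks stuck" case). */
int step_line(const struct insn *prog, size_t *pc, int max_insns)
{
    int start;

    assert(prog != NULL && pc != NULL);
    start = prog[*pc].line;
    for (int i = 0; i < max_insns; i++) {
        *pc = prog[*pc].next;
        if (prog[*pc].line != start)
            return prog[*pc].line;
    }
    return -1;
}
```

Starting at index 0, repeated calls on current_table return 7 and then -1
(no transition within the budget), while on proposed_table they return
7, 5, 7 and so on. That also shows the oddity mentioned above: on loop
entry line 7 is reported before line 5.
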
Richard.
> Thanks,
> Luis