On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> >> No, they are all dispatched and issue to the BRU for execution. It's
> >> trivial to construct a test of a lot of not taken branches in a row
> >> and time a loop of it to see it executes at 1 cycle per branch.
> >
> > (s/dispatched/issued/)
>
> ?

Dispatch is from decode to the issue queues.  Issue is from there to
execution units.  Dispatch is in-order, issue is not.

> >> How could it validate prediction without issuing? It wouldn't know when
> >> sources are ready.
> >
> > In the backend.  But that is just how it worked on older cores :-/
>
> Okay. I don't know about older cores than POWER9. Backend would normally
> include execution though.
> Only other place you could do it if you don't
> issue/exec would be after it goes back in order, like completion.

You do not have to do the verification in-order: the insn cannot finish
until it is no longer speculative, that takes care of all ordering
needed.

> But that would be horrible for mispredict penalty.

See the previous point.  Also, any insn known to be mispredicted can be
flushed immediately anyway.

> >> >> The first problem seems like the show stopper though. AFAIKS it would
> >> >> need a special builtin support that does something to create the table
> >> >> entry, or a guarantee that we could put an inline asm right after the
> >> >> builtin as a recognized pattern and that would give us the instruction
> >> >> following the trap.
> >> >
> >> > I'm not quite sure what this means.  Can't you always just put a
> >> >
> >> >   bla:	asm("");
> >> >
> >> > in there, and use the address of "bla"?
> >>
> >> Not AFAIKS. Put it where?
> >
> > After wherever you want to know the address after.  You will have to
> > make sure they stay together somehow.
>
> I still don't follow.

  some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
  empty_asm_that_we_can_take_the_address_of_known_as_B;

You have to make sure the compiler keeps A and B together, does not
insert anything between them, does put them in the assembler output in
the same fragment, etc.

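As a rough sketch of that A-then-B shape in GNU C (the function name, its
parameters and the volatile counter standing in for A are all made up for
illustration; "&&bla" is the GNU labels-as-values extension, so this
assumes GCC or Clang):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from this thread.
 * A is the store to *counter; B is the empty asm that carries the
 * label "bla".  The label's address is handed back through addr_out.
 */
void record_address_after(volatile int *counter, void **addr_out)
{
	assert(counter != NULL && addr_out != NULL);

	*counter += 1;			/* A: what we want the address after */
bla:
	__asm__ volatile ("");		/* B: empty asm we take the address of */

	/*
	 * Assumption made loud: nothing in this source forces the
	 * compiler to keep A and B adjacent in the generated code.
	 */
	*addr_out = &&bla;
}
```

This shows the source-level pattern only.  The compiler is still free to
schedule or insert code between A and B, so the address stored in
addr_out is not guaranteed to be that of the insn right after A.
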
> If you could give a built in that put a label at the address of the trap
> instruction that could be used later by inline asm then that could work
> too:
>
>     __builtin_labeled_trap("1:");
>     asm ("    .section __bug_table,\"aw\"  \n\t"
>          "2:  .4byte 1b - 2b               \n\t"
>          "    .previous");

How could a compiler do anything like that?!


Segher