Jørgen Hansen <jorgen.han...@wdc.com> writes:

> Hi,
>
> While doing some testing using numactl-based interleaving of application
> memory across regular memory and CXL-based memory using QEMU with tcg, I ran
> into an issue similar to what we saw a while back - link to old issue:
> https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=epz3_+cheat8crsk9mou894wbnw_fywamk...@mail.gmail.com/#t.
>
> When running:
>
> numactl --interleave 0,1 ./cachebench …
>
> I hit the following:
>
> numactl --interleave 0,1 ./cachebench --json_test_config ../test_configs/hit_ratio/graph_cache_follower_assocs/config.json
> qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7fffc3926dd4

OK, so this will be because we (by design) don't cache TBs for MMIO
regions. But as you say:
>
> Looking at the tb being executed, it looks like it is a single instruction tb,
> so with my _very_ limited understanding of tcg, it shouldn’t be necessary to
> do the IO recompile:
>
> (gdb) up 13
>
> #13 0x0000555555ca13ac in cpu_loop_exec_tb (tb_exit=0x7ffff49ff6d8,
>     last_tb=<synthetic pointer>, pc=<optimized out>,
>     tb=0x7fffc3926cc0 <code_gen_buffer+464678035>, cpu=0x5555578c19c0)
>     at ../accel/tcg/cpu-exec.c:904
> 904         tb = cpu_tb_exec(cpu, tb, tb_exit);
> (gdb) print *tb
> $1 = {pc = 0, cs_base = 0, flags = 415285939, cflags = 4278321152, size = 7, 
> icount = 1, tc = {ptr = 0x7fffc3926d80 <code_gen_buffer+464678227>, size = 
> 176}, page_next = {0, 0}, page_addr = {18446744073709551615,
>     18446744073709551615}, jmp_lock = {value = 0}, jmp_reset_offset =
> {65535, 65535}, jmp_insn_offset = {65535, 65535}, jmp_target_addr =
> {0, 0}, jmp_list_head = 140736474540928, jmp_list_next = {0, 0},
> jmp_dest = {0, 0}}

With an icount of 1 we can, by definition, do I/O. This is done in
translator_loop:

        /*
         * Disassemble one instruction.  The translate_insn hook should
         * update db->pc_next and db->is_jmp to indicate what should be
         * done next -- either exiting this loop or locate the start of
         * the next instruction.
         */
        if (db->num_insns == db->max_insns) {
            /* Accept I/O on the last instruction.  */
            set_can_do_io(db, true);
        }

and we see later on:

    tb->icount = db->num_insns;

So I guess we must have gone into the translator loop expecting to
translate more? (Side note: having an int8_t type for the saved bool
value seems weird.)
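
To make that hypothesis concrete, here is a toy model of the situation. It
is not QEMU code: the toy_* names and the stop_after knob are made up for
illustration, and the loop is heavily simplified. It only shows how a block
can end up with icount == 1 without ever passing through the "last
instruction" check, assuming the loop is left early while max_insns > 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the relevant DisasContextBase fields. */
struct toy_db {
    int num_insns;
    int max_insns;
    bool can_do_io;
};

/* Hypothetical stand-in for the relevant TranslationBlock field. */
struct toy_tb {
    int icount;
};

/*
 * Toy translator loop. stop_after is a made-up knob: the loop exits once
 * that many instructions have been translated, modelling an early exit
 * taken before max_insns is reached.
 */
static void toy_translate(struct toy_db *db, struct toy_tb *tb, int stop_after)
{
    assert(db->max_insns >= 1 && stop_after >= 1);

    db->num_insns = 0;
    db->can_do_io = false;

    for (;;) {
        db->num_insns++;

        if (db->num_insns == db->max_insns) {
            /* Same condition as above: I/O is allowed on the last insn. */
            db->can_do_io = true;
        }

        /* ... the instruction itself would be translated here ... */

        if (db->num_insns == db->max_insns || db->num_insns >= stop_after) {
            break;
        }
    }

    tb->icount = db->num_insns;
}
```

In this model, toy_translate() with max_insns = 1 yields icount == 1 with
can_do_io set, while max_insns = 512 and stop_after = 1 yields icount == 1
with can_do_io still false, which is the state the check below looks for.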

Could you confirm by doing something like:

  if (tb->icount == 1 && db->max_insns > 1) {
      /* Only one insn was translated although more were planned. */
      fprintf(stderr, "ended up doing one insn (%d, %d)\n",
              db->max_insns, db->saved_can_do_io);
  }

after we set tb->icount?

> If the application is run entirely out of MMIO memory, things work fine (the
> previous patches related to this are in Jonathan's branch), so one thought is
> that it is related to having the code on a mix of regular and CXL memory. Since
> we previously had issues with code crossing page boundaries where only the
> second page is MMIO, I tried out the following change to the fix introduced for
> that issue, thinking that reverting to the slow path in the middle of the
> translation might not correctly update can_do_io:
>
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 38c34009a5..db6ea360e0 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -258,6 +258,7 @@ static void *translator_access(CPUArchState *env, DisasContextBase *db,
>              if (unlikely(new_page1 == -1)) {
>                  tb_unlock_pages(tb);
>                  tb_set_page_addr0(tb, -1);
> +                set_can_do_io(db, true);
>                  return NULL;
>              }
>
> With that change, things work fine. Not saying that this is a valid fix for
> the issue, but just trying to provide as much information as possible :)

I can see why this works; my worry is whether it addresses all the early
exit cases here.

>
> Thanks,
> Jorgen

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
