jprotze wrote: > The nested `target_data_op` can actually cause issues, as demonstrated with > our dumping tool: > > ``` > [0][callback_target_data_op_emi] endpoint = begin | target_task_data->value = > 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = > 888000003 (0x76723fc13798) | optype = memset | src_addr = (nil) | > src_device_num = -1 | dest_addr = 0x767234200000 | dest_device_num = 0 | > bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a > [0][callback_target_data_op_emi] endpoint = begin | target_task_data->value = > 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = > 888000004 (0x76723fc13798) | optype = transfer_to_device | src_addr = > 0x5fc5e51cb5f0 | src_device_num = -1 | dest_addr = 0x767234200000 | > dest_device_num = 0 | bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a > [0][callback_target_data_op_emi] endpoint = end | target_task_data->value = > 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = > 888000004 (0x76723fc13798) | optype = transfer_to_device | src_addr = > 0x5fc5e51cb5f0 | src_device_num = -1 | dest_addr = 0x767234200000 | > dest_device_num = 0 | bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a > [0][callback_target_data_op_emi] endpoint = end | target_task_data->value = > 777777777 ((nil)) | target_data->value = 0 (0x76723fc137a0) | *host_op_id = > 888000004 (0x76723fc13798) | optype = memset | src_addr = (nil) | > src_device_num = -1 | dest_addr = 0x767234200000 | dest_device_num = 0 | > bytes = 400 | codeptr_ra = 0x5fc5ca1ca25a > ``` > > The `host_op_id` when exiting `memset` corresponds to the value set by > `transfer_to_device` (`888[...]4`). This is incorrect and should be > `888[...]3` instead. By quickly glancing at the code, I'm not sure if the > runtime is able to handle such nested events at all currently, almost > certainly not for `HostOpId`.
Depending on the tool workflow, this wrong information might lead to memory leaks and double-frees. To handle this nested sequence of data-op callbacks, we would need a stack for the `HostOpId`. If we know, that it will never be more than two levels, we could cheat and make it a static-sized thread-local stack. https://github.com/llvm/llvm-project/pull/194168 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
