> On Nov 16, 2025, at 11:32 PM, Jason Wang <[email protected]> wrote:
> 
> On Fri, Nov 14, 2025 at 10:53 PM Jon Kohler <[email protected]> wrote:
>> 
>> 
>> 
>>> On Nov 12, 2025, at 8:09 PM, Jason Wang <[email protected]> wrote:
>>> 
>>> On Thu, Nov 13, 2025 at 8:14 AM Jon Kohler <[email protected]> wrote:
>>>> 
>>>> vhost_get_user and vhost_put_user leverage __get_user and __put_user,
>>>> respectively, which were both added in 2016 by commit 6b1e6cc7855b
>>>> ("vhost: new device IOTLB API").
>>> 
>>> It has been used even before this commit.
>> 
>> Ah, thanks for the pointer. I’d have to go dig to find its genesis, but
>> it’s more to say that this existed prior to the LFENCE commit.
>> 
>>> 
>>>> In a heavy UDP transmit workload on a
>>>> vhost-net backed tap device, these functions showed up as ~11.6% of
>>>> samples in a flamegraph of the underlying vhost worker thread.
>>>> 
>>>> Quoting Linus from [1]:
>>>>   Anyway, every single __get_user() call I looked at looked like
>>>>   historical garbage. [...] End result: I get the feeling that we
>>>>   should just do a global search-and-replace of the __get_user/
>>>>   __put_user users, replace them with plain get_user/put_user instead,
>>>>   and then fix up any fallout (eg the coco code).
>>>> 
>>>> Switch to plain get_user/put_user in vhost, which results in a slight
>>>> throughput speedup. get_user now accounts for ~8.4% of samples in the flamegraph.
>>>> 
>>>> Basic iperf3 test on an Intel 5416S CPU with Ubuntu 25.10 guest:
>>>> TX: taskset -c 2 iperf3 -c <rx_ip> -t 60 -p 5200 -b 0 -u -i 5
>>>> RX: taskset -c 2 iperf3 -s -p 5200 -D
>>>> Before: 6.08 Gbits/sec
>>>> After:  6.32 Gbits/sec
>>> 
>>> I wonder if we need to test on archs like ARM.
>> 
>> Are you thinking from a performance perspective? Or a correctness one?
> 
> Performance, I think the patch is correct.
> 
> Thanks
> 

Ok gotcha. If anyone has an ARM system stuffed in their
front pocket and can give this a poke, I’d appreciate it, as
I don’t have ready access to one personally.

That said, I think this might end up in “well, it is what it is”
territory as Linus was alluding to, i.e. if performance dips on
ARM for vhost, then that’s a compelling point to optimize whatever
ends up being the culprit for get/put user?

Said another way, would ARM perf testing (or any other arch) be a
blocker to taking this change?

Thanks - Jon
