On 5/23/26 17:00, Demi Marie Obenour wrote:
> On 5/8/26 06:11, Stefano Garzarella wrote:
>> On Fri, May 08, 2026 at 05:58:06AM -0400, Michael S. Tsirkin wrote:
>>> On Fri, May 08, 2026 at 11:41:21AM +0200, Stefano Garzarella wrote:
>>>> On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote:
>>>>> On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote:
>>>>>> On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote:
>>>>>>> On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote:
>>>>
>>>> [...]
>>>>
>>>>>>>> For now, we're already doing something:
>>>>>>>> merging the skuffs if they don't have EOM set.
>>>>>>>
>>>>>>>
>>>>>>> Right that's good. You could go further and merge with EOM too
>>>>>>> if you stick the info about message boundaries somewhere else.
>>>>>>
>>>>>> This adds a lot of complexity IMO, but we can try.
>>>>>>
>>>>>> Do you have something in mind?
>>>>>
>>>>> BER is clearly overkill but here's a POC that claude made for me,
>>>>> just to give u an idea. It's clearly has a ton of issues,
>>>>> for example I dislike how GFP_ATOMIC is handled.
>>>>
>>>> Okay, I somewhat understand, but clearly this isn't net material
>>>
>>> I doubt we have many other options given reverting the regression was
>>> ruled out.
>>
>> As Eric pointed out, we can't revert it.
> 
> Could there be an option to disable the mitigation in guests, for the
> situation where the host is trusted?  There are VMMs that implement
> AF_VSOCK in userspace with a backing AF_UNIX socket.
> 
>>>> so for now
>>>> I think the best thing to do is to merge the fixup I sent (or something
>>>> similar):
>>>> https://lore.kernel.org/netdev/[email protected]/
>>>
>>> I reviewed that one, problem is it's a spec violation/change that we'll
>>> have to support forever.
>>
>> I have a few points to make on this, but let's discuss them there.
>>
>>>
>>>> This is a major change that should be merged with more caution.
>>>> Could this have too much of an impact on performance?
>>>>
>>>> Thanks,
>>>> Stefano
>>>
>>> It's really a POC, real patch is left as an excersise for the reader 
>>> :).
>>
>> eheh, I see, but honestly, this overcomplication scares me. I'll try to 
>> think it over.
>>
>>> The correct approach IMHO is to only start using this
>>> when we wasted a lot of memory on small packets.
>>>
>>> For example, if sum(truesize) >= buf size.
>>>
>>> then we'll not see a perf impact unless it's already pathological.
>>
>> Agree on this, which is similar to what I'm doing in that patch.  
>> Reducing the advertised buf_alloc only in pathological cases (e.g.  
>> overhead > buf_alloc).
> 
> This isn't enough to prevent data loss due to race conditions.
> 
> I'm CCing the virtio-comment list and a few others.
> 
> Right now, any application that needs to send massive amount of
> data over a vsock is simply broken.  This isn't just theoretical.
> It's causing real-world problems for users of Waypipe.
> Waypipe forwards Wayland protocol messages over AF_VSOCK,
> so it can send a large amount of traffic over the socket.
> See https://gitlab.freedesktop.org/mstoceckl/waypipe/work_items/165.
> 
> If one is willing to mutate the ring buffer in-place, or to maintain
> an auxiliary counter, it's possible to store all messages with bounded
> (in practice) overhead.  Specifically:
> 
> - If the first byte of a block of data is nonzero, it's a
>   variable-length length.  1 byte for messages less than 128 bytes.
> 
> - If the first byte of a block of data is zero, the subsequent bytes
>   are a variable-length counter that stores the number of consecutive
>   zero-byte messages.
> 
> That adds a lot of complexity, which is very unfortunate for something
> that needs to be backported to stable kernels.  I also suspect it
> requires all access to the ring buffer to take a lock rather than
> being lock-free.  But it's the only approach that I can think of that
> can work with the current spec.
> 
> Could there at least be a normative note stating that drivers and
> devices should treat each message as consuming 1024 bytes + the size
> of the message itself, and warning that anything that doesn't is
> going to be broken in practice?
> 
> I'm CCing Val Packet (of Invisible Things Lab) and Alyssa Ross
> (of Spectrum) because both of them are working on systems that rely
> critically on vsock.

Update: I see that patches have been upstreamed (with CC: stable) that
reset the connection instead of data loss.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

Attachment: OpenPGP_0xB288B55FFF9C22C1.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

Reply via email to