On 10/19/2015 05:28 AM, Dmitry Vyukov wrote:
> On Mon, Oct 19, 2015 at 11:22 AM, Andre Przywara <[email protected]> 
> wrote:
>> Hi Dmitry,
>>
>> On 19/10/15 10:05, Dmitry Vyukov wrote:
>>> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin <[email protected]> wrote:
>>>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>>>> Hello,
>>>>>
>>>>> I am trying to run a program in lkvm sandbox so that it communicates
>>>>> with a program on host. I run lkvm as:
>>>>>
>>>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>>>
>>>>> /my_prog then connects to a program on host over a tcp socket.
>>>>> I see that host receives some data, sends some data back, but then
>>>>> my_prog hangs on network read.
>>>>>
>>>>> To localize this I wrote 2 programs (attached). ping is run on host
>>>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>>>> connection, but after some iterations both hang on read.
>>>>>
>>>>> Networking code in Go runtime is there for more than 3 years, widely
>>>>> used in production and does not have any known bugs. However, it uses
>>>>> epoll edge-triggered readiness notifications that known to be tricky.
>>>>> Is it possible that lkvm contains some networking bug? Can it be
>>>>> related to the data races in lkvm I reported earlier today?
>>
>> Just to let you know:
>> I think we have seen networking issues in the past - root over NFS had
>> issues IIRC. Will spent some time on debugging this and it looked like a
>> race condition in kvmtool's virtio implementation. I think pinning
>> kvmtool's virtio threads to one host core made this go away. However
>> although he tried hard (even by Will's standards!) he couldn't find a
>> the real root cause or a fix at the time he looked at it and we found
>> other ways to work around the issues (using virtio-blk or initrd's).
>>
>> So it's quite possible that there are issues. I haven't had time yet to
>> look at your sanitizer reports, but it looks like a promising approach
>> to find the root cause.
> 
> 
> Thanks, Andre!
> 
> ping/pong does not hang within at least 5 minutes when I run lkvm
> under taskset 1.
> 
> And, yeah, this pretty strongly suggests a data race. ThreadSanitizer
> can point you to the bug within a minute, so you just need to say
> "aha! it is here". Or maybe not. There are no guarantees. But if you
> already spent significant time on this, then checking the reports
> definitely looks like a good idea.

Okay, that's good to know.

I have a few busy days, but I'll definitely try to clear up these reports
as they seem to be pointing to real issues.


Thanks,
Sasha

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to