On 01/11/22 08:00, Laszlo Ersek wrote:
> On 01/10/22 16:52, Richard W.M. Jones wrote:
>>
>> For the raw format local disk to local disk conversion, it's possible
>> to regain most of the performance by adding
>> --request-size=$(( 16 * 1024 * 1024 )) to the nbdcopy command.  The
>> patch below is not suitable for going upstream but it can be used for
>> testing:
>>
>> diff --git a/v2v/v2v.ml b/v2v/v2v.ml
>> index 47e6e937..ece3b7d9 100644
>> --- a/v2v/v2v.ml
>> +++ b/v2v/v2v.ml
>> @@ -613,6 +613,7 @@ and nbdcopy output_alloc input_uri output_uri =
>>    let cmd = ref [] in
>>    List.push_back_list cmd [ "nbdcopy"; input_uri; output_uri ];
>>    List.push_back cmd "--flush";
>> +  List.push_back cmd "--request-size=16777216";
>>    (*List.push_back cmd "--verbose";*)
>>    if not (quiet ()) then List.push_back cmd "--progress";
>>    if output_alloc = Types.Preallocated then List.push_back cmd "--allocated";
>>
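
(For what it's worth, going by the code above the copy command that
virt-v2v ends up running should then look roughly like "nbdcopy
INPUT_URI OUTPUT_URI --flush --request-size=16777216 --progress";
INPUT_URI and OUTPUT_URI are just placeholders here for whatever NBD
endpoints the input and output sides expose.)
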
>> The problem is of course that this is a pessimisation for other
>> conversions.  It's known to make at least qcow2-to-qcow2 and all VDDK
>> conversions worse.  So we'd have to make it conditional on doing a raw
>> format local conversion, which is a pretty ugly hack.  Even worse, the
>> exact size (16M) varies for me when I test this on different machines
>> and HDDs vs SSDs.  On my very fast AMD machine with an SSD, the
>> nbdcopy default request size (256K) is fastest and larger sizes are
>> very slightly slower.
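
Just to make the "conditional" idea concrete, I imagine something along
these lines in v2v.ml (a sketch only -- I'm inventing the
"output_format" and "output_is_local" bindings; the real condition would
have to come from wherever v2v.ml knows the output mode):

  (* Sketch: enlarge the request size only for raw, local-disk outputs;
     keep nbdcopy's default (256K) everywhere else.  The two bindings
     below are hypothetical. *)
  if output_format = "raw" && output_is_local then
    List.push_back cmd "--request-size=16777216";
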
>>
>> I can imagine an "adaptive nbdcopy" which adjusts these parameters
>> while copying in order to find the best performance.  A little bit
>> hard to implement ...
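
If someone did want to prototype the adaptive idea, I picture a simple
throughput-driven controller around the copy loop, something like this
(pure hand-waving, written as OCaml rather than nbdcopy's actual C, and
all of the names are made up):

  (* Sketch: after each batch of requests, grow the request size if
     measured throughput improved, shrink it if it got worse, and clamp
     the result to [256K, 32M]. *)
  let adapt ~request_size ~throughput ~prev_throughput =
    let size =
      if throughput > prev_throughput then request_size * 2
      else max (request_size / 2) (256 * 1024)
    in
    min size (32 * 1024 * 1024)
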
>>
>> I'm also still wondering exactly why a larger request size is better
>> in this case.  You can easily reproduce the effect using the attached
>> test script and adjusting --request-size.  You'll need to build the
>> standard test guest, see part 1.
> 
> (The following thought occurred to me last evening.)
> 
> In modular v2v, we use multi-threaded nbdkit instances and
> multi-threaded nbdcopy instances (IIUC). I think that should result in
> quite a bit of thrashing on both the source and destination disks, no?
> That should be especially visible on HDDs, but perhaps also on SSDs
> (depending on the request size, as you mention above).
> 
> The worst is likely when both nbdcopy processes operate on the same
> physical HDD (i.e., spinning rust).
> 
> qemu-img is single-threaded,

Hmmm, not necessarily; according to the manual, "qemu-img convert" uses
8 coroutines by default. There's also the -W flag ("out of order
writes"), though I don't know whether the original virt-v2v used it.

Laszlo

> so even if it reads from and writes to the
> same physical hard disk, it kind of generates two "parallel" request
> streams, which both the disk and the kernel's IO scheduler could cope
> with more easily. According to the nbdcopy manual, the default thread
> count is the "number of processor cores available", so the "sliding
> window of requests" with a high thread count is likely
> indistinguishable from real random access.
> 
> Also I (vaguely?) gather that nbdcopy bypasses the page cache (or does
> it only sync automatically at the end? I don't remember). If the page
> cache is avoided, then the page cache has no chance to mitigate the
> thrashing, especially on HDDs -- but even on SSDs, if the drive's
> internal cache is not large enough (considering the individual request
> size and the number of random requests flying in parallel), the
> degradation should be visible.
> 
> Can you tweak (i.e., lower) the thread count of both nbdcopy processes;
> let's say to "1", for starters?
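
A quick way to test that would be to push the flag in v2v.ml, the same
way as the --request-size hack above (again only for experimenting, not
for upstream):

  List.push_back cmd "--threads=1";

(nbdcopy also has --synchronous, which drops the asynchronous
multi-request behaviour altogether, if we want to go even further.)
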
> 
> Thanks!
> Laszlo
> 
