On Tue, Nov 17, 2020 at 07:04:59PM +0000, Kalra, Ashish wrote:
> Hello Konrad,
> 
> Actually I didn’t get that, do you mean you are taking 1G and <=4G cases out
> of the patch and only going to apply the >4G case as part of the patch?
That was the thought, but now I am wondering how TDX is going to work with
this. That is, the __weak override won't work on a distro kernel that has to
run on both AMD and Intel. Hmm. Let me brush off the late-SWIOTLB patches
that we developed internally some time ago.

> 
> Thanks,
> Ashish
> 
> > On Nov 17, 2020, at 11:38 AM, Kalra, Ashish <ashish.ka...@amd.com> wrote:
> > 
> > Hello Konrad,
> > 
> >> On Tue, Nov 17, 2020 at 12:00:03PM -0500, Konrad Rzeszutek Wilk wrote:
> >> .snip..
> >>>>>> Let's break this down:
> >>>>>> 
> >>>>>> How does the performance improve for one single device if you
> >>>>>> increase the SWIOTLB?
> >>>>>> Is there a specific device/driver that you can talk about that
> >>>>>> improves with this patch?
> >>>>>> 
> >>>>> 
> >>>>> Yes, these are mainly for multi-queue devices such as NICs or even
> >>>>> multi-queue virtio.
> >>>>> 
> >>>>> This basically improves performance with concurrent DMA, hence,
> >>>>> basically multi-queue devices.
> >>>> 
> >>>> OK, and for a _1GB_ guest - what number of CPUs do the "internal
> >>>> teams/external customers" use? Please let's use real use-cases.
> >>> 
> >>>>> I am sure you will understand we cannot share any external customer
> >>>>> data, as all that customer information is proprietary.
> >>>>> 
> >>>>> In a similar situation, if you had to share Oracle data, you would
> >>>>> surely have the same concerns and I don't think you would be able
> >>>>> to share any such information externally, i.e., outside Oracle.
> >>>>> 
> >>>> I am asking a simple question - how many CPUs does a 1GB
> >>>> guest have? The reason for this should be fairly obvious - if
> >>>> it is 1 vCPU, then there is no multi-queue and the existing
> >>>> SWIOTLB pool size is OK as it is.
> >>>> 
> >>>> If however there are, say, 2 and multiqueue is enabled, that
> >>>> gives me an idea of how many you use and I can find out what
> >>>> the maximum pool size usage of virtio is with that configuration.
> >>> 
> >>> Again, we cannot share any customer data.
> >>> 
> >>> Also, I don't think there can be a definitive answer to how many vCPUs
> >>> a 1GB guest will have; it will depend on what kind of configuration we
> >>> are testing.
> >>> 
> >>> For example, I usually set up 4-16 vCPUs for as low as 512M of
> >>> configured guest memory.
> >> 
> >> Sure, but you are not the normal user.
> >> 
> >> That is, I don't like that for 1GB guests your patch ends up doubling
> >> the SWIOTLB memory pool. That seems to me like we are trying to solve a
> >> problem that normal users will not hit. That is why I want 'here is the
> >> customer bug'.
> >> 
> >> Here is what I am going to do - I will take the 1GB and 4GB cases out of
> >> your patch and call it a day. If there are customers who start reporting
> >> issues we can revisit that. Nothing wrong with 'Reported-by' XYZ (we
> >> often ask the customer if he or she would like to be recognized on
> >> upstream bugs).
> >> 
> > 
> > Ok.
> > 
> >> And in the meantime I am going to look at adding ..
> >>> 
> >>> I have also been testing with a 16 vCPU configuration for 512M-1G guest
> >>> memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> >>> device environment.
> >> 
> >> .. late SWIOTLB expansion to stitch the DMA pools together as both
> >> Mellanox and VirtIO are not 32-bit DMA bound.
> >> 
> >>> 
> >>> So we might have less configured guest memory, but we still might
> >>> be using that configuration with I/O-intensive workloads.
> >>> 
> > 
> > I am going to submit v4 of my current patch-set, which uses max() instead
> > of clamp() and also replaces the constants defined in this patch with the
> > pre-defined ones in sizes.h.
> > 
> > Thanks,
> > Ashish
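
One way to read the __weak concern above: a __weak symbol is resolved at
build/link time, so a single kernel image ends up with exactly one
implementation, while a distro image has to boot unmodified as an SEV guest,
a TDX guest, or a plain guest and can only tell those apart at runtime. The
sketch below is illustrative only and is not the patch under discussion; the
function names are hypothetical, and mem_encrypt_active() stands in for the
kind of runtime check available in kernels of this vintage.

#include <linux/init.h>
#include <linux/compiler.h>
#include <linux/mem_encrypt.h>

/*
 * Link-time override: whichever strong definition gets built in wins, once,
 * for the whole image. It cannot differ between an AMD SEV guest and an
 * Intel TDX guest booting the same distro kernel.
 */
unsigned long __weak arch_swiotlb_bounce_size(void)
{
	return 64UL << 20;	/* historical 64MB SWIOTLB default */
}

/*
 * Runtime decision: the same image checks at boot whether it is running as
 * a memory-encrypted guest and only then grows the bounce buffer.
 * grow_swiotlb_pool() is a hypothetical hook standing in for whatever the
 * late-SWIOTLB rework ends up providing.
 */
void __init maybe_grow_swiotlb(void)
{
	if (!mem_encrypt_active())
		return;		/* unencrypted guest or bare metal */

	/* grow_swiotlb_pool(size); -- sized from guest memory, see the
	 * sizing sketch below. */
}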
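
On the max() vs clamp() point in the v4 note above: clamp() pins the computed
size between a floor and a ceiling, while max() only enforces the floor, so
the pool keeps scaling with guest memory. Below is a minimal, hypothetical
sketch using the pre-defined SZ_* constants from include/linux/sizes.h; the
helper names, the 6% scaling factor, and the 64M/1G bounds are assumptions
for illustration, not values taken from this thread.

#include <linux/kernel.h>	/* clamp_t(), max_t() */
#include <linux/sizes.h>	/* SZ_64M, SZ_1G */

/* clamp_t() bounds the size on both ends: never below 64M, never above 1G. */
static unsigned long swiotlb_size_clamped(unsigned long guest_mem_bytes)
{
	return clamp_t(unsigned long, guest_mem_bytes * 6 / 100, SZ_64M, SZ_1G);
}

/* max_t() only enforces the 64M floor; larger guests keep getting a
 * proportionally larger bounce buffer. */
static unsigned long swiotlb_size_floored(unsigned long guest_mem_bytes)
{
	return max_t(unsigned long, guest_mem_bytes * 6 / 100, SZ_64M);
}

For a 1GB guest both forms land on the 64M floor (6% of 1G is roughly 61M);
the two only diverge for guests large enough that 6% of memory exceeds 1G.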