On Mon, Feb 05, 2018 at 10:03:35AM +0000, Burakov, Anatoly wrote:
> On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
> > On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
> > > On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
> > > > On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
> > > > > SPDK will need some way to register for a notification when
> > > > > pages are allocated or freed. For storage, the number of
> > > > > requests per second is (relative to networking) fairly small
> > > > > (hundreds of thousands per second in a traditional block
> > > > > storage stack, or a few million per second with SPDK). Given
> > > > > that, we can afford to do a dynamic lookup from va to pa/iova
> > > > > on each request in order to greatly simplify our APIs (users
> > > > > can just pass pointers around instead of mbufs). DPDK has a
> > > > > way to look up the pa from a given va, but it does so by
> > > > > scanning /proc/self/pagemap and is very slow. SPDK instead
> > > > > handles this by implementing a lookup table of va to pa/iova
> > > > > which we populate by scanning through the DPDK memory segments
> > > > > at start up, so the lookup in our table is sufficiently fast
> > > > > for storage use cases. If the list of memory segments changes,
> > > > > we need to know about it in order to update our map.
> > > >
> > > > Hi Benjamin,
> > > >
> > > > So, in other words, we need callbacks on alloc/free. What
> > > > information would SPDK need when receiving this notification?
> > > > Since we can't really know in advance how many pages we allocate
> > > > (it may be one, it may be a thousand) and they no longer are
> > > > guaranteed to be contiguous, would a per-page callback be OK?
> > > > Alternatively, we could have one callback per operation, but only
> > > > provide VA and size of allocated memory, while leaving everything
> > > > else to the user.
> > > > I do add a virt2memseg() function which would allow you to look
> > > > up segment physical addresses more easily, so you won't have to
> > > > manually scan memseg lists to get the IOVA for a given VA.
> > > >
> > > > Thanks for your feedback and suggestions!
> > >
> > > Yes - callbacks on alloc/free would be perfect. Ideally for us we
> > > want one callback per virtual memory region allocated, plus a
> > > function we can call to find the physical addresses/page break
> > > points on that virtual region. The function that finds the physical
> > > addresses does not have to be efficient - we'll just call that once
> > > when the new region is allocated and store the results in a fast
> > > lookup table. One call per virtual region is better for us than one
> > > call per physical page because we're actually keeping multiple
> > > different types of memory address translation tables in SPDK. One
> > > translates from va to pa/iova, so for this one we need to break
> > > this up into physical pages and it doesn't matter if you do one
> > > call per virtual region or one per physical page. However another
> > > one translates from va to RDMA lkey, so it is much more efficient
> > > if we can register large virtual regions in a single call.
> >
> > Another yes to callbacks. Like Benjamin mentioned about RDMA, MLX PMD
> > has to look up the LKEY for each packet DMA. Let me briefly explain
> > this for your understanding. For security reasons, we don't allow an
> > application to initiate a DMA transaction with unknown random
> > physical addresses. Instead, the va-to-pa mapping (we call it a
> > Memory Region) should be pre-registered, and the LKEY is the index of
> > the translation entry registered in the device. With the current
> > static memory model, it is easy to manage because the v-p mapping is
> > unchanged over time. But if it becomes dynamic, MLX PMD should get
> > notified of the event to register/unregister the Memory Region.
> >
> > For MLX PMD, it is also enough to get one notification per
> > allocation/free of a virtual memory region. It shouldn't necessarily
> > be a per-page call like Benjamin mentioned, because the PA of a
> > region doesn't need to be contiguous for registration. But it doesn't
> > need to know about the physical address of the region (I'm not saying
> > it is unnecessary, but just FYI :-).
> >
> > Thanks,
> > Yongseok
>
> Thanks for your feedback, good to hear we're on the right track. I
> already have a prototype implementation of this working, due for v1
> submission :)

Hi Anatoly,

Good to know. Do you see any performance impact with this series?

Thanks,

-- 
Nélio Laranjeiro
6WIND
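
For reference, the per-region notification scheme discussed in the quoted
thread can be sketched roughly as below. This is only an illustration under
stated assumptions, not the prototype mentioned above and not an existing
DPDK or SPDK API: every name here (mem_event, xlate_table, xlate_mem_event,
demo_resolve, the 2 MiB page size) is hypothetical. The consumer receives one
event per virtual region, splits it into pages with a slow resolver once, and
afterwards answers va-to-iova lookups from its own table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HUGEPAGE_SHIFT 21                       /* assumption: 2 MiB pages */
#define HUGEPAGE_SIZE  ((uint64_t)1 << HUGEPAGE_SHIFT)
#define MAX_PAGES      64                       /* tiny table, illustration only */
#define IOVA_NONE      UINT64_MAX

/* Hypothetical event types delivered once per virtual region. */
enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

/* Hypothetical slow VA -> IOVA resolver provided by the allocator. */
typedef uint64_t (*virt2iova_fn)(uint64_t va);

struct xlate_entry { uint64_t va; uint64_t iova; int used; };

struct xlate_table {
    struct xlate_entry e[MAX_PAGES];
    virt2iova_fn resolve;
};

/* Stand-in resolver for the sketch: pretends IOVA = VA + fixed offset. */
static uint64_t demo_resolve(uint64_t va)
{
    return va + 0x100000000ULL;
}

static void xlate_init(struct xlate_table *t, virt2iova_fn resolve)
{
    for (size_t i = 0; i < MAX_PAGES; i++)
        t->e[i].used = 0;
    t->resolve = resolve;
}

/* Linear scan for brevity; a real table would be indexed by page number. */
static struct xlate_entry *xlate_find(struct xlate_table *t, uint64_t va_page)
{
    for (size_t i = 0; i < MAX_PAGES; i++)
        if (t->e[i].used && t->e[i].va == va_page)
            return &t->e[i];
    return NULL;
}

/* Handler for one event per virtual region [va, va + len). */
static void xlate_mem_event(struct xlate_table *t, enum mem_event ev,
                            uint64_t va, uint64_t len)
{
    assert(va % HUGEPAGE_SIZE == 0 && len % HUGEPAGE_SIZE == 0);

    for (uint64_t p = va; p < va + len; p += HUGEPAGE_SIZE) {
        struct xlate_entry *slot = xlate_find(t, p);

        if (ev == MEM_EVENT_FREE) {
            if (slot != NULL)
                slot->used = 0;     /* drop the stale translation */
            continue;
        }
        /* ALLOC: reuse an existing slot or take the first free one. */
        for (size_t i = 0; i < MAX_PAGES && slot == NULL; i++)
            if (!t->e[i].used)
                slot = &t->e[i];
        assert(slot != NULL);       /* table full: sketch does not grow */
        slot->va = p;
        slot->iova = t->resolve(p); /* slow path, once per page */
        slot->used = 1;
    }
}

/* Per-request lookup: no call into the allocator, only the local table. */
static uint64_t xlate_lookup(struct xlate_table *t, uint64_t va)
{
    uint64_t page = va & ~(HUGEPAGE_SIZE - 1);
    struct xlate_entry *slot = xlate_find(t, page);

    return slot != NULL ? slot->iova + (va - page) : IOVA_NONE;
}
```

A consumer that only needs one handle per region (such as an LKEY) could use
the same event but store a single entry for the whole [va, va + len) range
instead of splitting it into pages.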