On 04/05/2023 at 10:57, Dominik Csapak wrote:
> On 5/4/23 10:42, Thomas Lamprecht wrote:
>> On 04/05/2023 at 10:31, Dominik Csapak wrote:
>>> On 5/4/23 10:13, Thomas Lamprecht wrote:
>>>> On 03/05/2023 at 13:26, Dominik Csapak wrote:
>>>>> just a short comment since this series overlaps a bit with my
>>>>> cluster resource mapping series (i plan on sending a v4 soon).
>>>>>
>>>>> i'd prefer to have the configuration endpoints for mapping bundled
>>>>> in a subdirectory, so instead of /nodes/<node>/dirs/ i'd put it in
>>>>> /nodes/<node>/mapping/dirs/ (or /nodes/<node>/map/dirs )
>>>>>
>>>>> @thomas, @fabian, any other input on that?
>>>>
>>>> huh? aren't mappings per definition cluster wide, i.e.
>>>> /cluster/resource-map/<mapping-id>, which then allows adding/updating
>>>> the mapping of a resource on a specific node?
>>>> A node-specific path makes no sense to me; at most it would if
>>>> adding/removing a mapping were completely decoupled from
>>>> adding/removing/updating entries in it – but that seems convoluted
>>>> from a usage POV and easy to get out of sync with the actual mapping
>>>> list.
>>>
>>> in markus' series the mappings are only ever per node, so each node
>>> has its own dir mapping
>>
>> Every resource mapping is always per node, so that's not really
>> changing anything.
>
> i meant there is no cluster aspect in the current version of markus'
> series at all
Only if you lock the VM down hard to be kept at the node its mapping was
configured on.

>> Rather, what about migrations? Would be simpler from a migration and
>> ACL POV to have it cluster wide,
>
> sure, but how we get the info during migration is just an
> implementation detail, and for that it shouldn't matter where the
> configs/api endpoints are

No, it really isn't an implementation detail how ACLs govern access to
mappings, which in turn dictates where the ACL object paths should be,
which then again has implications for where the API endpoints and
configs should be – node level isn't the right one.

>>> in my series, the actual config was cluster-wide, but the api
>>> endpoints to configure them were sitting in the node path
>>> (e.g. /node/<nodes>/hardware-map/pci/*)
>>
>> Please no.
>
> it was that way since the very first version, would have been nice to
> get that feedback earlier

IIRC I always talked about this being on the cluster level, especially
w.r.t. making it a more general resource mapping; maybe I didn't look
too closely and was thrown off by the series name, i.e., "add
*cluster-wide* hardware device mapping", and by the ACL path discussion
(which IIRC never had any node-specific parameter in there, and thus
only works if it's either a cluster-wide config or if ID uniqueness is
guaranteed by the pmxcfs like we do for VMIDs).
Anyhow, sorry, I'll try to check for such things more closely earlier in
the future.

>>> we *could* put them into the cluster path api, but we'd need to send
>>> a node parameter along and forward it there anyway, so that wouldn't
>>> really make a difference
>>
>> no need for that, see above.
>
> there is a need for the node parameter, because you always need to
> know for which node the mapping is anyway ;) or did you mean the
> 'forward' part?

Yes, I certainly meant the forward part – otherwise my "Every resource
mapping is always per node" sentence and the node selector mentioned
above would be really odd.
A node:local-id map always needs to exist – I figured that's the core
thing this series wants to solve – but as it's not a hard requirement to
forward every request (and a huge PITA to keep in sync if done that way,
or at least it would require using a cfs_domain_lock), there's no (hard)
need for proxying.

>>> for reading the mappings, that could be done there, but in my
>>> series, in the gui at least, i have to make a call to each node to
>>> get the current state of the mapping (e.g. if the pci device is
>>> still there)
>>
>> For now not ideal but ok; in the future I'd rather go in the
>> direction of broadcasting some types of HW resources via the pmxcfs
>> KV, and then this isn't an issue anymore.
>
> yeah can be done, but the question is if we want to broadcast the
> whole pci/usb list from all nodes? (i don't believe that scales well?)

How is one or two dozen PCI IDs plus some meta info, broadcast normally
once per boot (or statd start) per node, a scaling problem? Especially
compared to the amount of data we broadcast otherwise it's rather
negligible.

>>> if a mapping exists (globally) is not interesting most of the time,
>>> we only need to know if it exists at a specific node
>>
>> that's looking at it backwards; the user and ACL only care for global
>> mappings, how the code implements that is then, well, an
>> implementation detail.
>
> ACL yes, but the user must configure the mapping on a vm (which sits
> on a specific node) and there the mapping must exist on that node

There they also don't care for local-nodeness, they just select a
mapping (no need to hard-error on the mapping not being usable on the
current node, the VM can be migrated before the start – albeit it would
certainly improve UX if that was visually hinted); otherwise they
could've just used a local ID directly.
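To put a rough number on the broadcast concern, here is a
back-of-envelope sketch; the device count and metadata fields are
assumptions for illustration, not measured pmxcfs data:

```python
import json

# Assume ~24 PCI devices per node, each with a few metadata fields,
# roughly what a per-node hardware broadcast might carry.
devices = [
    {
        "id": f"0000:3e:{i:02x}.0",  # made-up PCI addresses
        "vendor": "10de",
        "device": "1db6",
        "iommugroup": i,
        "mdev": False,
    }
    for i in range(24)
]

payload = json.dumps(devices)
# Stays in the low-kilobyte range per node, sent about once per boot.
print(f"{len(payload)} bytes for {len(devices)} devices")
```

Even with a few dozen nodes that amounts to well under a megabyte of
mostly static data cluster-wide.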
>>> also, after seeing markus' patches, i also leaned more in the
>>> direction of splitting my global mapping config into a config per
>>> type+node (so node1/usb.conf, node1/pci.conf,
>>
>> no, please no forest^W jungle of config trees :/
>>
>> A /etc/pve/resource-map/<type>.conf must be enough; even a single
>> /etc/pve/resource-map.conf should be, tbh., but I could imagine that
>> splitting per resource type makes some (schema) things a bit easier
>> and reduces some bug potential, so no too-hard feelings on having one
>> cluster-wide config per type; but really not more.
>
> sure, a single config is ok for me (only per type would be weird,
> since reusing the

Not sure if I'd find that flat-out weird; it depends on the whole design
and planned future direction, and on whether it's OK that an ID used for
a PCI mapping can never be reused for some directory or other relatively
unrelated mapping.
FWIW, <type> here could also be the higher-level resource type (group),
i.e., a hardware.cfg for PCI/USB, and one for local file system stuff
like directories.

FWIW, section configs have a disadvantage if very different config types
are mixed, as the schema is then shown as the sum of all possible types
in the API/man page (even storage.cfg, while handling similar things, is
already very confusing); that is possibly solvable in the api/manpage
doc generation though. Just mentioning it as I had that in mind already
a few times and as we should improve it sometime; if we expand its use
to wildly different types, it might be better to do so sooner rather
than later (but that's not directly related to this).

> section config would only ever have a single type and we'd have to
> encode the nodename somehow)

Ideally we wouldn't have to encode the nodename, as doing so would make
especially using a single config weird (i.e., it would make the
<id>+<node> the section ID from a config POV but not from a usage POV,
which would then be weird for, e.g., a "pci: foo:nodeA" and a
"usb: foo:nodeB").
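To make the nodename-encoding awkwardness concrete, such a config would
end up with entries like the following (purely hypothetical syntax, the
option names are made up):

```
pci: foo:nodeA
	path 3e:00.0

usb: foo:nodeB
	port 1-4
```

Here "foo" is a single mapping from the user's POV, but the config
parser sees two unrelated section IDs that only happen to share a
prefix.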
It should be tried to have a single config entry per ID, i.e., very
hand-wavy:

pci: id
	comment foo
	allow-multi-function 1
	other-map-wide-e.g-enforcement-setting-a 1
	...
	map nodeA:3e:00.0,nodeB:0a:00.0

Or better, with the long-contemplated array support (which not
necessarily you'd need to spearhead):

pci: id
	...
	map node=A,id=3e:00.0
	map node=B,id=3e:01.0

I read (but not in extended detail) "[PATCH common v3 2/3] add
PVE/HardwareMap", and the above is probably not expressive enough on its
own; but OTOH, the formats used in the patch seem a bit too flexible
(i.e., it makes not much sense to allow mdev on one map entry of one
node but not on the one from another node – in practice that can hardly
work for VMs, I guess, and would limit any future live-migration
capabilities?). Also, why is the ID that split up there?
I'd rather try to keep the map to the minimal ID info and some general
restrictions (e.g., whether some special use is allowed); other details
that only apply when using the mapped device in a guest would still be
handled there (as long as the mapping allows it).

Sorry for the a bit higher-level and slightly hand-wavy description; if
you list specific problems and required things, I can try to formalize
the config format I'd envision – to then also be useful for other
resource mappings – a bit more.

And just to bring it to the table: in theory we _maybe_ could split
naming a node's device from the cluster-wide mapping, i.e., the mapping
would be, well, cluster-wide, and naming devices plus some extra "how
can/should it be used" info would then be node-specific (possibly not
even in /etc/pve). Not saying I'd prefer that, but it's something that
could be explored before finalizing a design; I don't want to rush this,
as we'll have to support something as big and certainly as popular as
this basically forever, and while config migrations can be done somehow,
they're basically always a PITA.
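FWIW, resolving such array-style "map" entries for a given node is
trivial on the consumer side; a minimal Python sketch (function names
are made up for illustration, this is not the actual PVE::HardwareMap
API):

```python
def parse_map_entry(entry: str) -> dict:
    """Parse one 'map' property string, e.g. 'node=A,id=3e:00.0'."""
    props = {}
    for part in entry.split(","):
        key, _, value = part.partition("=")
        props[key.strip()] = value.strip()
    return props

def entries_for_node(map_entries: list[str], node: str) -> list[dict]:
    """Return only the map entries that apply to the given node."""
    parsed = (parse_map_entry(e) for e in map_entries)
    return [p for p in parsed if p.get("node") == node]

entries = ["node=A,id=3e:00.0", "node=B,id=3e:01.0"]
print(entries_for_node(entries, "A"))  # [{'node': 'A', 'id': '3e:00.0'}]
```

The point being that a single cluster-wide section per ID loses nothing
for per-node lookups, while keeping one ACL object and one config entry
per mapping.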
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel