On 04/05/2023 at 10:57, Dominik Csapak wrote:
> On 5/4/23 10:42, Thomas Lamprecht wrote:
>> On 04/05/2023 at 10:31, Dominik Csapak wrote:
>>> On 5/4/23 10:13, Thomas Lamprecht wrote:
>>>> On 03/05/2023 at 13:26, Dominik Csapak wrote:
>>>>> just a short comment since this series overlaps a bit with my
>>>>> cluster resource mapping series (i plan on sending a v4 soon).
>>>>>
>>>>> i'd prefer to have the configuration endpoints for mapping bundled in a subdirectory
>>>>> so instead of /nodes/<node>/dirs/ i'd put it in /nodes/<node>/mapping/dirs/
>>>>> (or /nodes/<node>/map/dirs )
>>>>>
>>>>> @thomas, @fabian, any other input on that?
>>>>>
>>>>
>>>> huh? aren't mappings per definition cluster wide, i.e.,
>>>> /cluster/resource-map/<mapping-id>, which then allows adding/updating the mapping
>>>> of a resource on a specific node?
>>>> A node-specific path makes no sense to me, at most it would if adding/removing a
>>>> mapping were completely decoupled from adding/removing/updating entries to it – but
>>>> that seems convoluted from a usage POV and easy to get out of sync with the actual
>>>> mapping list.
>>>
>>> in markus' series the mappings are only ever per node, so each node has its
>>> own dir mapping
>>
>> Every resource mapping is always per node, so that's not really changing anything.
>>
> 
> i meant there is no cluster aspect in the current version of markus' series at all

only if you lock down the VM hard to be kept at the node its mapping was configured on.

> 
>> Rather what about migrations? Would be simpler from migration and ACL POV to have
>> it cluster wide,
> 
> sure, but how we get the info during migration is just an implementation detail and
> for that it shouldn't matter where the configs/api endpoints are


no, it really isn't an implementation detail how ACLs govern access to mappings, which
in turn dictates where the ACL object paths should be, which then again has implications
for where the API endpoints and configs should be – node level isn't the one.
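
To make that concrete (object paths purely illustrative, nothing is decided here): an ACL
granted on a cluster-wide path like

    /mapping/<mapping-id>

keeps working no matter which node the VM using that mapping currently runs on, while a
node-level path like /nodes/<node>/dirs/<id> would need a separate grant for every node
that gets an entry and can easily drift out of sync with the actual mapping.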

>>>
>>> in my series, the actual config was cluster-wide, but the api endpoints to configure
>>> them were sitting in the node path (e.g. /nodes/<node>/hardware-map/pci/*)
>> Please no.
> 
> was that way since the very first version, would have been nice to get that feedback
> earlier
> 

IIRC I always talked about this being on the cluster level, especially w.r.t. making it a
more general resource mapping; maybe I didn't look too closely and was thrown off by the
series name, i.e., "add *cluster-wide* hardware device mapping", and by the ACL path
discussion (which IIRC never had any node-specific param in there, and thus only works if
it's either a cluster-wide config or if ID uniqueness is guaranteed by the pmxcfs like we
do for VMIDs).

Anyhow, sorry, I'll try to check for such things more closely, and earlier, in the future.

>>>
>>> we *could* put them into the cluster path api, but we'd need to send a node parameter
>>> along and forward it there anyway, so that wouldn't really make a difference
>>
>> no need for that, see above.
> 
> there is a need for the node parameter, because you always need to know for which node
> the mapping is anyway ;) or did you mean the 'forward' part?

yes, I certainly meant the forward part – otherwise my "Every resource mapping is always
per node" sentence and the node selector mentioned above would be really odd. A
node:local-id map always needs to exist, I figured that's the core thing this series wants
to solve, but it's not a hard requirement to forward every request (and it'd be a huge
PITA to keep in sync if done that way, or would at least require using a cfs_domain_lock),
so there's no (hard) need for proxying.


>>>
>>> for reading the mappings, that could be done there, but in my series in the gui at least,
>>> i have to make a call to each node to get the current state of the mapping
>>> (e.g. if the pci device is still there)
>>
>> For now not ideal but ok, in the future I'd rather go in the direction of broadcasting
>> some types of HW resources via pmxcfs KV and then this isn't an issue anymore.
> 
> yeah can be done, but the question is if we want to broadcast the whole pci/usb list
> from all nodes? (i don't believe that scales well?)

how is one or two dozen PCI IDs + some meta info, broadcast normally once per boot (or
statd start) per node, a scaling problem? Especially compared to the amount of data we
broadcast otherwise, it's rather negligible.
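
Very roughly what I have in mind (a sketch only – it assumes the node KV helpers in
PVE::Cluster we already use for other per-node status data; the key name and the data
layout here are made up):

    use JSON;
    use PVE::Cluster;

    # each node broadcasts its (small) PCI inventory once, e.g. on pvestatd start
    my $pci_list = [
        { id => '0000:3e:00.0', vendor => '10de', device => '1eb8' },
        # ... a dozen or two entries plus some meta info
    ];
    PVE::Cluster::broadcast_node_kv('pci-devices', encode_json($pci_list));

    # any node (e.g. for the GUI) can then read the broadcast state of all nodes
    # without proxying a request to each of them
    my $kv = PVE::Cluster::get_node_kv('pci-devices');
    for my $node (sort keys %$kv) {
        my $devs = decode_json($kv->{$node});
        # ... check here whether a mapped device is still present on $node
    }

That way the GUI wouldn't need one call per node just to see if a mapped device still
exists.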

> 
>>
>>>
>>> whether a mapping exists (globally) is not interesting most of the time, we only need
>>> to know if it exists at a specific node
>>
>> that's looking at it backwards, the user and ACL only care for global mappings, how
>> the code implements that is then, well, an implementation detail.
> 
> ACL yes, the user must configure the mapping on a vm (which sits on a specific node)
> and there the mapping must exist on that node
> 

Where they also don't care about local-nodeness, they just select a mapping (no need to
hard error on the mapping not being usable on the current node, the VM can be migrated
before the start – albeit it would certainly improve UX if that was visually hinted);
otherwise they could've just used a local ID directly.

>>
>>>
>>> also, after seeing markus' patches, i also leaned more in the direction of splitting
>>> my global mapping config into a config per type+node (so node1/usb.conf, node1/pci.conf,
>>
>> no, please no forest^W jungle of config trees :/
>>
>> A /etc/pve/resource-map/<type>.conf must be enough, even a /etc/pve/resource-map.conf
>> should be tbh., but I could imagine that splitting per resource type makes some (schema)
>> things a bit easier and reduces some bug potential, so no hard feelings on having one
>> cluster-wide per type; but really not more.
>>
> 
> sure a single config is ok for me (only per type would be weird, since reusing the

Not sure if I'd find that flat-out weird, depends on the whole design and planned future
direction, and on whether it's OK that the ID used for a PCI mapping can never be reused
for some directory or other relatively unrelated mapping. FWIW, <type> here could also be
the higher-level resource type (group), i.e., hardware.cfg for PCI/USB, one for local file
system stuff like directories.
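
To illustrate that grouping (file names not final, directories.cfg is just a made-up
example): a hardware.cfg could then hold both

    pci: foo
        ...

    usb: bar
        ...

sections, while directory mappings would live in a separate, but similarly structured,
file like directories.cfg.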

FWIW, section configs have a disadvantage if very different config types are mixed, as the
schema is then also shown as the sum of all possible types in the API/man page (even
storage.cfg, while handling similar things, is already very confusing); possibly solvable
in the api/manpage doc generation though; just mentioning it as I've had that in mind a
few times already and as we should improve it sometime, and with expanding use for wildly
different types it might be better to do so sooner rather than later (but not directly
related to this).

> sectionconfig would only ever have a single type and we'd have to encode the
> nodename somehow)

Ideally we wouldn't have to encode the nodename, as doing so would make especially using
a single config weird (i.e., it would make <id>+<node> the sectionID from a config POV but
not from a usage POV, which would then be weird for, e.g., a "pci: foo:nodeA" and a
"usb: foo:nodeB").

We should try to have a single config entry per ID, i.e., very hand-wavy:

pci: id
    comment foo
    allow-multi-function 1
    other-map-wide-e.g-enforcement-setting-a 1
    ...
    map nodeA:3e:00.0,nodeB:0a:00.0

Or better, with the long-contemplated array support (which you wouldn't necessarily need
to spearhead):


pci: id
    ...
    map node=A,id=3e:00.0
    map node=B,id=3e:01.0
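
And to make the schema side a tad more concrete, a rough sketch only – the format name,
option names and the pattern here are made up, and the array part obviously hinges on
said array support actually landing:

    use PVE::JSONSchema;

    # per-entry format for one 'map' line of a PCI mapping (illustrative only)
    my $pci_map_entry_fmt = {
        node => PVE::JSONSchema::get_standard_option('pve-node'),
        id => {
            type => 'string',
            description => "PCI address of the device on that node.",
            pattern => '[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]',
        },
    };
    PVE::JSONSchema::register_format('pve-pci-map-entry', $pci_map_entry_fmt);

    # and in the section config properties, roughly (needs the array support):
    #   map => {
    #       type => 'array',
    #       items => { type => 'string', format => 'pve-pci-map-entry' },
    #       description => "Per-node device mapping entries.",
    #   },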

I read (but not in extended detail) "[PATCH common v3 2/3] add PVE/HardwareMap" and the
above is probably not expressive enough on its own, but OTOH the formats used in the patch
seem a bit too flexible (i.e., it makes not much sense to allow mdev on one map entry of
one node but not on the one from another node – in practice that can hardly work for VMs
I guess and would limit any future live-migration caps?). Also, why is the ID split up
like that there?

I'd rather try to keep the map to the minimal ID info and some general restrictions (e.g.,
whether some special use is allowed), but other details that only apply when using the
mapped device in a guest would still be handled there (as long as the mapping allows it).
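
I.e., again very hand-wavy (option names and the guest-config syntax below are made up,
nothing final): the mapping entry would only pin down identity plus what's generally
allowed,

    pci: gpu0
        map node=A,id=3e:00.0
        map node=B,id=0a:00.0
        mdev 1

and the guest config would then pick the concrete usage details when referencing the
mapping, e.g. something like

    hostpci0: mapping=gpu0,mdev=<some-profile>,pcie=1

as long as the mapping's restrictions allow it.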

Sorry for the somewhat high-level and slightly hand-wavy description; if you list specific
problems and required things, I can try to formalize the config format I envision a bit
more, so that it's then also useful for other resource mappings.

And just to bring it to the "table": in theory we _maybe_ could split naming a node device
and the cluster-wide mapping, i.e., the mapping would be, well, cluster-wide, and naming
devices plus some extra "how can/should it be used" would then be node-specific (possibly
not even in /etc/pve) – not saying I'd prefer that, but that's something that could be
explored before finalizing a design; I don't want to rush this, as we have to support
something as big and certainly as popular as this basically forever, and while config
migrations can be done somehow, they're basically always a pita.

