On 26/04/15 10:16AM, David Hildenbrand (Arm) wrote:
> On 4/15/26 00:20, Gregory Price wrote:
> > On Tue, Apr 14, 2026 at 11:57:40AM -0700, Darrick J. Wong wrote:
> >>>
> >>> I very strongly object to making this a prerequisite to merging. This
> >>> is an untested idea that will certainly delay us by at least a couple
> >>> of merge windows when products are shipping now, and the existing approach
> >>> has been in circulation for a long time. It is TOO LATE!!!!!!
> >>
> > ...
> >>
> >> That said, you're clearly pissed at the goalposts changing yet again,
> >> and that's really not fair that we collectively keep moving them.
> >>
> > 
> > This seems a bit more than moving a goalpost.
> > 
> > We're now gating working software, for real working hardware, on a novel,
> > unproven BPF ops structure that controls page table mappings on page table
> > faults which would be used by exactly 1 user : FAMFS.
> 
> Are MM people on board with even letting BPF do that? Honest question,
> if someone has a pointer to how that should work, that would be appreciated.

David, that question is pivotal! How can we get at least a preliminary
answer sooner rather than later? If the answer is "Hell No", much of
this thread (though not all of it) becomes moot.

Prior to today, this entire discussion has happened in the absence, to my
knowledge, of any code that actually hooks famfs into BPF-based fault
handling. Today Gregory shared some code with me that does exactly that.
However, the code doesn't build for me yet, so I'll have to debug it as
soon as I can.

Gregory's code, in its current form, still uses two new fuse messages,
GET_FMAP and GET_DAXDEV, but it makes the fmap message format opaque by
removing the fmap format structs from the uapi. It also uses two BPF
programs. One parses and validates the GET_FMAP payload for every file and
hangs the result from a 'void *' in each fuse_inode (just as the current
famfs code does). The other is called during vma faults and reads that
'void *' from the fuse_inode in order to handle faults the same way
famfs-fuse does today, but via BPF instead.

As with all vma "providers", famfs services zillions of faults. But famfs
faults never block or retrieve data from storage, so there is no storage
latency to amortize a less efficient fault-handling code path against.
As I've said many times, we're enabling memory and it must run at
"memory speeds". Gregory's code includes a BPF invocation to resolve
each vma fault, but it does avoid the BPF hashmap lookup that would be
required with a generalized implementation of Joanne's ideas.

The first question (very much unanswered) is whether a BPF fault handler
can resolve vma faults with performance equivalent to hugetlbfs or
anonymous mmap. If not, the famfs community will assert that BPF defeats
or degrades the purpose of famfs. Any added overhead, latency, or cache
misses in a fault handler serialize directly into the stall time that
software sees while a virtual address is resolved; it really is
performance critical. If BPF is slower, we'll be able to measure it, but
one benchmark or test case does not fit all, so this won't be a
one-and-done test...

I'll share performance measurements as soon as I can build Gregory's code,
test it, get time on a proper big-memory cluster, and measure something
that makes sense. This will take some days, but I'm working on it.

On Monday I hope to post a substantial on-list reply that summarizes the
various objections to my current famfs fuse implementation, along with the
open questions and my specific performance and complexity concerns.

Thanks,
John
