> -----Original Message-----
> From: Linux on 390 Port [mailto:[email protected]] On Behalf Of
> Rob van der Heij
> As Jan points out the FST is fragmented.
Agreed. However, each piece contains pointers to the next piece you need, and
you need that information anyway. Following the breadcrumbs is not an
operational loss, because it happens in only two scenarios: at first access,
and after an update to the FST in R/W mode.
> The purpose of mmap() is that you
> map all blocks in virtual mem
The purpose of mmap() is to map *A* specified object (disk, shared memory
block, etc) of a specified size starting at a specified offset from the start
of the object to *A* memory segment of equal size in the process' address
space. It does not have to map ALL blocks of a disk to access some of the data
on it.
POSIX (IEEE Std 1003.1) definition of mmap():
The mmap() function shall establish a mapping between a process' address space
and a file, shared memory object, or typed memory object. The format of the
call is as follows:
pa=mmap(addr, len, prot, flags, fildes, off);
The mmap() function shall establish a mapping between the address space of the
process at an address pa for len bytes to the memory object represented by the
file descriptor fildes at offset off for len bytes. The value of pa is an
implementation-defined function of the parameter addr and the values of flags,
further described below. A successful mmap() call shall return pa as its
result. The address range starting at pa and continuing for len bytes shall be
legitimate for the possible (not necessarily current) address space of the
process. The range of bytes starting at off and continuing for len bytes shall
be legitimate for the possible (not necessarily current) offsets in the file,
shared memory object, or typed memory object represented by fildes.
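To make the "one object, one offset, one length" point concrete, here is a minimal sketch in C that maps only a small window of a larger file. The names map_window, unmap_window, struct window and window_demo are made up for illustration, and the scratch file merely stands in for a disk. The one wrinkle is that mmap() wants a page-aligned offset, so the helper rounds the offset down and skips the slack.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: one mapped window onto a larger object. */
struct window {
    void  *base;     /* page-aligned address returned by mmap() */
    size_t map_len;  /* length actually mapped (request plus slack) */
    char  *data;     /* first byte the caller asked for */
};

/* Map len bytes of fd starting at byte offset off, read-only.
 * The offset is rounded down to a page boundary because mmap()
 * requires that. Returns 0 on success, -1 on failure. */
static int map_window(int fd, off_t off, size_t len, struct window *w)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(w != NULL);
    if (page <= 0 || off < 0 || len == 0)
        return -1;
    off_t  aligned = off - (off % page);
    size_t slack   = (size_t)(off - aligned);

    void *p = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p == MAP_FAILED)
        return -1;
    w->base    = p;
    w->map_len = len + slack;
    w->data    = (char *)p + slack;
    return 0;
}

static void unmap_window(struct window *w)
{
    munmap(w->base, w->map_len);
}

/* Self-check on a scratch file: three pages filled with 'A', 'B', 'C';
 * map 50 bytes starting 100 bytes into the second page and verify
 * that only 'B' bytes are visible. Returns 0 if everything matched. */
static int window_demo(void)
{
    char path[] = "/tmp/mmapwinXXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file goes away when fd closes */

    char *buf = malloc((size_t)page);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    int ok = 1;
    for (int i = 0; i < 3 && ok; i++) {
        memset(buf, 'A' + i, (size_t)page);
        ok = write(fd, buf, (size_t)page) == (ssize_t)page;
    }
    free(buf);

    struct window w;
    if (ok && map_window(fd, (off_t)page + 100, 50, &w) == 0) {
        for (size_t i = 0; i < 50; i++)
            ok = ok && (w.data[i] == 'B');
        unmap_window(&w);
    } else {
        ok = 0;
    }
    close(fd);
    return ok ? 0 : -1;
}
```

Calling window_demo() exercises the whole path; the rest of the file never enters the address space.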
> and then simply access the blocks in memory.
> Linux does the I/O under the covers.
I follow the concept, and see the advantages of using mmap to do the I/O under
the covers. At this point, though, we're optimizing to minimize the amount of
data we need, and thus the impact on everything else that uses memory in the
same virtual machine (and on that machine's working set size).
> Since your blocks can be anywhere on
> disk, you map the entire thing.
Here's where we diverge.
There are two issues here:
1) accessing the minidisk and representing its contents to Linux at a point in
time
2) accessing the content of the minidisk
Mmap()ing the whole disk is a convenient solution to both problems, HOWEVER:
To access a minidisk and represent it to Linux, you do NOT need every block on
the disk to be represented in a structure. You need the label data and the FST
data (which, BTW, you need to read first ANYWAY to mmap the whole disk, because
you need to know the logical number of blocks to set up the mmap!).
To use the files on the minidisk, you need the blocks CONTAINED in the file,
not the entire disk. You get that from the FST and you mmap() those blocks.
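As a rough sketch of "mmap() just the blocks in the file": assuming the list of block numbers for a file has already been pulled out of the FST (the array below is made up, and block numbers are treated as 0-based for simplicity), consecutive blocks can be collapsed into extents so that each run costs one mmap() call rather than one per block. The names struct extent, coalesce_blocks, extent_offset and extent_length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a run of consecutive blocks on the disk. */
struct extent {
    uint32_t first_block;  /* 0-based block number (an assumption) */
    uint32_t count;        /* number of consecutive blocks */
};

/* Collapse a file's block list, in file order, into runs of
 * consecutive blocks. out must have room for n entries.
 * Returns the number of extents written. */
static size_t coalesce_blocks(const uint32_t *blocks, size_t n,
                              struct extent *out)
{
    size_t n_ext = 0;
    assert(blocks != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (n_ext > 0 &&
            out[n_ext - 1].first_block + out[n_ext - 1].count == blocks[i]) {
            out[n_ext - 1].count++;          /* extends the current run */
        } else {
            out[n_ext].first_block = blocks[i];
            out[n_ext].count = 1;            /* starts a new run */
            n_ext++;
        }
    }
    return n_ext;
}

/* Byte offset and byte length to hand to mmap() for one extent. */
static uint64_t extent_offset(const struct extent *e, uint32_t block_size)
{
    return (uint64_t)e->first_block * block_size;
}

static uint64_t extent_length(const struct extent *e, uint32_t block_size)
{
    return (uint64_t)e->count * block_size;
}
```

With a block size equal to the page size, each extent offset is already page-aligned; with smaller blocks the offset would have to be rounded down to a page boundary before the mmap() call.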
Quote (again from IEEE Std 1003.1): Use of mmap() may reduce the amount of
memory available to other memory allocation functions.
This is what triggered the discussion. In no case do you ever need the entire
set of blocks on the disk at the same time, unless they are contained in a
single large file, which our use case (big disk with lots of small to
medium-size files) makes unlikely, if not explicitly impossible by definition
of the problem.
> To map just record 3-5 is no gain if you need
> to point still at the rest of the blocks.
See above. It is a gain at access time: you don't need ALL the blocks, you need
the ones that identify the volume, give you a view of the volume contents, and
tell you where the interesting content is, or at least starts. For R/O you need
to build an in-memory copy exactly once, on first access to the minidisk; you
can then use it until the next access. For R/W you need a live copy in memory
of the entire FST, which you need to build and maintain regardless of activity
or access method. The in-memory copy does not have to be discontiguous -- in
fact, you *want* it to be contiguous so you can use simple indexing of a
structure pattern over the FST entries for performance.
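A sketch of what the contiguous copy buys you, in C. The struct below is a deliberately simplified, hypothetical entry and NOT the real CMS FST layout; fst_gather and fst_find are likewise made-up names. The idea is only that the fragmented pieces get copied once into one array, after which a lookup is plain indexing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified entry; the real FST layout differs. */
struct fst_entry {
    char     fname[8];      /* blank-padded, not NUL-terminated */
    char     ftype[8];      /* blank-padded, not NUL-terminated */
    uint32_t first_block;
    uint32_t block_count;
};

/* Copy entries from scattered pieces into one contiguous table.
 * Returns a malloc()ed array (caller frees) or NULL on failure. */
static struct fst_entry *fst_gather(const struct fst_entry *const *pieces,
                                    const size_t *entries_in_piece,
                                    size_t n_pieces, size_t *total_out)
{
    size_t total = 0;
    assert(total_out != NULL);
    for (size_t i = 0; i < n_pieces; i++)
        total += entries_in_piece[i];

    struct fst_entry *table = malloc((total ? total : 1) * sizeof *table);
    if (table == NULL)
        return NULL;

    size_t at = 0;
    for (size_t i = 0; i < n_pieces; i++) {
        memcpy(&table[at], pieces[i], entries_in_piece[i] * sizeof *table);
        at += entries_in_piece[i];
    }
    *total_out = total;
    return table;
}

/* Linear scan over the contiguous table; fname and ftype must be
 * 8 bytes each, blank-padded. Returns NULL if there is no match. */
static const struct fst_entry *fst_find(const struct fst_entry *table,
                                        size_t n, const char *fname,
                                        const char *ftype)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(table[i].fname, fname, 8) == 0 &&
            memcmp(table[i].ftype, ftype, 8) == 0)
            return &table[i];
    }
    return NULL;
}
```

For R/O use the table is built once and then only read; for R/W the same table would also have to be kept in step with the on-disk pieces, which this sketch does not attempt.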
You don't need the other data AT ALL until you actually access a file in some
way, and then you need only the blocks that comprise the file you want.
> DIAG250 is a block driver, just like Linux can do. Extra work is to allocate
> memory to hold blocks while you work on them, make sure to flush the
> updated pages, etc.
I suggested looking at DIAG 250 for ideas on how to approach the problem. I
explicitly said that I do not want a duplicate of DIAG 250.
Yes, it's going to be a little more work on the housekeeping tasks if you want
R/W access, but you'd have to do substantially the same housekeeping with the
full-disk map approach. It's mostly buffer management issues, though; the
actual update to the data on disk can still be done with memcpy and you still
get the benefit of mmap goodness; you just have to think about it a bit more.
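For what "update with memcpy" might look like on a partial map, here is a minimal sketch, assuming a file descriptor opened read/write and a block that lies within the file. update_block and update_demo are hypothetical names, and the scratch file stands in for the disk. Only the page(s) holding the one block are mapped MAP_SHARED, the new contents are copied in, and msync() pushes them out before the mapping is dropped.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite one block of fd with block_size bytes from src, mapping
 * only the pages that contain that block. Block numbers are 0-based
 * here (an assumption). Returns 0 on success, -1 on failure. */
static int update_block(int fd, uint32_t block_no, size_t block_size,
                        const void *src)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(src != NULL);
    if (page <= 0 || block_size == 0)
        return -1;

    off_t  off     = (off_t)block_no * (off_t)block_size;
    off_t  aligned = off - (off % page);       /* mmap needs page alignment */
    size_t slack   = (size_t)(off - aligned);
    size_t map_len = slack + block_size;

    char *p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p + slack, src, block_size);        /* the actual update */
    int rc = msync(p, map_len, MS_SYNC);       /* flush before unmapping */
    munmap(p, map_len);
    return rc;
}

/* Self-check: a 16-block scratch file of zeros; rewrite block 5 and
 * read it back with pread(), also checking block 4 is untouched. */
static int update_demo(void)
{
    enum { BLK = 512, NBLK = 16 };
    char path[] = "/tmp/mmapupdXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                              /* removed when fd closes */
    if (ftruncate(fd, (off_t)BLK * NBLK) != 0) {
        close(fd);
        return -1;
    }

    char newdata[BLK], check[BLK];
    memset(newdata, 'X', BLK);
    int ok = update_block(fd, 5, BLK, newdata) == 0;

    ok = ok && pread(fd, check, BLK, (off_t)5 * BLK) == BLK
            && memcmp(check, newdata, BLK) == 0;
    ok = ok && pread(fd, check, BLK, (off_t)4 * BLK) == BLK
            && check[0] == 0 && check[BLK - 1] == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

Mapping and unmapping per block is the simplest thing that works; a real driver would more likely keep recently used windows around, which is where the buffer-management thinking comes in.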
> The approach to try map and take the long route as alternative is nice best of
> both, but double code.
Rough consensus and running code. That's probably the compromise setting -- I
could live with it. I could live with messing with the ulimit if I was handing
it one ginormous file -- that's a real outlier. It shouldn't be the default
case.
Bottom line: I think we are in violent agreement that we now know what we
don't want. 8-) Otherwise, we're just doing a bit of whiteboard discussion on
other ways to do it.