[Jchat] Mapped File Mysteries

Don Guinn Sat, 20 Feb 2010 10:11:35 -0800

Many comments on multi-core support recommended mapped files as an
efficient tool for sharing data between J sessions. I must admit that I have
never used mapped files because so far I haven't had a need for them. And I
have been wary of them, as mapped names do strange things. So I went and ran
the labs on mapped files for the umpteenth time, looking not at how I
could use them but at how they work and how efficient they might be.


Accessing existing files on disk as nouns is a great benefit. But how does
it differ from using fread/fwrite? Once a file is mapped, explicit reads and
writes are not necessary to access or update it on disk. I suspect that the
file simply becomes a temporary extension to the system swap file. As a
consequence the entire file is not read; pages are read only as the
application accesses them. Only then is real memory assigned. That is a great
saving in real memory if only small parts of the file are accessed or
updated. If you use fread the entire file is read into the virtual address
space (vas), so real memory is required for the entire file. If the
application accesses the entire file then fread should out-perform mapped
files, as the entire file is read sequentially instead of randomly as page
faults occur. On the other hand, the application does not have to wait for
an entire mapped file to be read before starting processing. But a page fault
on the first access to a page would make it pause while the disk file is
read to resolve the fault.
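
To make the contrast concrete, here is a minimal sketch in Python rather
than J, using the standard mmap module. It only illustrates the general
operating system mechanism, not how J's mapped files are implemented. The
file name demo.dat and the 64-page size are made up for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
PAGES = 64                 # hypothetical file size: 64 pages
SIZE = PAGES * PAGE

# Build a scratch file in which every byte holds its own page number.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "wb") as f:
    for page in range(PAGES):
        f.write(bytes([page]) * PAGE)

# fread style: the whole file is copied into the process before any use.
with open(path, "rb") as f:
    whole = f.read()
byte_via_read = whole[10 * PAGE]

# Mapped style: address space is reserved for the whole file, but the
# operating system only has to bring in the pages that are touched.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        mapped_len = len(m)
        byte_via_map = m[10 * PAGE]

os.remove(path)
```

Both approaches see the same data and both occupy SIZE bytes of address
space. The difference is in when, and how much of, the file has to be
brought into real memory.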

Vas requirements are identical between mapped files and names initialized
with fread. Mapped files go into the same vas as the application. Mapping a
100M file takes 100M of vas, the same as reading the file into a name.

Real storage - for mapped files, probably nothing until the application
accesses a page of the file. Real storage for fread, the size of the file.

So I feel that applications that access large files, take a long time to
process them, and run on machines with small amounts of real memory could
benefit from mapped files. But if the entire file is to be processed with
little computation involved, and you have lots of real memory available,
mapped files would not out-perform fread. fread might be faster, as the
operating system should optimize reading a file sequentially.

There are obvious performance advantages to mapped files when they are shared
among applications. Changes made in one vas are instantly seen by applications
running in other vas's. Only one copy of the file is needed, saving real
memory.
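
A small sketch of that sharing, again in Python with the standard mmap
module as a stand-in for the operating system behaviour rather than for
J's own mapping code. For simplicity the two mappings live in one process.
In the multi-session case each J instance would hold its own mapping of
the same file. The file name shared.dat is made up.

```python
import mmap
import os
import tempfile

# Create a one-page scratch file to be shared.
path = os.path.join(tempfile.mkdtemp(), "shared.dat")
with open(path, "wb") as f:
    f.write(b"\x00" * mmap.PAGESIZE)

# Two independent opens and two independent mappings of the same file.
f1 = open(path, "r+b")
f2 = open(path, "r+b")
writer = mmap.mmap(f1.fileno(), 0)   # default mapping is shared and writable
reader = mmap.mmap(f2.fileno(), 0)

# A store through one mapping is visible through the other without any
# explicit read or write call.
writer[0:5] = b"hello"
seen = reader[0:5]

for obj in (reader, writer, f2, f1):
    obj.close()
os.remove(path)
```

Note that nothing here tells the reader when the data changed. Any
notification between sessions still has to be arranged separately.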

Comments were made that for multi-core processing mapped files would
out-perform sockets. Let's look at how that might work.

Suppose an application wants to pass data to another core. It creates a
mapped file containing the data to pass. A mapped file is in memory, but it
also resides on disk. Therefore, the file must be allocated on disk and the
data at least scheduled to be written to the disk. The allocation requires
disk operations. The system may have directories etc. cached in its vas (not
the Lx cache), so no actual disk reads may be required, but we're still
talking about a significant amount of processing. Hopefully the allocation
does not have to wait for the writes that create the file to complete.
Next, after the other instance of J is notified somehow that the mapped file
is there and should be read, it must first map the file into its own vas.
Again, disk operations, hopefully cached. I would suspect that this approach
would be pretty slow.

Seems like mapped files in multi-core environments would work best where the
files are large and small pieces are modified a lot. All J instances should
be started, with their applications defined and the mapped files connected,
at the beginning, to minimize the startup time required to process a
transaction. An ideal type of application might be database query/update.

Applications processing small amounts of data and taking lots of time to
process would probably work as well or better transferring data via sockets.

If files to share are read-only then don't bother to use mapped files to
share data between the parent and child J instances. Simply pass the file
name and let the child read it or map it.

Mapped files make me nervous because mapped names are passed by name, not
by value. And it's hard to know that a name is mapped. So, in addition to not
having a need for mapped files yet, I really worry about passing a
potentially mapped name to any function I did not write myself, since only
my own functions would I know were written to handle mapped names properly.
Basically, primitives unmap names if the primitive may modify the value. But
that is not completely true. Today:

]       Returns the name still mapped
0&}.    Unmaps
#{.]    Unmaps
1&#     Unmaps
>       Meaningless, as a mapped argument cannot be boxed
;       Meaningless, same as for >
+       Maybe
<.      Maybe
>.      Maybe
The maybes are an interesting situation. For +, if the argument is of type
complex it makes a copy. If it is real or integer it leaves it mapped.
Similarly for <. and >. : if the argument is real it makes a copy, and if it
is integer it leaves it mapped. This is keyed off the internal data type, not
the values in the argument. This is the way it works today in J6.02. This
could easily change.
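
The hazard behind this worry can be shown outside J. The following Python
sketch (standard mmap module, not J's mapped names) passes a mapped buffer
and a private copy to the same function. The helper clear_first_byte and
the file name alias.dat are made up for the illustration.

```python
import mmap
import os
import tempfile

def clear_first_byte(buf):
    # Hypothetical helper that modifies its argument in place.
    buf[0] = 0
    return buf

# Scratch file of sixteen bytes, each holding the value 7.
path = os.path.join(tempfile.mkdtemp(), "alias.dat")
with open(path, "wb") as f:
    f.write(b"\x07" * 16)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)

    # Working on a private copy leaves the mapped data untouched.
    copy = bytearray(m[:])
    clear_first_byte(copy)
    after_copy = m[0]

    # Working on the mapping itself changes the shared data, and with
    # it the file on disk.
    clear_first_byte(m)
    after_direct = m[0]
    m.flush()
    m.close()

with open(path, "rb") as f:
    on_disk = f.read()[0]
os.remove(path)
```

The function is the same in both calls. Whether the file changes depends
only on what the caller handed in, which is exactly what is hard to see
from inside the function.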


Rant Summary

Mapped files can be very useful, but only for very specialized situations.
They can cause programming errors that can be very difficult to track down.
They do not support boxed, extended integer, or rational data right now.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
