http://people.ee.ethz.ch/~arkeller/linux/multi/kernel_user_space_howto.html#toc2
These file systems are optional features of the Linux kernel and may
not be enabled on your system. The file "/lib/modules/`uname
-r`/build/.config" shows how your kernel is configured.
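The check against the kernel configuration can also be done programmatically. The following is a minimal sketch, assuming the usual .config line format ("CONFIG_X=y", "CONFIG_X=m" or "# CONFIG_X is not set"); the function name config_state is hypothetical and not part of any kernel tool.

```c
#include <stdio.h>
#include <string.h>

/* Look up one option in a kernel .config-style stream.
 * Returns 'y' (built in), 'm' (built as module) or 'n' (absent or
 * "is not set"). config_state is an illustrative name. */
char config_state(FILE *f, const char *option)
{
    char line[512];
    size_t n = strlen(option);

    while (fgets(line, sizeof line, f)) {
        /* Match "OPTION=" exactly, so CONFIG_SYSFS does not match
         * CONFIG_SYSFS_DEPRECATED. Comment lines start with '#'. */
        if (strncmp(line, option, n) == 0 && line[n] == '=') {
            char v = line[n + 1];
            if (v == 'y' || v == 'm')
                return v;
        }
    }
    return 'n';
}
```

Called with a stream opened on the .config file and the string "CONFIG_DEBUG_FS", it returns 'y' when debugfs is built into the kernel.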
In order to exchange data between user space and kernel space the
Linux kernel provides a couple of RAM based file systems.
These interfaces are, themselves, based on files.
Usually a file represents a single value, but it may also represent a
set of values. The user space can access these values by means of the
standard read(2) and write(2)
functions.
For most of these file systems, a read or write call triggers a callback
function in the Linux kernel which has access to the corresponding
value.
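From user space, such an access is ordinary file I/O. The following is a minimal sketch of two helpers built on open(2), read(2) and write(2); the names read_value and write_value are hypothetical, and the sketch assumes the value fits into a single read.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read the text of a value file into buf, NUL-terminated and with a
 * trailing newline stripped. Returns the text length or -1 on error. */
ssize_t read_value(const char *path, char *buf, size_t len)
{
    int fd;
    ssize_t n;

    if (len == 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, buf, len - 1);
    close(fd);
    if (n < 0)
        return -1;
    if (n > 0 && buf[n - 1] == '\n')
        n--;
    buf[n] = '\0';
    return n;
}

/* Write text to an existing value file, much like "echo text > path". */
int write_value(const char *path, const char *text)
{
    size_t len = strlen(text);
    ssize_t n;
    int fd = open(path, O_WRONLY | O_TRUNC);

    if (fd < 0)
        return -1;
    n = write(fd, text, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

Both helpers work on any path; pointing them at a file below /proc or /sys is what makes the kernel callback run.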
Despite offering similar functionality, the different RAM based file
systems are all designed for separate purposes.
However it is easy to use these file systems for other purposes as
well. Questions such as "Which file system should be used?" or "Why
is there a need for the different file systems?" often arise on the
Linux kernel mailing list.
The arguments are controversial and each developer seems to have a
unique view.
The benefit of using the read and write functions in comparison to,
for example, socket based approaches, is that user space has many
tools available to
send data to kernel space (e.g. cat(1), echo(1)). These
programs are well known to users and can be used in scripts.
Description
The procfs, located in /proc, is the best known
interface of this class.
It was originally designed to export all kinds of process information,
such as the current status of the process or all open file descriptors,
to user space. Despite its initial purpose, the procfs has been
used for many other purposes:
- provide information about the running system such as cpu
information, information about interrupts, about the available memory
or the version of the kernel.
- information about "ide devices", "scsi devices" and "tty's".
- networking information such as the arp table, network statistics
or lists of used sockets
There is a special subdirectory: /proc/sys.
It allows many parameters of the running system to be configured.
Usually each file consists of a single value.
This value may correspond to:
- a limit (e.g. maximum buffer size)
- a switch that turns a given functionality on or off (for example routing)
- some other kernel variable
The directories and files below /proc/sys/ are not
implemented with the procfs interface.
Instead they use a mechanism called sysctl. See section
sysctl for further details about sysctl.
Note that despite its wide use, the procfs is deprecated and
should only be used to export information related to a process itself.
Implementation
In order to use the procfs, support for it has to be compiled into the
Linux kernel.
This is done by setting the parameter CONFIG_PROC_FS=y. In most
standard configurations this is enabled by default.
Procfs supports two different APIs for kernel modules:
- The legacy procfs API: It is easy to use as long as the amount
of data to be handled is small. In this context small means smaller
than one page (PAGE_SIZE), which is 4096 bytes on i386 systems.
- The seq_file API: Seq_file was designed to facilitate the
handling of read requests. It supports read requests for more than PAGE_SIZE
bytes and provides a mechanism to traverse a list, collect the
elements of the list, and send all elements to user space.
1. Legacy procfs API
The legacy procfs API allows for the creation of files and
directories.
For each file you have to specify two callback functions:
One which is executed when a user reads the file and the other when a
user writes to the file.
The use of this API is well described in the "Linux Kernel Procfs
Guide" distributed with the Linux kernel source code.
Therefore we give here only a very basic example: A module which
creates a directory as well as a file.
If your file provides more than PAGE_SIZE bytes of data
it is easy to get things wrong.
This is due to the API of the read function:
read(char *page, char **start, off_t off, int count, int *eof,
void *data)
The first parameter of this function is a buffer with the size
corresponding to one page.
Hence, if there is more data, the read has to be split in multiple
pieces.
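The bookkeeping this requires can be sketched with a simplified user-space model. This is not the kernel callback itself: it ignores the start and data parameters, and the names serve_piece and MODEL_PAGE_SIZE are hypothetical. It only illustrates that each call may deliver at most one page and must use the offset to find where the previous call stopped.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: the i386 page size quoted above */

/* Copy at most one page of data, starting at offset off, into page.
 * Sets *eof once the last piece has been delivered.
 * Returns the number of bytes copied by this call. */
size_t serve_piece(const char *data, size_t total, char *page,
                   size_t off, size_t count, int *eof)
{
    size_t n;

    if (off >= total) {
        *eof = 1;
        return 0;
    }
    n = total - off;
    if (n > count)
        n = count;
    if (n > MODEL_PAGE_SIZE)
        n = MODEL_PAGE_SIZE;
    memcpy(page, data + off, n);
    *eof = (off + n == total);
    return n;
}
```

A caller has to invoke the function repeatedly, advancing the offset by the returned byte count, until eof is set. Forgetting to honour the offset is the typical mistake: every call would then return the first page again.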
2. Seq_file API
The seq_file API is concerned solely with read requests; it does not
handle writes.
It hides the PAGE_SIZE
boundary from the developer and provides an API to step through a
series of objects, collect the data from each of them, and put all of
that data in the file. An example module can be found at http://lwn.net/Articles/22359/.
Description
Sysfs was designed to represent the whole device model as seen from
the Linux kernel.
It contains information about devices, drivers and buses and their
interconnections.
In order to represent the hierarchy and the interconnections sysfs is
heavily structured and contains a lot of links between the individual
directories.
As of kernel 2.6.23 it contains the following 9 top-level directories:
- /sys/block/: all known block devices such as hda/, ram/ and sda/
- /sys/bus/: all registered buses. Each directory below bus/
holds by default two subdirectories: devices/ for all devices attached
to that bus, and drivers/ for all drivers associated with that bus.
- /sys/class/: for each device type there is a
subdirectory, for example printer/ or sound/
- /sys/devices/: all devices known by the kernel,
organised by the bus they are connected to
- /sys/firmware/: files in this directory handle the
firmware of some hardware devices
- /sys/fs/: files to control a file system, currently
used by FUSE, a user space file system implementation
- /sys/kernel/: holds directories (mount points) for
other filesystems such as debugfs and securityfs
- /sys/module/: each loaded kernel module is represented
by a directory
- /sys/power/: files to handle the power state of some
hardware
Implementation
In order to use sysfs, support for it has to be compiled into the Linux
kernel.
This is done by setting the parameter CONFIG_SYSFS=y.
The philosophy behind sysfs is to represent each value with a
dedicated file.
In addition each file has a maximum size of PAGE_SIZE
bytes.
A kernel module has three possibilities to use files
below /sys:
- module parameters
- registering a new subsystem
- debugfs, mounted at /sys/kernel/debug (see the debugfs section for
more information)
Module Parameter API
Similar to command line arguments for applications, Linux kernel
modules may allow a set of parameters.
These parameters can not only be specified upon module insertion but
also during module run time.
A module parameter can be defined with the following macro:
module_param_named(name, value, type, perm)
This macro creates a parameter called "name" which
corresponds to the variable with name "value" of type "type".
There are many predefined types such as byte (for a
single character), int (for an integer) or charp
(for a string). It is also possible to add new types.
The file include/linux/moduleparam.h provides all predefined
types as well as an introduction on how to define new types.
The module_param_named macro creates a file called /sys/module/module_name/parameters/name
with the access rights specified by perm.
Depending on the specified access rights, the file - and thereby the
parameter value - can be read or written.
If perm is set to 0 the file is not created, and
therefore the parameter cannot be accessed during run time.
The module does not receive a notification when a user reads or
writes a given parameter; the value is silently changed.
It is therefore not possible to perform additional work when a
parameter changes its value.
This may be acceptable in some circumstances, such as changing a debug
level, but in most circumstances the module wants to do more, for
example run sanity checks or update a data structure.
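On the user space side, a parameter exported this way is just a small text file. The following is a minimal sketch that builds the path and reads an integer parameter, assuming the usual sysfs layout /sys/module/<module>/parameters/<name>; the function names are hypothetical.

```c
#include <stdio.h>

/* Build the sysfs path of a module parameter.
 * Returns 0 on success, -1 if out is too small. */
int module_param_path(char *out, size_t len,
                      const char *module, const char *param)
{
    int n = snprintf(out, len, "/sys/module/%s/parameters/%s",
                     module, param);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Read an integer parameter of a loaded module.
 * Returns 0 on success, -1 if the file is missing or not an integer. */
int read_module_param_int(const char *module, const char *param, int *value)
{
    char path[256];
    FILE *f;
    int ok;

    if (module_param_path(path, sizeof path, module, param) != 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

The read fails if perm was 0 when the parameter was declared, because no file exists in that case.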
Standard Sysfs API
The standard sysfs API uses a dedicated terminology: A file is
called an attribute, the function executed upon reading
an attribute is called show and the one for writing an
attribute store.
Before starting with the implementation of a module which uses sysfs
you have to figure out which subdirectory it belongs to.
If you deal with a bus, it belongs to bus/; with a file
system, it belongs to fs/; and with a block device, it
belongs to block/.
The API to use depends on the given subdirectory.
We first show an example which uses the low level sysfs functions to
add a new directory to fs/, and in a second example we
show how to add a new
entry to the bus/ directory.
1. new fs/ entry
sysfs_ex.c: creates the directory /sys/fs/myfs/ along with
two files, first and second, each containing
a single integer value.
The first step is to declare our subsystem. This can be done with
the use of the decl_subsys macro (on top of the file).
This macro creates a struct kset with the name myfs_subsys.
The module_init() function performs the proper
registration of our subsystem:
The macro kobj_set_kset_s initializes myfs_subsys
so that it will be part of the fs_subsys. The field myfs_subsys.kobj.ktype
points to a structure which holds all the attributes as well as the
functions to read and write the attributes.
And finally a call to subsystem_register() registers our
subsystem.
Files are generally represented by a struct attribute.
This struct holds the name as well as the access permission for the
corresponding file, but no data.
Therefore you have to create your own attribute type which consists of
at least the struct attribute and the value corresponding
to that file.
By design all attributes share the same show and store functions.
Each time one of these two functions is invoked it gets the
corresponding struct attribute
as an argument.
Therefore in the show and store functions you can obtain the value
corresponding to the file being read/written and you can manipulate it
accordingly.
For this purpose you need the macro container_of(ptr, type,
member). ptr is a pointer to the member of the
struct, type is the type of the struct this member is
embedded in, and member is the name of the member within the
struct.
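How container_of recovers the enclosing structure can be tried out in user space. The following is a minimal sketch: struct attribute here is a reduced stand-in for the kernel structure, the macro is a plain offsetof-based equivalent of the kernel one, and struct my_attr, my_show and my_store are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Reduced stand-in for the kernel's struct attribute: name and access
 * permission, but no data. */
struct attribute {
    const char *name;
    unsigned int mode;
};

/* User-space equivalent of the kernel macro, built on offsetof(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical attribute type: the generic attribute plus its value. */
struct my_attr {
    int value;
    struct attribute attr;
};

/* Shared show function: finds the value belonging to the attribute
 * that was passed in and formats it into buf. */
int my_show(struct attribute *attr, char *buf, size_t len)
{
    struct my_attr *a = container_of(attr, struct my_attr, attr);
    return snprintf(buf, len, "%d\n", a->value);
}

/* Shared store function: parses buf and updates the matching value. */
int my_store(struct attribute *attr, const char *buf)
{
    struct my_attr *a = container_of(attr, struct my_attr, attr);
    return (sscanf(buf, "%d", &a->value) == 1) ? 0 : -1;
}
```

Because both functions only receive the embedded struct attribute, the same pair can serve any number of files; each call reaches its own value through container_of.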
2. new bus/ entry
sysfs_ex2.c: use of sysfs in combination with a bus. It provides the
possibility to
read and write one value with the help of "my_pseudo_bus".
First of all we define our bus my_pseudo_bus. Then we
create our attribute with the help of the BUS_ATTR macro.
In the init function we register our pseudo bus and we create a file
(attribute).
If we wanted more than one attribute, we would have to use BUS_ATTR
several times and provide each attribute with its own show and store
functions.
This example is similar to the debugging facility of the scsi bus,
which is implemented in drivers/scsi/scsi_debug.c.
Description
Configfs is in a sense the counterpart of sysfs.
It can be seen as a filesystem based manager of kernel objects.
An important difference between configfs and sysfs is that in configfs
all objects are created from user space with a call to mkdir(2).
The kernel responds by creating the attributes (files), which can then
be read and written by the user.
If the user no longer needs the files, a call to rmdir(2)
deletes everything.
Therefore the life cycle of a configfs object is fully controlled by
user space.
Each time mkdir is invoked a new "config_item" is
created by the kernel implementation.
This config_item represents the files (attributes), the show and store
callback functions as well as the associated value.
Therefore each mkdir creates a new directory along with new files which
represent new values.
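The user space side of this life cycle consists of two system calls. The following is a minimal sketch with hypothetical helper names (item_create, item_destroy); the parent directory is whatever subsystem directory a module has registered below the configfs mount point.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an object by making a directory below parent. On configfs the
 * kernel then populates the directory with the attribute files. */
int item_create(const char *parent, const char *name)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", parent, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return mkdir(path, 0755);
}

/* Destroy the object again. On configfs, rmdir(2) removes the directory
 * together with its attribute files. */
int item_destroy(const char *parent, const char *name)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", parent, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return rmdir(path);
}
```

On an ordinary file system the same calls just create and remove an empty directory; it is the configfs implementation in the kernel that attaches a config_item to each mkdir.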
Configfs has the same limitations as sysfs: each file should
represent only one value and it should be smaller than PAGE_SIZE
bytes.
Implementation
In order to use configfs, support for it has to be compiled into the
Linux kernel.
This is done by setting the parameter CONFIG_CONFIGFS_FS=y.
In order to access configfs it has to be mounted with the following
command:
mount -t configfs none /config
The Linux kernel documentation provides a good manual for configfs
along with an example module.
Therefore we do not describe the configfs implementation aspects.
Resources and Further Reading
- Linux kernel source code: Documentation/filesystems/configfs
Description
Debugfs is a simple-to-use RAM based file system especially designed
for debugging purposes.
Developers are encouraged to use debugfs instead of procfs in order to
obtain debugging information from their kernel code.
Debugfs is quite flexible: it provides the possibility to set or get a
single value with the help of just one line of code, but the developer
is also allowed to write their own read/write functions and can use
the seq_file interface described in the procfs section.
Implementation
In order to use debugfs, support for it has to be compiled into the
Linux kernel.
This is done by setting the parameter CONFIG_DEBUG_FS=y.
Before debugfs can be accessed it has to be mounted with the
following command:
mount -t debugfs none /sys/kernel/debug
debugfs.c: kernel module that implements the "one line" API for a
variable of type u8 as well as the API with which you can
specify your own read and write functions.
All the "one line" APIs start with debugfs_create_ and
are listed in include/linux/debugfs.h.
The API with which you can provide your own read and write functions
is similar to the one of procfs.
In contrast to sysfs, you may create directories and files without
having to care about a given hierarchy.
Description
The sysctl infrastructure is designed to configure kernel parameters
at run time.
The sysctl interface is heavily used by the Linux networking subsystem.
It can also be used to configure some core kernel parameters, which are
represented as files below /proc/sys/.
The values can be accessed by using the cat(1), echo(1)
or sysctl(8) commands.
A value set with the echo command only persists as
long as the kernel is running
and is lost as soon as the machine is rebooted. In order to change
the values permanently they have to be written to the file /etc/sysctl.conf.
Upon restarting the machine all values specified in this file are
written to the corresponding files in /proc/sys/.
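The relation between the dotted names used by sysctl(8) and /etc/sysctl.conf and the files below /proc/sys/ is a simple substitution of separators. The following is a minimal sketch; the function name sysctl_key_to_path is hypothetical, and the sketch assumes that no path component itself contains a dot.

```c
#include <stdio.h>

/* Translate a dotted sysctl key such as "net.test.value1" into the
 * corresponding file name below /proc/sys/.
 * Returns 0 on success, -1 if out is too small. */
int sysctl_key_to_path(const char *key, char *out, size_t len)
{
    char *p;
    int n = snprintf(out, len, "/proc/sys/%s", key);

    if (n < 0 || (size_t)n >= len)
        return -1;
    /* Skip the 10 characters of "/proc/sys/" and turn every '.' of the
     * key into a directory separator. */
    for (p = out + 10; *p; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}
```

With this mapping, a line "net.test.value1 = 15" in /etc/sysctl.conf refers to the same value as the file /proc/sys/net/test/value1.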
Implementation
sysctl.c: sysctl example module that lets you write an integer to
/proc/sys/net/test/value1 and /proc/sys/net/test/value2 respectively.
Each entry in the /proc/sys directory is represented
by an entry in a table maintained by the Linux kernel, arranged in a
hierarchy.
A directory is represented by an entry pointing to a subtable.
A file is represented by an entry of type struct ctl_table.
This entry consists of the data represented by this file along with
some access rules.
New files and directories can be added by expanding one of the
subtables.
In this example we add a new directory called test below
the /proc/sys/net/ directory.
Our directory has got two files: value1 and value2.
Each of these files holds an integer variable which can have a value
between 10 and 20. The user root is allowed to change the entries,
whereas normal users are only allowed to read them.
Each file is represented with an entry in the test_table[]
array:
static ctl_table test_table[] = {
    {
        .ctl_name     = CTL_UNNUMBERED,
        .procname     = "value1",
        .data         = &value1,  /* assumed name of the int variable behind this file */
        .maxlen       = sizeof(int),
        .mode         = 0644,
        .proc_handler = &proc_dointvec_minmax,
        .strategy     = &sysctl_intvec,
        .extra1       = &min,
        .extra2       = &max
    },
    ...
};
The struct ctl_table entries are:
- .ctl_name: For new entries this has to be
CTL_UNNUMBERED (according to Documentation/sysctl/ctl_unnumbered.txt).
- .procname: The name of the file.
- .data: A reference to the data we want to be shown
in the file.
- .maxlen: The size of the data.
- .mode: Access permissions (read, write, execute for
user, group, others).
- .proc_handler: The routine which handles read and
write requests. There is a set of default routines declared near the
end of include/linux/sysctl.h.
- .strategy: A routine that performs additional checks when the entry
is accessed through the sysctl(2) system call. In this example,
sysctl_intvec checks that the value to be written is between min and max.
- .extra1, .extra2: The lower and upper bound used by
proc_dointvec_minmax and sysctl_intvec; here they point to min and max.
static ctl_table test_net_table[] = {
    {
        .ctl_name = CTL_UNNUMBERED,
        .procname = "test",
        .mode     = 0555,
        .child    = test_table
    },
    { .ctl_name = 0 }
};
This table represents our test directory. The entry .child
says that the elements below this directory are represented by the
table test_table, discussed above.
static ctl_table test_root_table[] = {
    {
        .ctl_name = CTL_UNNUMBERED,
        .procname = "net",
        .mode     = 0555,
        .child    = test_net_table
    },
    { .ctl_name = 0 }
};
This table represents the directory to which we want to attach our new
directory. In this example, this is the net directory.
In the module_init() function we have to register this root table with
a call to
register_sysctl_table(test_root_table);
Resources and Further Reading
- Linux kernel source code:
net/core/sysctl_net_core.c
- Linux kernel Documentation:
Documentation/sysctl/ctl_unnumbered.txt
Description
As the name suggests, this interface was designed for character
device drivers,
and it is commonly used for communication between user and kernel space.
(For example, users with sufficient privileges may write directly to
virtual terminal 1 with echo "hi there" > /dev/tty1.)
Each module can register itself as a character device and provide
some read and write functions which handle the data.
Files representing character devices are located within the /dev
directory (where you will also find block devices, but we will not be
describing them further). Usually these files correspond to a hardware
device.
Implementation
cdev.c: kernel module that prints its majorNumber to the system log.
The minorNumber can be chosen to be 0.
As with all file system based approaches the module has to specify a
read and a write callback function.
Therefore, we have to register ourselves with the function register_chrdev(unsigned
int major, const char *name,
struct file_operations *ops);. major is the major
number of this device. We can set it to 0 to let the kernel choose an
appropriate number.
name is the name of this character device, as it will be
shown in /proc/devices. ops is a
pointer to a struct file_operations which holds the read and write functions.
In contrast to most file system based approaches seen so far, the
user has to create the device file explicitly with a call to:
mknod /dev/arbitrary_name c majorNumber minorNumber
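When the kernel chooses the major number, user space has to find it before calling mknod. Besides reading the system log, the number can be looked up in /proc/devices, which lists the registered names. The following is a minimal sketch of such a lookup, assuming the usual layout of that file (a "Character devices:" section followed by a "Block devices:" section, one "major name" pair per line); the function name find_char_major is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Search a /proc/devices-style stream for a character device called
 * name. Returns its major number, or -1 if it is not listed. */
int find_char_major(FILE *f, const char *name)
{
    char line[256];
    int in_char = 0;

    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "Character devices:", 18) == 0) {
            in_char = 1;
            continue;
        }
        if (strncmp(line, "Block devices:", 14) == 0) {
            in_char = 0; /* ignore block devices with the same name */
            continue;
        }
        if (in_char) {
            int major;
            char dev[64];

            if (sscanf(line, "%d %63s", &major, dev) == 2 &&
                strcmp(dev, name) == 0)
                return major;
        }
    }
    return -1;
}
```

The returned number is what has to be passed as majorNumber to the mknod command above.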