API changes in the 2.6 kernel series
[Posted February 25, 2009 by corbet]
The 2.6 kernel development series differs from its predecessors in that
much larger and potentially destabilizing changes are being incorporated
into each release. Among these changes are modifications to the internal
programming interfaces for the kernel, with the result that kernel
developers must work harder to stay on top of a continually shifting API.
There has never been a guarantee of internal API stability within the
kernel - even in a stable development series - but the rate of change is
higher now.
This article will be updated to keep track of the internal changes for
each 2.6 kernel release. Its permanent location is:
http://lwn.net/Articles/2.6-kernel-api/
If you are looking for changes prior to 2.6.26, you'll find them on the older version of this page.
Last update: March 24, 2009.
2.6.29 (March 23, 2009)
- The massive task credentials patch set has been merged. This code
reorganizes the handling of process credentials (user ID, capabilities,
etc.). One of the immediate implications of this change is that direct
references to credential-oriented fields in the task structure need to
be changed; for example, current->user->uid becomes current_uid().
See Documentation/credentials.txt for a description of the new API.
- The ftrace code has seen a lot of internal changes. The
function tracing feature has seen a number of improvements, and the
developers have added mechanisms to profile the behavior of if
statements, provide function call graphs, obtain user-space stack
traces, and follow CPU power-state transitions.
- Most of the callback functions/methods associated with the net_device
structure have been moved out of that structure and into the new struct net_device_ops.
In-tree drivers have been converted to the new API.
- The priv field has been removed from struct
net_device; drivers should use netdev_priv() instead.
- The generic PHY layer now has power management support. To
that end, two new methods - suspend() and resume()
- have been added to struct phy_driver.
- The networking layer now supports large receive offload (or
"generic receive offload") operation.
- The NAPI API has been cleaned up somewhat; in particular,
functions like netif_rx_schedule(), netif_rx_schedule_prep(),
and netif_rx_complete() have lost the unneeded struct
net_device parameter.
- The poll() file operation is now allowed to
sleep; see this article
for more information on this change.
- The CPU mask mechanism, used to represent sets of
processors in the system, is in the middle of being massively reworked.
The problem is that CPU masks were often put on the stack, but, as the
number of processors grows, the stack lacks room for the mask. The new
API is designed to get these masks off the stack, and to guard against
anybody ever trying to put one back. See this
posting by Rusty Russell for details on this work.
- An infrastructure
for asynchronous function calls has been merged. This code is still
a work in progress, though, and, for 2.6.29, it will not be activated
in the absence of the fastboot command-line parameter.
- The exclusive
I/O memory allocation functions have been merged.
- There is a new synchronous hash interface called "shash."
It simplifies the use of synchronous hash operations while allowing the
same tfm to be used simultaneously in different threads. All in-tree
users have been switched to the new API.
- The hrtimer code has been simplified with the removal of
variable modes for callback functions. All processing is now done in
hardirq context.
- A new set of LSM hooks has been added; these support
pathname-based security operations. With the merging of these hooks,
one major obstacle to the inclusion of security modules like AppArmor
and TOMOYO has been removed.
- The kernel will now refuse to build with GCC 4.1.0 or
4.1.1; those versions have unfortunate bugs which prevent the building
of a working kernel. GCC versions 3.0 and 3.1 have also been deemed
too old and will not be supported in 2.6.29.
- Video4Linux drivers now use a separate v4l2_file_operations
structure to hold their VFS-like callbacks. The prototypes of a number
of these functions have been changed to remove the inode
argument.
- Video4Linux2 has also acquired a new "subdevice" concept,
meant to reflect the fact that video "devices" tend to be, in reality,
a set of cooperating devices. See the new document for a
description of how this mechanism works.
- Two new functions - stop_machine_create() and stop_machine_destroy()
- allow the independent creation of the threads used by stop_machine().
That, in turn, lets those threads be created before trying to actually
stop the machine, making that operation more resistant to failure.
- The exports for a number of SUNRPC functions have been
changed to GPL-only.
- The internal MTD (memory technology device) API has seen
significant changes aimed at supporting larger devices (those requiring
64-bit sizes).
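The idea behind the shash split can be sketched outside the kernel: the transform holds only state that never changes after setup, while each caller keeps its running hash state in a descriptor of its own, so one transform can serve several threads at once. The code below is a user-space illustration using a toy 32-bit FNV-1a hash; toy_tfm, toy_desc and the toy_* functions are hypothetical names, not the kernel crypto API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "transform": read-only after setup, so any number of
 * threads may share it. */
struct toy_tfm {
	uint32_t seed;		/* FNV-1a offset basis, playing the role of a key */
};

/* Hypothetical per-operation descriptor: each caller owns one, so the
 * running state never lives in the shared transform. */
struct toy_desc {
	const struct toy_tfm *tfm;
	uint32_t state;
};

static void toy_init(struct toy_desc *desc, const struct toy_tfm *tfm)
{
	desc->tfm = tfm;
	desc->state = tfm->seed;
}

/* FNV-1a step: XOR in each byte, then multiply by the 32-bit FNV prime. */
static void toy_update(struct toy_desc *desc, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		desc->state ^= p[i];
		desc->state *= 16777619u;
	}
}

static uint32_t toy_final(const struct toy_desc *desc)
{
	return desc->state;
}
```

Because toy_update() writes only to the descriptor, two descriptors can be driven in an interleaved fashion (or from different threads) against the same toy_tfm without any locking.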
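The CPU mask size problem is easy to quantify: a mask with one bit per possible CPU takes 512 bytes when 4096 CPUs are configured, a sizable bite out of a kernel stack of a few kilobytes. The following user-space sketch shows the general remedy of allocating the bitmap from the heap, sized at run time; toy_cpumask and its helpers are hypothetical and do not reflect the real cpumask interface.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical heap-allocated CPU set: sized for the CPUs actually
 * present rather than a compile-time maximum, and never on the stack. */
struct toy_cpumask {
	unsigned int nr_cpus;
	unsigned long bits[];	/* flexible array member */
};

/* Number of words needed to hold one bit per CPU. */
static size_t toy_mask_words(unsigned int nr_cpus)
{
	return (nr_cpus + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Returns a zeroed mask, or NULL if the allocation fails. */
static struct toy_cpumask *toy_cpumask_alloc(unsigned int nr_cpus)
{
	struct toy_cpumask *m;

	m = calloc(1, sizeof(*m) + toy_mask_words(nr_cpus) * sizeof(unsigned long));
	if (m)
		m->nr_cpus = nr_cpus;
	return m;
}

static void toy_cpumask_set(struct toy_cpumask *m, unsigned int cpu)
{
	if (cpu < m->nr_cpus)
		m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool toy_cpumask_test(const struct toy_cpumask *m, unsigned int cpu)
{
	return cpu < m->nr_cpus &&
	       ((m->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL);
}
```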
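The need for 64-bit sizes in the MTD layer comes down to simple arithmetic: a 32-bit byte count tops out one byte below 4GiB, so a larger device's size wraps around. The sketch below uses plain C integer types (not the MTD structures) and a hypothetical 8GiB device to show the truncation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8GiB flash device size, in bytes (2^33). */
#define DEVICE_BYTES (8ULL << 30)

/* Storing the size in a 32-bit field keeps only the value modulo 2^32. */
static uint32_t size_as_u32(uint64_t bytes)
{
	return (uint32_t)bytes;
}

/* True if a byte count can be represented in 32 bits without loss. */
static bool fits_in_u32(uint64_t bytes)
{
	return bytes <= UINT32_MAX;
}
```

An 8GiB size truncated to 32 bits comes out as zero, which is why sizes and offsets had to widen to 64 bits.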
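Refusing particular compiler versions is normally done with a preprocessor test on GCC's predefined __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ macros. The fragment below is a sketch of that technique using the GCC version list given for 2.6.29; it is not the kernel's actual check, and compiler_ok() is a hypothetical function that exists only so the fragment compiles to something.

```c
/* Sketch: stop the build on the compiler versions listed for 2.6.29. */
#if defined(__GNUC__)
# if __GNUC__ == 3 && __GNUC_MINOR__ < 2
#  error "GCC 3.0 and 3.1 are too old to build this code"
# endif
# if __GNUC__ == 4 && __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ <= 1
#  error "GCC 4.1.0 and 4.1.1 cannot build working code; please upgrade"
# endif
#endif

/* Compiled only when none of the #error directives above fired. */
static int compiler_ok(void)
{
	return 1;
}
```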
2.6.28 (December 24, 2008)
- Discard request and request timeout handling have been
added to the block layer; a number of other internal API changes have
been made as well. See this
article for details.
- Video4Linux2 drivers no longer have their open()
function called with the big kernel lock held. The lock_kernel()
calls have been pushed down into individual drivers within the mainline
tree; external drivers will need to be fixed.
- A number of tracing-related patches have been merged. These
include the tracepoints
mechanism, some instrumentation in the core scheduler code,
improvements to the ftrace function tracing feature, a new ftrace-based
stack tracer, a new ftrace-based boot (initcall) tracer, and the low-level trace buffer
code.
- The sysctl strategy() function prototype has
changed: the unused name and nlen parameters have
been removed.
- Asynchronous I/O support can now be configured out of the
kernel, saving about 7KB of space on systems where AIO is not needed.
- As planned, device_create_drvdata() has been
renamed to device_create(), with the same parameters.
- There is now a mechanism to enable and disable output from pr_debug()
and dev_dbg() calls on a per-module basis. Control is through
a virtual file in debugfs. There is no documentation file associated
with this change; instructions on how to use this feature can be found
in the
patch changelog.
- The new dev_WARN() function:
dev_WARN(struct device *dev, char *format, ...);
will output the formatted warning, along with a full stack
trace. This will allow the warnings to be collected at kerneloops.org and incorporated into
the reports there.
- The new %pR formatting directive allows printk()
and friends to output the contents of resource structures.
- There is a new function intended to make life easier for
PCI driver writers:
static inline void *pci_ioremap_bar(struct pci_dev *pdev, int bar);
This function will remap the entire PCI I/O memory region,
as selected by the bar argument.
- There is a new core_param() macro:
core_param(name, var, type, perm);
Its purpose is to define "core" parameters and let them be
represented in /sys/module/kernel/parameters.
- It is now possible to create a workqueue running at
realtime priority with:
struct workqueue_struct *create_rt_workqueue(const char *name);
- The block driver API has changed considerably, with the inode
and file parameters being removed from most block device
operations. The new API looks like this:
struct block_device_operations {
	int (*open) (struct block_device *bdev, fmode_t mode);
	int (*release) (struct gendisk *gd, fmode_t mode);
	int (*locked_ioctl) (struct block_device *bdev, fmode_t mode,
			unsigned cmd, unsigned long arg);
	int (*ioctl) (struct block_device *bdev, fmode_t mode,
			unsigned cmd, unsigned long arg);
	int (*compat_ioctl) (struct block_device *bdev, fmode_t mode,
			unsigned cmd, unsigned long arg);
	int (*direct_access) (struct block_device *bdev, sector_t sector,
			void **kaddr, unsigned long *pfn);
	int (*media_changed) (struct gendisk *gd);
	int (*revalidate_disk) (struct gendisk *gd);
	int (*getgeo)(struct block_device *bdev, struct hd_geometry *geo);
	struct module *owner;
};
The new prototypes do away with the file and inode
structure pointers which were passed in previous kernels. Note that the
ioctl() method is now called without the big kernel lock;
code needing BKL protection must explicitly define a locked_ioctl()
function instead.
- The range timer
API has been merged; callers can now specify a time period in which
they would like the timeout to be delivered. The kernel can then take
advantage of the range to coalesce wakeups and keep the processor idle
for longer periods.
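One way to picture the benefit of range timers: if every timeout is an interval of acceptable expiry times rather than a single instant, overlapping intervals can share one wakeup. The sketch below is a generic greedy interval-stabbing routine, not the kernel's hrtimer implementation; toy_range and toy_coalesce() are hypothetical names. It returns the smallest set of wakeup times such that every range contains at least one of them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical timer request: any expiry in [earliest, latest] is fine. */
struct toy_range {
	unsigned long earliest;
	unsigned long latest;
};

static int by_latest(const void *a, const void *b)
{
	const struct toy_range *x = a, *y = b;

	return (x->latest > y->latest) - (x->latest < y->latest);
}

/*
 * Greedy interval stabbing: sort by the end of each range, fire at the
 * end of the first range not yet served, and let that single wakeup
 * serve every later range that has already started by then.
 * Sorts ranges[] in place; out[] needs room for n entries.
 * Returns the number of wakeups written to out[].
 */
static size_t toy_coalesce(struct toy_range *ranges, size_t n,
			   unsigned long *out)
{
	size_t count = 0;

	qsort(ranges, n, sizeof(*ranges), by_latest);
	for (size_t i = 0; i < n; i++) {
		if (count > 0 && ranges[i].earliest <= out[count - 1])
			continue;	/* already served by the last wakeup */
		out[count++] = ranges[i].latest;
	}
	return count;
}
```

For example, the ranges [10,20], [15,25], [18,35] and [30,40] can all be served by two wakeups, at 20 and 40.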
2.6.27 (October 9, 2008)
- The register_security() function has been
removed. Security modules which wish to implement stacking must now do
so explicitly.
- The request_queue_t type is gone at last; block
drivers should use struct request_queue instead.
- Quite a bit of big
kernel lock removal work has been merged. For char devices, the open()
method from struct file_operations is no longer protected by
the BKL. Calls to fasync() have also lost BKL protection.
- Many drivers have been converted to use the firmware
loader, making it possible to strip the firmware from the kernel for
those who are inclined to do so. See this article for more
information on the firmware work.
- The API work in the i2c layer continues; there is now an
autodetection capability which allows new-style drivers to detect
devices on their buses automatically.
- The SCSI layer has gained new support for "device
handlers," which are mostly concerned with multipath management. Some
of this code has been moved over from the device mapper.
- The new suspend
and hibernate infrastructure has been merged, providing a wider set
of callbacks for power management events. The PCI and platform bus
interfaces have been enhanced with support for this new infrastructure.
- The TTY layer continues to evolve; significant changes
include the introduction of a new tty_port structure meant to
hold information common to all TTY ports and a rework of the line
discipline code.
- The mac80211 code has a new module which can simulate any
number of IEEE 802.11 radios; it is suitable for testing mac80211
functionality and associated user-space tools.
- There is a new "rfkill" mechanism for unified handling of
"radio off" switches on wireless devices.
- A number of Video4Linux2 format-related callbacks have been
renamed to make them match the names used with the associated buffer
types. In addition, the vidioc_enum_fmt_vbi_cap() callback
has been deprecated and marked for removal in 2.6.28.
- The videobuf layer now has support for controllers which
cannot do scatter/gather I/O.
- The USB "gadget" framework has been massively reworked to
provide better support for composite devices.
- The prototype for device_create() has changed:
struct device *device_create(struct class *class,
                             struct device *parent,
                             dev_t devt,
                             void *drvdata,
                             const char *fmt, ...);
Those who see a resemblance to device_create_drvdata()
are right; all in-tree users were converted over to that interface, the
old device_create() was removed, and device_create_drvdata()
was renamed. For now, a macro makes calls to device_create_drvdata()
do the right thing, but that macro will probably go away before the
2.6.27 final release.
- User-space UIO drivers can now write a signed value to the /dev/uioX
device to enable and disable interrupts.
- Debugfs (finally) has a function for removing an entire
directory tree:
void debugfs_remove_recursive(struct dentry *dentry);
As a result, code creating hierarchies in debugfs no longer needs to
remember the dentry of every file it creates.
- The tracehook mechanism for defining static trace points
(described in this article)
has been merged, along with a number of trace points in the core
kernel.
- A new, lockless form of get_user_pages() has been
added:
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
                        struct page **pages);
Details of this interface can be found in this article, with the one
note that early versions were called fast_gup() instead. (See also the
related lockless page cache work, which was also merged.)
- The long-debated mmu-notifiers
patch has been merged. The notifiers allow external memory
management units (as may be seen in some graphics cards or in
virtualized guests) to be told about decisions made by the core memory
management code.
- There is a new framework for debugging boot-time memory
initialization; there's also "a few basic defensive measures" intended
to prevent difficult-to-debug boot problems.
- The new function:
int object_is_on_stack(void *obj);
returns a true value if the pointed-to object is on the
current kernel stack.
- There is a new macro for issuing warnings:
WARN(condition, format, ...);
It's much like WARN_ON() in that it will produce
a full oops listing; the difference is the added printk()-style
format string and arguments.
- A new helper function:
int flush_work(struct work_struct *work);
waits for the specific workqueue job work to
finish executing.
- dma_mapping_error() and pci_dma_mapping_error()
have new prototypes:
int dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
int pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr);
In each case, they have gained a new argument specifying
which device the mapping is being done for.
- There are a couple of new radix tree functions:
unsigned int radix_tree_gang_lookup_slot(struct radix_tree_root *root,
                                         void ***results,
                                         unsigned long first_index,
                                         unsigned int max_items);
unsigned int radix_tree_gang_lookup_tag_slot(struct radix_tree_root *root,
                                             void ***results,
                                             unsigned long first_index,
                                             unsigned int max_items,
                                             unsigned int tag);
They are useful for looking up multiple items in a single
call.
- Slab cache constructors no longer have a pointer to the
cache itself as an argument; they now take a single void *
pointer to the object itself.
- The long list of Video4Linux2 ioctl() callbacks
has been moved into its own structure (struct v4l2_ioctl_ops)
which is pointed to by the ioctl_ops member of struct
video_device.
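WARN() evaluates to the truth value of its condition, so it can sit directly inside an if statement that handles the bad case. The user-space sketch below shows that shape; TOY_WARN is a hypothetical macro that relies on the GCC/Clang statement-expression and ##__VA_ARGS__ extensions, and fprintf() stands in for the kernel's full oops-style report.

```c
#include <stdio.h>

/* Hypothetical stand-in for WARN(): when the condition is true, print the
 * location plus a printf-style message; either way, evaluate to the
 * condition normalized to 0 or 1. */
#define TOY_WARN(condition, fmt, ...)					\
	({								\
		int toy_warn_ret = !!(condition);			\
		if (toy_warn_ret)					\
			fprintf(stderr, "WARNING: at %s:%d: " fmt,	\
				__FILE__, __LINE__, ##__VA_ARGS__);	\
		toy_warn_ret;						\
	})

/* Hypothetical caller: warns about, and then rejects, a bad argument. */
static int toy_set_speed(int mbps)
{
	if (TOY_WARN(mbps <= 0, "bad speed %d\n", mbps))
		return -1;
	return mbps;
}
```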
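What debugfs_remove_recursive() buys the caller is that only the root of a hierarchy has to be remembered. The user-space sketch below models that with a hypothetical toy_node tree (it is not debugfs code): toy_remove_recursive() unlinks and frees every child before freeing the node itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a directory entry that may have children. */
struct toy_node {
	struct toy_node *child;		/* first child */
	struct toy_node *sibling;	/* next entry in the same directory */
};

static int toy_live_nodes;	/* allocation counter, for checking */

/* Create a node, linking it under parent if one is given. */
static struct toy_node *toy_create(struct toy_node *parent)
{
	struct toy_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	toy_live_nodes++;
	if (parent) {
		n->sibling = parent->child;
		parent->child = n;
	}
	return n;
}

/* Remove a node and everything below it, children first. */
static void toy_remove_recursive(struct toy_node *n)
{
	if (!n)
		return;
	while (n->child) {
		struct toy_node *c = n->child;

		n->child = c->sibling;	/* unlink before descending */
		toy_remove_recursive(c);
	}
	free(n);
	toy_live_nodes--;
}
```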
2.6.26 (July 13, 2008)
- At long last, support for the KGDB interactive debugger has
been added to the x86 architecture. There is a DocBook document in the
Documentation directory which provides an overview of how to use this
new facility.
- Page attribute table (PAT) support is also (again, at long
last) available for the x86 architecture. PATs allow for fine-grained
control of memory caching behavior with more flexibility than the older
MTRR feature. See Documentation/x86/pat.txt
for more information.
- ioremap() on the x86 architecture will now always
return an uncached mapping. Previously, it had taken a more relaxed
approach, leaving the caching as the BIOS had set it up. The practical
result was to almost always create uncached mappings, but with
occasional exceptions. Drivers which depend on a cached mapping will
now break; they will need to use ioremap_cache() instead.
- The nopage() virtual memory area operation has
been removed; all in-tree code is now using fault() instead.
- Two new functions (inode_getsecid() and ipc_getsecid()),
added to support security modules and the audit code, provide general
access to security IDs associated with inodes and IPC objects. A number
of superblock-related LSM callbacks now take a struct path
pointer instead of struct nameidata. There is also a new set
of hooks providing generic audit support in the security module
framework.
- The now-unused ieee80211 software MAC layer has been
removed; all of the drivers which needed it have been converted to
mac80211. Also removed are the sk98lin network driver (in favor of
skge) and bcm43xx (replaced by b43 and b43legacy).
- The generic
semaphores patch has been merged. The semaphore code also has new down_killable()
and down_timeout() functions.
- The ata_port_operations structure used by libata
drivers now supports a simple sort of operation inheritance, making it
easier to write drivers which are "almost like" existing code, but with
small differences.
- A new function (ns_to_ktime()) converts a time
value in nanoseconds to ktime_t.
- The final users of struct class_device have been
converted to use struct device instead. The class_device
type has been removed.
- The seq_file code now accepts a return value of SEQ_SKIP
from the show() callback; that value causes any accumulated
output from that call to be discarded.
- The Video4Linux2 API now defines a set of controls for
camera devices; they allow user space to work with parameters like
exposure type, tilt and pan, focus, and more.
- On the x86 architecture, there is a new configuration
parameter which allows gcc to make its own decisions about the inlining
of functions, even when functions are declared inline. In
some cases, this option can reduce the size of the kernel's text
segment by over 2%.
- The legacy IDE layer has gone through a lot of internal
changes which will break any remaining IDE drivers.
- The SLUB allocator supports a new sysfs file (/sys/kernel/slab/name/order)
which allows system administrators to change the size of page
allocations used by the named slab.
- A condition which triggers a warning from WARN_ON
will now also taint the kernel.
- The get_info() interface for /proc
files has been removed. There is also a new function for creating /proc
files:
struct proc_dir_entry *proc_create_data(const char *name, mode_t mode,
                                        struct proc_dir_entry *parent,
                                        const struct file_operations *proc_fops,
                                        void *data);
This version adds the data pointer, ensuring
that it will be set in the resulting proc_dir_entry structure
before user space can try to access it.
- The object
debugging infrastructure has been merged.
- The klist type now has the usual-form macros for
declaration and initialization: DEFINE_KLIST() and KLIST_INIT().
Two new functions (klist_add_after() and klist_add_before())
can be used to add entries to a klist in a specific position.
- kmap_atomic_to_page() is no longer exported to
modules.
- There are some new generic functions for performing 64-bit
integer division in the kernel:
u64 div_u64(u64 dividend, u32 divisor);
u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder);
s64 div_s64(s64 dividend, s32 divisor);
s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
Unlike do_div(), these functions are explicit about
whether signed or unsigned math is being done. The x86-specific div_long_long_rem()
has been removed in favor of these new functions.
- There is a new string function:
bool sysfs_streq(const char *s1, const char *s2);
It compares the two strings while ignoring an optional
trailing newline.
- The prototype for i2c probe() methods has
changed:
int (*probe)(struct i2c_client *client,
             const struct i2c_device_id *id);
The new id argument supports i2c device name
aliasing.
- There is a new configuration option (MODULE_FORCE_LOAD)
which controls whether the loading of modules can be forced if the
kernel thinks something is not right; it defaults to "no."
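The semantics of the new 64-bit division helpers can be modeled in ordinary C. The functions below are hypothetical toy_ re-implementations built on the C / and % operators; they do not reproduce the reason the kernel helpers exist, which is that 32-bit kernel code cannot generally use those operators on 64-bit values. C99 division truncates toward zero, so the signed remainder takes the sign of the dividend.

```c
#include <stdint.h>

/* Unsigned 64-by-32 division; the remainder goes through a pointer. */
static uint64_t toy_div_u64_rem(uint64_t dividend, uint32_t divisor,
				uint32_t *remainder)
{
	*remainder = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}

/* Quotient only. */
static uint64_t toy_div_u64(uint64_t dividend, uint32_t divisor)
{
	uint32_t rem;

	return toy_div_u64_rem(dividend, divisor, &rem);
}

/* Signed version: truncates toward zero, remainder has the dividend's sign. */
static int64_t toy_div_s64_rem(int64_t dividend, int32_t divisor,
			       int32_t *remainder)
{
	*remainder = (int32_t)(dividend % divisor);
	return dividend / divisor;
}
```

Unlike do_div(), a macro which overwrites its first argument with the quotient and returns the remainder, these functions take their arguments by value and return the quotient.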
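The case sysfs_streq() is aimed at is a sysfs store() method comparing its input against a keyword: a value written with echo arrives with a trailing newline. The following is a hypothetical user-space re-implementation of the comparison as described (equal strings, or strings that differ only by one trailing newline on either side); it is a sketch, not the kernel source.

```c
#include <stdbool.h>

/* True if s1 and s2 are equal, ignoring one optional trailing newline. */
static bool toy_sysfs_streq(const char *s1, const char *s2)
{
	/* Skip the common prefix. */
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;	/* both strings ended together */
	if (*s1 == '\0' && s2[0] == '\n' && s2[1] == '\0')
		return true;	/* s2 has the extra newline */
	if (*s2 == '\0' && s1[0] == '\n' && s1[1] == '\0')
		return true;	/* s1 has the extra newline */
	return false;
}
```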
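The effect of SEQ_SKIP can be modeled with a buffer whose fill level is saved before each show() call and restored when that call asks for its output to be dropped. Everything in the sketch below (toy_seq, toy_printf(), toy_show(), toy_read() and TOY_SEQ_SKIP) is hypothetical and only mimics the behavior described for seq_file; it is not the seq_file implementation.

```c
#include <stdarg.h>
#include <stdio.h>

#define TOY_SEQ_SKIP 1	/* plays the role of SEQ_SKIP */

/* Hypothetical output buffer, standing in for struct seq_file. */
struct toy_seq {
	char buf[256];
	size_t count;		/* bytes of accumulated output */
};

/* Append formatted text, truncating if the buffer fills up. */
static void toy_printf(struct toy_seq *m, const char *fmt, ...)
{
	size_t room = sizeof(m->buf) - m->count;
	va_list ap;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(m->buf + m->count, room, fmt, ap);
	va_end(ap);
	if (len < 0)
		return;
	if ((size_t)len >= room)
		m->count = sizeof(m->buf) - 1;	/* output was truncated */
	else
		m->count += (size_t)len;
}

/* Hypothetical show(): odd-numbered items are hidden, but only after
 * some of their output has already been written. */
static int toy_show(struct toy_seq *m, int item)
{
	toy_printf(m, "item %d", item);
	if (item % 2)
		return TOY_SEQ_SKIP;
	toy_printf(m, "\n");
	return 0;
}

/* Reader loop: remember the fill level before each show() call and
 * roll back to it when show() returns TOY_SEQ_SKIP. */
static int toy_read(struct toy_seq *m, int nitems)
{
	m->count = 0;
	m->buf[0] = '\0';
	for (int i = 0; i < nitems; i++) {
		size_t saved = m->count;
		int err = toy_show(m, i);

		if (err < 0)
			return err;	/* a real error */
		if (err == TOY_SEQ_SKIP) {
			m->count = saved;	/* discard this call's output */
			m->buf[saved] = '\0';
		}
	}
	return 0;
}
```

Reading four items this way leaves "item 0\nitem 2\n" in the buffer; the partial "item 1" and "item 3" output is rolled back.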