API changes in the 2.6 kernel series

The 2.6 kernel development series differs from its predecessors in that much larger and potentially destabilizing changes are being incorporated into each release. Among these changes are modifications to the internal programming interfaces for the kernel, with the result that kernel developers must work harder to stay on top of a continually-shifting API. There has never been a guarantee of internal API stability within the kernel - even in a stable development series - but the rate of change is higher now.

This article will be updated to keep track of the internal changes for each 2.6 kernel release. Its permanent location is:

http://lwn.net/Articles/2.6-kernel-api/

If you are looking for changes prior to 2.6.26, you'll find them on the older version of this page.

Last update: March 24, 2009.

2.6.29 (March 23, 2009)

  • The massive task credentials patch set has been merged. This code reorganizes the handling of process credentials (user ID, capabilities, etc.). One immediate implication of this change is that direct references to credential-oriented fields in the task structure need to be changed; for example, current->user->uid becomes current_uid(). See Documentation/credentials.txt for a description of the new API.

  • The ftrace code has seen a lot of internal changes. Function tracing itself has been improved, and the developers have added mechanisms to profile the behavior of if statements, provide function call graphs, obtain user-space stack traces, and follow CPU power-state transitions.

  • Most of the callback functions/methods associated with the net_device structure have been moved out of that structure and into the new struct net_device_ops. In-tree drivers have been converted to the new API.

  • The priv field has been removed from struct net_device; drivers should use netdev_priv() instead.

  • The generic PHY layer now has power management support. To that end, two new methods - suspend() and resume() - have been added to struct phy_driver.

  • The networking layer now supports "generic receive offload" (GRO) operation, a generalized form of large receive offload.

  • The NAPI API has been cleaned up somewhat; in particular, functions like netif_rx_schedule(), netif_rx_schedule_prep(), and netif_rx_complete() have lost the unneeded struct net_device parameter.

  • The poll() file operation is now allowed to sleep; see this article for more information on this change.

  • The CPU mask mechanism, used to represent sets of processors in the system, is in the middle of being massively reworked. The problem is that CPU masks were often put on the stack, but, as the number of processors grows, the stack lacks room for the mask. The new API is designed to get these masks off the stack, and to guard against anybody ever trying to put one back. See this posting by Rusty Russell for details on this work.

  • An infrastructure for asynchronous function calls has been merged. This code is still a work in progress, though, and, for 2.6.29, it will not be activated in the absence of the fastboot command-line parameter.

  • The exclusive I/O memory allocation functions have been merged.

  • There is a new synchronous hash interface called "shash." It simplifies the use of synchronous hash operations while allowing the same tfm to be used simultaneously in different threads. All in-tree users have been switched to the new API.

  • The hrtimer code has been simplified with the removal of variable modes for callback functions. All processing is now done in hardirq context.

  • A new set of LSM hooks has been added; these support pathname-based security operations. With the merging of these hooks, one major obstacle to the inclusion of security modules like AppArmor and TOMOYO has been removed.

  • The kernel will now refuse to build with GCC 4.1.0 or 4.1.1; those versions have unfortunate bugs which prevent the building of a working kernel. Versions 3.0 and 3.1 have also been deemed to be too old and will not be supported in 2.6.29.

  • Video4Linux drivers now use a separate v4l2_file_operations structure to hold their VFS-like callbacks. The prototypes of a number of these functions have been changed to remove the inode argument.

  • Video4Linux2 has also acquired a new "subdevice" concept, meant to reflect the fact that video "devices" tend to be, in reality, a set of cooperating devices. See the new document for a description of how this mechanism works.

  • Two new functions - stop_machine_create() and stop_machine_destroy() - allow the independent creation of the threads used by stop_machine(). That, in turn, lets those threads be created before trying to actually stop the machine, making that operation more resistant to failure.

  • The exports for a number of SUNRPC functions have been changed to GPL-only.

  • The internal MTD (memory technology device) API has seen significant changes aimed at supporting larger devices (those requiring 64-bit sizes).
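
The net_device_ops and netdev_priv() changes listed above follow a common kernel pattern: callbacks move into a constant operations table that many devices can share, and driver-private data lives in memory allocated just past the core structure. What follows is a minimal user-space sketch of that pattern, not kernel code; every toy_ name and the 32-byte alignment value are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: none of these toy_ names exist in the kernel. */
struct toy_netdev;

/* Callbacks live in their own table rather than in the device itself. */
struct toy_netdev_ops {
    int (*open)(struct toy_netdev *dev);
    int (*stop)(struct toy_netdev *dev);
};

struct toy_netdev {
    const struct toy_netdev_ops *ops;  /* one const table, shared by all devices */
    int up;
};

/* Assumed alignment for the private area; chosen arbitrarily for this sketch. */
#define TOY_ALIGN 32u
#define TOY_ALIGN_UP(x) (((x) + TOY_ALIGN - 1) & ~((size_t)TOY_ALIGN - 1))

/* Allocate the core structure and the driver's private area in one block. */
struct toy_netdev *toy_alloc_netdev(size_t priv_size,
                                    const struct toy_netdev_ops *ops)
{
    struct toy_netdev *dev = calloc(1, TOY_ALIGN_UP(sizeof(*dev)) + priv_size);

    if (dev)
        dev->ops = ops;
    return dev;
}

/* Accessor replacing a direct "priv" pointer field in the structure. */
void *toy_netdev_priv(struct toy_netdev *dev)
{
    return (char *)dev + TOY_ALIGN_UP(sizeof(*dev));
}

/* "Driver" side: private state plus the callbacks that use it. */
struct toy_priv {
    unsigned long opens;
};

static int toy_open(struct toy_netdev *dev)
{
    struct toy_priv *priv = toy_netdev_priv(dev);

    priv->opens++;
    dev->up = 1;
    return 0;
}

static int toy_stop(struct toy_netdev *dev)
{
    dev->up = 0;
    return 0;
}

const struct toy_netdev_ops toy_ops = {
    .open = toy_open,
    .stop = toy_stop,
};

/* Usage: the "core" only ever calls through the ops table. */
void toy_netdev_demo(void)
{
    struct toy_netdev *dev = toy_alloc_netdev(sizeof(struct toy_priv), &toy_ops);
    struct toy_priv *priv;

    assert(dev != NULL);
    priv = toy_netdev_priv(dev);
    assert(dev->ops->open(dev) == 0 && dev->up == 1 && priv->opens == 1);
    assert(dev->ops->stop(dev) == 0 && dev->up == 0);
    free(dev);
}
```

Because the table is const, a single copy can serve every device instance a driver creates, and the accessor hides the layout of the private area from the rest of the code.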

2.6.28 (December 24, 2008)

  • Discard request and request timeout handling have been added to the block layer; a number of other internal API changes have been made as well. See this article for details.

  • Video4Linux2 drivers no longer have their open() function called with the big kernel lock held. The lock_kernel() calls have been pushed down into individual drivers within the mainline tree; external drivers will need to be fixed.

  • A number of tracing-related patches have been merged. These include the tracepoints mechanism, some instrumentation in the core scheduler code, improvements to the ftrace function tracing feature, a new ftrace-based stack tracer, a new ftrace-based boot (initcall) tracer, and the low-level trace buffer code.

  • The sysctl strategy() function prototype has changed: the unused name and nlen parameters have been removed.

  • Asynchronous I/O support can now be configured out of the kernel, saving about 7KB of space on systems where AIO is not needed.

  • As planned, device_create_drvdata() has been renamed to device_create(), with the same parameters.

  • There is now a mechanism to enable and disable output from pr_debug() and dev_dbg() calls on a per-module basis. Control is through a virtual file in debugfs. There is no documentation file associated with this change; instructions on how to use this feature can be found in the patch changelog.

  • The new dev_WARN() function:

        dev_WARN(struct device *dev, char *format, ...);
    

    will output the formatted warning, along with a full stack trace. This will allow the warnings to be collected at kerneloops.org and incorporated into the reports there.

  • The new %pR formatting directive allows printk() and friends to output the contents of resource structures.

  • There is a new function intended to make life easier for PCI driver writers:

        static inline void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
    

    This function will remap the entire PCI I/O memory region, as selected by the bar argument.

  • There is a new core_param() macro:

        core_param(name, var, type, perm);
    

    Its purpose is to define "core" parameters and let them be represented in /sys/module/kernel/parameters.

  • It is now possible to create a workqueue running at realtime priority with:

        struct workqueue_struct *create_rt_workqueue(const char *name);
    

  • The block driver API has changed considerably, with the inode and file parameters being removed from most block device operations. The new API looks like this:

        struct block_device_operations {
            int (*open)(struct block_device *bdev, fmode_t mode);
            int (*release)(struct gendisk *gd, fmode_t mode);
            int (*locked_ioctl)(struct block_device *bdev, fmode_t mode,
                                unsigned cmd, unsigned long arg);
            int (*ioctl)(struct block_device *bdev, fmode_t mode,
                         unsigned cmd, unsigned long arg);
            int (*compat_ioctl)(struct block_device *bdev, fmode_t mode,
                                unsigned cmd, unsigned long arg);
            int (*direct_access)(struct block_device *bdev, sector_t sector,
                                 void **kaddr, unsigned long *pfn);
            int (*media_changed)(struct gendisk *gd);
            int (*revalidate_disk)(struct gendisk *gd);
            int (*getgeo)(struct block_device *bdev, struct hd_geometry *geo);
            struct module *owner;
        };
    

    The new prototypes do away with the file and inode structure pointers which were passed in previous kernels. Note that the ioctl() method is now called without the big kernel lock; code needing BKL protection must explicitly define a locked_ioctl() function instead.

  • The range timer API has been merged; callers can now specify a time period in which they would like the timeout to be delivered. The kernel can then take advantage of the range to coalesce wakeups and keep the processor idle for longer periods.
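
To see why a delivery range helps with coalescing, consider the following toy model. It is a user-space sketch and not the kernel's hrtimer code: each timer carries an earliest ("soft") and latest ("hard") acceptable expiry, the simulated CPU sleeps until the earliest pending hard deadline, and on waking it runs every following timer whose soft time has already passed. The names toy_timer and toy_count_wakeups() are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical timer: may fire at any time in [soft, hard]. */
struct toy_timer {
    long soft;  /* earliest acceptable expiry */
    long hard;  /* latest acceptable expiry */
};

/* qsort() comparator: order timers by hard deadline. */
static int toy_cmp_hard(const void *a, const void *b)
{
    const struct toy_timer *x = a, *y = b;

    return (x->hard > y->hard) - (x->hard < y->hard);
}

/*
 * Count the wakeups needed to service all timers. The array is sorted
 * in place by hard deadline. Each wakeup happens at the earliest pending
 * hard deadline and also services the timers that follow in deadline
 * order, stopping at the first one whose window has not yet opened.
 */
size_t toy_count_wakeups(struct toy_timer *timers, size_t n)
{
    size_t wakeups = 0, i = 0;

    qsort(timers, n, sizeof(*timers), toy_cmp_hard);
    while (i < n) {
        long now = timers[i++].hard;  /* sleep as long as allowed */

        wakeups++;
        while (i < n && timers[i].soft <= now)
            i++;                      /* coalesced into this wakeup */
    }
    return wakeups;
}

/* Usage: the same four deadlines, without and with five units of slack. */
void toy_range_demo(void)
{
    struct toy_timer exact[]  = { {10, 10}, {12, 12}, {14, 14}, {30, 30} };
    struct toy_timer ranged[] = { { 5, 10}, { 7, 12}, { 9, 14}, {25, 30} };

    assert(toy_count_wakeups(exact, 4) == 4);
    assert(toy_count_wakeups(ranged, 4) == 2);
}
```

In this model, four exact timers cost four wakeups, while the same deadlines with a little slack need only two; the real kernel has more to consider, but the benefit comes from the same overlap of ranges.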

2.6.27 (October 9, 2008)

  • The register_security() function has been removed. Security modules which wish to implement stacking must now do so explicitly.

  • The request_queue_t type is gone at last; block drivers should use struct request_queue instead.

  • Quite a bit of big kernel lock removal work has been merged. For char devices, the open() method from struct file_operations is no longer protected by the BKL. Calls to fasync() have also lost BKL protection.

  • Many drivers have been converted to use the firmware loader, making it possible to strip the firmware from the kernel for those who are inclined to do so. See this article for more information on the firmware work.

  • The API work in the i2c layer continues; there is now an autodetection capability which allows new-style drivers to detect devices on their buses automatically.

  • The SCSI layer has gained new support for "device handlers," which are mostly concerned with multipath management. Some of this code has been moved over from the device mapper.

  • The new suspend and hibernate infrastructure has been merged, providing a wider set of callbacks for power management events. The PCI and platform bus interfaces have been enhanced with support for this new infrastructure.

  • The TTY layer continues to evolve; significant changes include the introduction of a new tty_port structure meant to hold information common to all TTY ports and a rework of the line discipline code.

  • The mac80211 code has a new module which can simulate any number of IEEE 802.11 radios; it is suitable for testing mac80211 functionality and associated user-space tools.

  • There is a new "rfkill" mechanism for unified handling of "radio off" switches on wireless devices.

  • A number of Video4Linux2 format-related callbacks have been renamed to make them match the names used with the associated buffer types. In addition, the vidioc_enum_fmt_vbi_cap() callback has been deprecated and marked for removal in 2.6.28.

  • The videobuf layer now has support for controllers which cannot do scatter/gather I/O.

  • The USB "gadget" framework has been massively reworked to provide better support for composite devices.

  • The prototype for device_create() has changed:

        struct device *device_create(struct class *class,
                                     struct device *parent,
                                     dev_t devt,
                                     void *drvdata,
                                     const char *fmt, ...);
    

    Those who see a resemblance to device_create_drvdata() are right; all in-tree users were converted over to that interface, the old device_create() was removed, and device_create_drvdata() was renamed. For now, a macro makes calls to device_create_drvdata() do the right thing, but that macro will probably go away before the 2.6.27 final release.

  • User-space UIO drivers can now write a signed value to the /dev/uioX device to enable and disable interrupts.

  • Debugfs (finally) has a function for removing an entire directory tree:

        void debugfs_remove_recursive(struct dentry *dentry);
    

    As a result, code creating hierarchies in debugfs no longer needs to remember the dentry of every file it creates.

  • The tracehook mechanism for defining static trace points (described in this article) has been merged, along with a number of trace points in the core kernel.

  • A new, lockless form of get_user_pages() has been added:

        int get_user_pages_fast(unsigned long start, int nr_pages, int write,
                                struct page **pages);
    

    Details of this interface can be found in this article, with the one note that early versions were called fast_gup() instead. (See also the related lockless page cache work, which was also merged.)

  • The long-debated mmu-notifiers patch has been merged. The notifiers allow external memory management units (as may be seen in some graphics cards or in virtualized guests) to be told about decisions made by the core memory management code.

  • There is a new framework for debugging boot-time memory initialization; there's also "a few basic defensive measures" intended to prevent difficult-to-debug boot problems.

  • The new function:

        int object_is_on_stack(void *obj);
    

    returns a true value if the pointed-to object is on the current kernel stack.

  • There is a new macro for issuing warnings:

        WARN(condition, format, ...);
    

    It's much like WARN_ON() in that it will produce a full oops listing; the difference is the added printk()-style format string and arguments.

  • A new helper function:

        int flush_work(struct work_struct *work);
    

    waits for the specific workqueue job work to finish executing.

  • dma_mapping_error() and pci_dma_mapping_error() have new prototypes:

        int dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
        int pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr);
    

    In each case, they have gained a new argument specifying which device the mapping is being done for.

  • There are a couple of new radix tree functions:

        unsigned int radix_tree_gang_lookup_slot(struct radix_tree_root *root,
                                                 void ***results,
                                                 unsigned long first_index,
                                                 unsigned int max_items);
        unsigned int radix_tree_gang_lookup_tag_slot(struct radix_tree_root *root,
                                                     void ***results,
                                                     unsigned long first_index,
                                                     unsigned int max_items,
                                                     unsigned int tag);
    

    They are useful for looking up multiple items in a single call.

  • Slab cache constructors no longer have a pointer to the cache itself as an argument; they now take a single void * pointer to the object itself.

  • The long list of Video4Linux2 ioctl() callbacks has been moved into its own structure (struct v4l2_ioctl_ops) which is pointed to by the ioctl_ops member of struct video_device.
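
The WARN() macro described above is typically used inside a conditional, since it evaluates to the truth value of its condition. The following user-space sketch models only that calling convention: toy_warn() is a made-up function that prints the formatted message to stderr, while the real macro also emits a stack trace. The toy_set_speed() caller is likewise hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Number of warnings issued so far; exists only to make the sketch testable. */
unsigned int toy_warn_count;

/*
 * Hypothetical model of WARN(condition, format, ...): if the condition
 * holds, print the printk()-style message; either way, return the
 * condition as 0 or 1 so that the caller can branch on it.
 */
int toy_warn(int condition, const char *fmt, ...)
{
    if (condition) {
        va_list ap;

        fputs("WARNING: ", stderr);
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        toy_warn_count++;
    }
    return !!condition;
}

/* Typical call-site shape: complain and bail out in a single statement. */
int toy_set_speed(int mbps)
{
    if (toy_warn(mbps <= 0, "bad link speed %d\n", mbps))
        return -1;
    return 0;
}

/* Usage: a sane value passes silently, a bogus one warns and fails. */
void toy_warn_demo(void)
{
    assert(toy_set_speed(100) == 0);
    assert(toy_warn_count == 0);
    assert(toy_set_speed(-5) == -1);
    assert(toy_warn_count == 1);
}
```

The advantage over a bare WARN_ON() is that the message can carry the offending value, which makes the resulting report far easier to interpret.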

2.6.26 (July 13, 2008)

  • At long last, support for the KGDB interactive debugger has been added to the x86 architecture. There is a DocBook document in the Documentation directory which provides an overview on how to use this new facility.

  • Page attribute table (PAT) support is also (again, at long last) available for the x86 architecture. PATs allow for fine-grained control of memory caching behavior with more flexibility than the older MTRR feature. See Documentation/x86/pat.txt for more information.

  • ioremap() on the x86 architecture will now always return an uncached mapping. Previously, it had taken a more relaxed approach, leaving the caching as the BIOS had set it up. The practical result was to almost always create uncached mappings, but with occasional exceptions. Drivers which depend on a cached mapping will now break; they will need to use ioremap_cache() instead.

  • The nopage() virtual memory area operation has been removed; all in-tree code is now using fault() instead.

  • Two new functions (inode_getsecid() and ipc_getsecid()), added to support security modules and the audit code, provide general access to security IDs associated with inodes and IPC objects. A number of superblock-related LSM callbacks now take a struct path pointer instead of struct nameidata. There is also a new set of hooks providing generic audit support in the security module framework.

  • The now-unused ieee80211 software MAC layer has been removed; all of the drivers which needed it have been converted to mac80211. Also removed are the sk98lin network driver (in favor of skge) and bcm43xx (replaced by b43 and b43legacy).

  • The generic semaphores patch has been merged. The semaphore code also has new down_killable() and down_timeout() functions.

  • The ata_port_operations structure used by libata drivers now supports a simple sort of operation inheritance, making it easier to write drivers which are "almost like" existing code, but with small differences.

  • A new function (ns_to_ktime()) converts a time value in nanoseconds to ktime_t.

  • The final users of struct class_device have been converted to use struct device instead. The class_device type has been removed.

  • The seq_file code now accepts a return value of SEQ_SKIP from the show() callback; that value causes any accumulated output from that call to be discarded.

  • The Video4Linux2 API now defines a set of controls for camera devices; they allow user space to work with parameters like exposure type, tilt and pan, focus, and more.

  • On the x86 architecture, there is a new configuration parameter which allows gcc to make its own decisions about the inlining of functions, even when functions are declared inline. In some cases, this option can reduce the size of the kernel's text segment by over 2%.

  • The legacy IDE layer has gone through a lot of internal changes which will break any remaining IDE drivers.

  • The SLUB allocator supports a new sysfs file (/sys/kernel/slab/name/order) which allows system administrators to change the size of page allocations used by the named slab.

  • A condition which triggers a warning from WARN_ON will now also taint the kernel.

  • The get_info() interface for /proc files has been removed. There is also a new function for creating /proc files:

        struct proc_dir_entry *proc_create_data(const char *name, mode_t mode,
                                                struct proc_dir_entry *parent,
                                                const struct file_operations *proc_fops,
                                                void *data);
    

    This version adds the data pointer, ensuring that it will be set in the resulting proc_dir_entry structure before user space can try to access it.

  • The object debugging infrastructure has been merged.

  • The klist type now has the usual-form macros for declaration and initialization: DEFINE_KLIST() and KLIST_INIT(). Two new functions (klist_add_after() and klist_add_before()) can be used to add entries to a klist in a specific position.

  • kmap_atomic_to_page() is no longer exported to modules.

  • There are some new generic functions for performing 64-bit integer division in the kernel:

        u64 div_u64(u64 dividend, u32 divisor);
        u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder);
        s64 div_s64(s64 dividend, s32 divisor);
        s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
    
    Unlike do_div(), these functions are explicit about whether signed or unsigned math is being done. The x86-specific div_long_long_rem() has been removed in favor of these new functions.

  • There is a new string function:

         bool sysfs_streq(const char *s1, const char *s2);
    

    It compares the two strings while ignoring an optional trailing newline.

  • The prototype for i2c probe() methods has changed:

         int (*probe)(struct i2c_client *client, 
                      const struct i2c_device_id *id);
    

    The new id argument supports i2c device name aliasing.

  • There is a new configuration option (MODULE_FORCE_LOAD) which controls whether the loading of modules can be forced if the kernel thinks something is not right; it defaults to "no."
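
Two of the small helpers listed above, sysfs_streq() and the div_*_rem() family, are easy to model in user space. The sketch below is an illustration of their described behavior rather than the kernel's source: the toy_ names are invented, and the fixed-width types from <stdint.h> stand in for the kernel's u64, s64, u32, and s32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of sysfs_streq(): the strings match if they are identical, or
 * if they differ only by a single trailing newline on one of them.
 */
bool toy_sysfs_streq(const char *s1, const char *s2)
{
    /* Skip the common prefix. */
    while (*s1 && *s1 == *s2) {
        s1++;
        s2++;
    }

    if (*s1 == *s2)                          /* both ended together */
        return true;
    if (!*s1 && *s2 == '\n' && !s2[1])       /* s2 has one extra "\n" */
        return true;
    if (*s1 == '\n' && !s1[1] && !*s2)       /* s1 has one extra "\n" */
        return true;
    return false;
}

/* Model of div_u64_rem(): unsigned 64-by-32 division with remainder. */
uint64_t toy_div_u64_rem(uint64_t dividend, uint32_t divisor,
                         uint32_t *remainder)
{
    *remainder = (uint32_t)(dividend % divisor);
    return dividend / divisor;
}

/* Model of div_s64_rem(): the signed variant, truncating toward zero. */
int64_t toy_div_s64_rem(int64_t dividend, int32_t divisor,
                        int32_t *remainder)
{
    *remainder = (int32_t)(dividend % divisor);
    return dividend / divisor;
}

/* Usage: "echo on > attr" delivers "on\n" to a sysfs store() method. */
void toy_helpers_demo(void)
{
    uint32_t urem;
    int32_t srem;

    assert(toy_sysfs_streq("on\n", "on"));
    assert(!toy_sysfs_streq("on\n\n", "on"));

    assert(toy_div_u64_rem(10000000007ULL, 1000000000U, &urem) == 10);
    assert(urem == 7);
    assert(toy_div_s64_rem(-7, 2, &srem) == -3);
    assert(srem == -1);
}
```

The signed and unsigned variants are separate functions, so the caller states which kind of arithmetic is wanted instead of relying on the types that happen to be passed in.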


