http://lwn.net/Articles/303270/Block layer: solid-state storage, timeouts, affinity, and moreSolid-state storage devicesThere are some enhancements aimed at improving the kernel's support of solid state storage devices. One of those, the discard API, has been covered here before. This API allows high-level block subsystem users (filesystems) to indicate that a particular range of blocks no longer contains useful data. That allows the low-level device to incorporate those blocks into its garbage collection scheme and to stop worrying about their contents when performing wear leveling. Since the initial LWN article, though, the API has changed a little. The way to issue a discard request is now: int blkdev_issue_discard(struct block_device *bdev, sector_t sector, unsigned nr_sects); The end_io() parameter seen in previous versions of the API is no longer present. There is no way for callers to know when the request completes, or, indeed, if the request completes at all. Since the caller is indicating a lack of interest in the given sectors, it really should not matter what the device does thereafter. There is a filesystem-level function for creating discard requests: static inline int sb_issue_discard(struct super_block *sb, sector_t block, unsigned nr_blocks); Here, the interface is expecting block numbers using the filesystem block size, rather than 512-byte sectors. User-space programs can issue discard requests with the new BLKDISCARD ioctl() call. Needless to say, such operations should be done with care; about the only logical user of this ioctl() would be mkfs programs. Block drivers which support discard requests will provide a suitable function to the block layer: typedef int (prepare_discard_fn) (struct request_queue *queue,
struct request *rq);
void blk_queue_set_discard(struct request_queue *q,
prepare_discard_fn *dfn);
In the absence of a "prepare discard" function, discard requests for the device will fail. The block layer has also added a flag by which drivers can indicate that a device is not rotating storage, and, thus, does not suffer from seek delays. By setting QUEUE_FLAG_NONROT (with queue_flag_set() or queue_flag_set_unlocked()), a driver tells the block layer that it is working with a solid state device. I/O schedulers can use that information to avoid plugging the queue - a useful technique for combining requests to rotating storage devices, but a useless operation when there is no seek penalty to avoid. Request affinityOn large, multiprocessor systems, there can be a performance benefit to ensuring that all processing of a block I/O request happens on the same CPU. In particular, data associated with a given request is most likely to be found in the cache of the CPU which originated that request, so it makes sense to perform the request postprocessing on that same CPU. With 2.6.28, sysfs entries for block devices will include an rq_affinity variable. If it is set to a non-zero value, CPU affinity will be turned on for that device. According to the patch changelog, turning this feature on can reduce system time by 20-40% on some benchmarks. Timeout handlingRobust device drivers typically have to be written to handle cases where devices fail to complete operations they have been instructed to do. In a few cases, higher-level code helps with this task; the networking layer, for example, can track outgoing packets and let a driver know when a transmit operation has taken too long. In most other drivers, though, it's up to the driver itself to notice when an operation seems to be taking too long. Like the network subsystem, the block layer manages queues of requested operations. As of 2.6.28 the block layer will, again like networking, have a mechanism for notifying drivers about request timeouts; that, in turn, will allow a bunch of timeout-related code to be removed from the lower layers. Timeout handling in the block layer can be more complex, though, and the associated API reflects that complexity. A block driver must register a function to handle timed-out requests: typedef enum blk_eh_timer_return (rq_timed_out_fn)(struct request *);
void blk_queue_rq_timed_out(struct request_queue *q,
rq_timed_out_fn *fn);
The amount of time a request should be outstanding before timing out is set up with: void blk_queue_rq_timeout(struct request_queue *q,
unsigned int timeout);
The tracking of per-request timeouts is done within the block layer; the timer for any individual request is started when that request is dispatched to the driver by the I/O scheduler. Should a request fail to complete before the timeout period passes, the driver's timeout function will be called with a pointer to the languishing request. The driver then can do one of three things:
If things look bad, the driver may decide to abort any outstanding requests, reset the device, and start over. There are a couple of new functions which can help with this task: void blk_abort_request(struct request *req);
void blk_abort_queue(struct request_queue *q);
These functions will abort the given request, or all requests on the queue, as appropriate. Part of that process involves calling the driver's timeout handler for each aborted request. Other changes in briefSome other block-layer changes include:
It is, all told, one of the busier development cycles for the block layer in recent times. |
- [linuxkernelnewbies] Block layer: solid-state storage, timeouts... Peter Teoh
- [linuxkernelnewbies] Block layer: solid-state storage, tim... Peter Teoh
