On Wed, Jan 18, 2017 at 3:34 PM, Song Liu <[email protected]> wrote:
>
> Media health monitoring is very important for large-scale distributed storage
> systems. Traditionally, enterprise storage controllers maintain event logs for
> attached storage devices. However, these controller-managed logs do not scale
> well for large-scale distributed systems.
>
> While designing a more flexible and scalable event logging system, we think it
> is better to build the log in the block layer. Block-level event logging covers
> all major storage media (SCSI, SATA, NVMe), and thus minimizes redundant work
> for different protocols.
>
> In this LSF/MM, we would like to discuss the following topics with the 
> community:
>     1. Mechanism for drivers to report events (or errors) to the block layer.
>        Basically, we will need a traceable function for the drivers to report
>        errors (most likely right before calling end_request or bio_endio).
>
>     2. Which mechanism (ftrace, BPF, etc.) is preferred for the event logging?
>
>     3. How should we categorize different events?
>        Currently, there is existing code that translates ATA errors
>        (ata_to_sense_error) and NVMe errors (nvme_trans_status_code) to SCSI
>        sense codes, so we can leverage the SCSI Key Code Qualifier for event
>        categorization.
>
>     4. Detailed discussion of the data structure for event logging.
>
> We will be able to show a prototype implementation during LSF/MM.

Hi Song,

How is this distinct from tracking a badblocks list?

I'm interested in this topic since we have both media error reporting
/ scrubbing for nvdimms as well as "SMART" media health retrieval
commands.