[LSF/MM TOPIC] block level event logging for storage media management

Song Liu Wed, 18 Jan 2017 15:36:07 -0800

Media health monitoring is very important for large-scale distributed
storage systems. Traditionally, enterprise storage controllers maintain
event logs for attached storage devices. However, these controller-managed
logs do not scale well for large-scale distributed systems.


While designing a more flexible and scalable event logging system, we think
it is better to build the log in the block layer. Block level event logging
covers all major storage media (SCSI, SATA, NVMe), and thus minimizes
redundant work for different protocols.

At this LSF/MM, we would like to discuss the following topics with the
community:
    1. Mechanism for drivers to report events (or errors) to the block layer.
       Basically, we will need a traceable function for the drivers to
       report errors (most likely right before calling end_request or
       bio_endio).
  
    2. Which mechanism (ftrace, BPF, etc.) is preferred for the event
       logging?

    3. How should we categorize different events?
       Currently, there is existing code that translates ATA errors
       (ata_to_sense_error) and NVMe errors (nvme_trans_status_code) to
       SCSI sense codes. So we can leverage the SCSI Key Code Qualifier
       for event categorization.

    4. Detailed discussions on data structure for event logging. 
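
As a starting point for topics 1, 3 and 4, here is a minimal user-space
sketch of what an event record, a sense-key based categorization, and a
reporting function could look like. This is not the prototype mentioned
below and not existing kernel API: struct blk_event, struct blk_event_log,
enum blk_event_class, blk_event_classify() and blk_report_event() are all
hypothetical names, and the ring buffer only stands in for whatever
mechanism (ftrace, BPF, etc.) is eventually chosen. Only the sense key
values are taken from the SCSI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SCSI sense keys (values as defined by SPC); only the ones used below. */
enum {
	SK_NO_SENSE        = 0x0,
	SK_RECOVERED_ERROR = 0x1,
	SK_MEDIUM_ERROR    = 0x3,
	SK_HARDWARE_ERROR  = 0x4,
	SK_ABORTED_COMMAND = 0xb,
};

/* Hypothetical coarse categories for media management. */
enum blk_event_class {
	BLK_EVENT_INFO,
	BLK_EVENT_RECOVERED,
	BLK_EVENT_MEDIA,
	BLK_EVENT_HARDWARE,
	BLK_EVENT_TRANSPORT,
	BLK_EVENT_OTHER,
};

/* Hypothetical event record: where and when, plus the KCQ triple. */
struct blk_event {
	uint64_t timestamp_ns;
	uint64_t sector;
	uint32_t dev;
	uint8_t  sense_key;
	uint8_t  asc;
	uint8_t  ascq;
};

#define BLK_EVENT_LOG_SIZE 8	/* kept small for illustration */

/* Fixed-size ring; once full, the oldest events are overwritten. */
struct blk_event_log {
	struct blk_event ring[BLK_EVENT_LOG_SIZE];
	size_t head;		/* total number of events ever reported */
};

/* Map a sense key to a coarse category. */
enum blk_event_class blk_event_classify(uint8_t sense_key)
{
	switch (sense_key & 0x0f) {
	case SK_NO_SENSE:        return BLK_EVENT_INFO;
	case SK_RECOVERED_ERROR: return BLK_EVENT_RECOVERED;
	case SK_MEDIUM_ERROR:    return BLK_EVENT_MEDIA;
	case SK_HARDWARE_ERROR:  return BLK_EVENT_HARDWARE;
	case SK_ABORTED_COMMAND: return BLK_EVENT_TRANSPORT;
	default:                 return BLK_EVENT_OTHER;
	}
}

/*
 * What a driver would call right before completing the request.
 * In the kernel this would be the single traceable function.
 */
void blk_report_event(struct blk_event_log *log, const struct blk_event *ev)
{
	assert(log != NULL && ev != NULL);
	log->ring[log->head % BLK_EVENT_LOG_SIZE] = *ev;
	log->head++;
}

/* Number of events currently held in the ring. */
size_t blk_event_log_count(const struct blk_event_log *log)
{
	return log->head < BLK_EVENT_LOG_SIZE ? log->head : BLK_EVENT_LOG_SIZE;
}
```

With this layout, an unrecovered read error (sense key 0x3, ASC 0x11,
ASCQ 0x00) reported by any driver would be classified as BLK_EVENT_MEDIA
regardless of whether it came from SCSI, SATA or NVMe. Whether the
categories should be this coarse is exactly the question in topic 3.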

We will be able to show a prototype implementation during LSF/MM. 

Thanks,
Song