Hi Dan, 

I think the block level event log is more of a log-only system. When an
event happens, it is not necessary to take immediate action. (I guess this is
different from a bad block list?)

I would hope the event log can track more information. Some of these individual
events may not be very interesting on their own, for example soft errors or
latency outliers. However, when we gather event logs from a fleet of devices,
these "soft events" may become valuable for health monitoring.
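
As a rough sketch of the fleet-level idea (this is not the prototype; the
record fields, event category names, and threshold are all hypothetical),
a collector could count soft events per device and flag the outliers:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical event categories; the real categorization is an open topic.
SOFT_KINDS = {"soft_error", "latency_outlier"}


@dataclass(frozen=True)
class BlockEvent:
    """One logged block-level event (all fields are illustrative)."""
    device: str  # assumed fleet-wide device id, e.g. "host1:sda"
    kind: str    # assumed category name, e.g. "soft_error"
    sector: int  # starting sector the event refers to


def soft_event_counts(events):
    """Count soft events per device across the whole fleet."""
    return Counter(e.device for e in events if e.kind in SOFT_KINDS)


def flag_devices(events, threshold):
    """Return devices whose soft-event count reaches the threshold."""
    counts = soft_event_counts(events)
    return sorted(dev for dev, n in counts.items() if n >= threshold)


events = [
    BlockEvent("host1:sda", "soft_error", 1024),
    BlockEvent("host1:sda", "latency_outlier", 2048),
    BlockEvent("host1:sda", "soft_error", 4096),
    BlockEvent("host2:sdb", "latency_outlier", 512),
    BlockEvent("host2:sdb", "media_error", 8192),
]
flagged = flag_devices(events, threshold=3)
```

No single event here calls for immediate action, but the per-device counts
are what a health monitor would look at.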

Thanks,
Song


> On Jan 20, 2017, at 9:46 PM, Dan Williams <[email protected]> wrote:
> 
> On Wed, Jan 18, 2017 at 3:34 PM, Song Liu <[email protected]> wrote:
>> 
>> Media health monitoring is very important for large scale distributed
>> storage systems. Traditionally, enterprise storage controllers maintain
>> event logs for attached storage devices. However, these controller managed
>> logs do not scale well for large scale distributed systems.
>> 
>> While designing a more flexible and scalable event logging system, we think
>> it is better to build the log in the block layer. Block level event logging
>> covers all major storage media (SCSI, SATA, NVMe), and thus minimizes
>> redundant work for different protocols.
>> 
>> In this LSF/MM, we would like to discuss the following topics with the
>> community:
>>    1. Mechanism for drivers to report events (or errors) to the block layer.
>>       Basically, we will need a traceable function for the drivers to report
>>       errors (most likely right before calling end_request or bio_endio).
>>
>>    2. Which mechanism (ftrace, BPF, etc.) is preferred for the event
>>       logging?
>>
>>    3. How should we categorize different events?
>>       Currently, there is existing code that translates ATA errors
>>       (ata_to_sense_error) and NVMe errors (nvme_trans_status_code) to SCSI
>>       sense codes. So we can leverage the SCSI Key Code Qualifier for event
>>       categorization.
>>
>>    4. Detailed discussions on the data structure for event logging.
>>
>> We will be able to show a prototype implementation during LSF/MM.
> 
> Hi Song,
> 
> How is this distinct from tracking a badblocks list?
> 
> I'm interested in this topic since we have both media error reporting
> / scrubbing for nvdimms as well as "SMART" media health retrieval
> commands.

