> On Jan 24, 2017, at 12:18 PM, Oleg Drokin <gr...@linuxhacker.ru> wrote:
> 
> 
> On Jan 23, 2017, at 2:27 AM, Dan Williams wrote:
> 
>> [ adding Oleg ]
>> 
>> On Sun, Jan 22, 2017 at 10:00 PM, Song Liu <songliubrav...@fb.com> wrote:
>>> Hi Dan,
>>> 
>>> I think the block-level event log is more like a log-only system. When an
>>> event happens, it is not necessary to take immediate action. (I guess this
>>> is different from the bad block list?)
>>> 
>>> I would like the event log to track more information. Some of these
>>> individual events may not be very interesting, for example, soft errors or
>>> latency outliers. However, when we gather event logs for a fleet of
>>> devices, these "soft events" may become valuable for health monitoring.
>> 
>> I'd be interested in this. It sounds like you're trying to fill a gap
>> between tracing and console log messages which I believe others have
>> encountered as well.
> 
> We have a somewhat similar problem in Lustre, and I guess it's not just
> Lustre. Currently there are all sorts of conditional debug statements all
> over the place that go to the console, and when you enable them for
> anything verbose, you quickly overflow your dmesg buffer no matter its
> size. That might be mostly OK for local "block level" stuff, but once you
> become distributed it starts to be a mess, and once you get to be super
> large it worsens even more, since you need to somehow coordinate data from
> multiple nodes, ensure none of it is lost, and still you don't end up using
> most of it, since only a few nodes turn out to be useful. (I don't know how
> the NFS people manage to debug complicated issues using just this; it
> cannot be easy.)
> 
> What would help is some sort of buffer of a (potentially very) large size
> that stores the data until it's needed, or that is eagerly polled by some
> daemon for storage (useful when you expect a lot of data that definitely
> won't fit in RAM).
> 
> Tracepoints have the buffer and the daemon, but creating new messages is
> very cumbersome, so converting every debug message into one does not look
> feasible. Also, it's convenient to have "event masks" selecting what one
> wants logged, which I don't think you can do with tracepoints.
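> 
> Just to illustrate the boilerplate, here is roughly what even a trivial
> tracepoint definition looks like (the event name and fields are made up,
> and this is simplified; the real thing also needs the TRACE_SYSTEM /
> define_trace.h header scaffolding around it):
> 
>     #include <linux/tracepoint.h>
> 
>     /* hypothetical tracepoint for one generic debug message */
>     TRACE_EVENT(myfs_debug_msg,
>             TP_PROTO(unsigned int mask, const char *msg),
>             TP_ARGS(mask, msg),
>             TP_STRUCT__entry(
>                     __field(unsigned int, mask)  /* subsystem mask bits */
>                     __string(msg, msg)           /* the message text */
>             ),
>             TP_fast_assign(
>                     __entry->mask = mask;
>                     __assign_str(msg, msg);
>             ),
>             TP_printk("mask=0x%x %s", __entry->mask, __get_str(msg))
>     );
> 
> Multiply that by the thousands of existing debug statements and the
> conversion cost becomes clear.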
> 
> I know you were talking about reporting events to the block layer, but
> other than plain errors, what would the block layer do with them? Is it
> just a convenient way to map messages to a particular device? You don't
> plan to store them on some block device as part of the block layer, right?
> 
> Implementing such a buffer, all sorts of additional generic data could be
> collected automatically for every event as part of the buffer format:
> which CPU emitted it, the time, stack usage information, the current pid,
> a backtrace (optional, tracepoint-alike), the actual source code location
> of the message, …
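> 
> Roughly, each record in such a buffer could carry a common header along
> these lines (a sketch only; all names and fields are made up):
> 
>     #include <linux/types.h>
> 
>     /* hypothetical per-event header, filled in automatically */
>     struct event_rec_hdr {
>             u64 timestamp_ns;     /* when the event was emitted */
>             u32 cpu;              /* CPU that emitted it */
>             u32 pid;              /* current->pid at emission time */
>             u32 mask;             /* subsystem "event mask" bits */
>             u32 stack_left;       /* bytes of stack remaining */
>             const char *src_file; /* source location of the message */
>             u32 src_line;
>             u16 msg_len;          /* length of the payload that follows */
>     };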
> 
> Having something like that be a standard part of {dev,pr}_{dbg,warn,...}
> and friends would be super awesome too, I imagine (adding Greg to CC for
> that).
> 


Hi Oleg, 

Thanks for sharing these insights. 

We built an event logger that parses dmesg to get events. For the reasons
you described above, it doesn't work well, and one of the biggest issues is
poor "event mask" support. I am hoping to get better event masking in a
newer implementation, for example with kernel tracing filters, or by
implementing customized logic in BPF.
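
As a sketch of the tracing-filter route, a small userspace helper could
install a predicate on an existing tracepoint through tracefs. This assumes
tracefs is mounted at /sys/kernel/tracing, and the field name in the filter
has to match the event's format file (it differs across kernel versions,
e.g. "errors" vs "error" for block_rq_complete):

    #include <stdio.h>
    #include <stdlib.h>

    /* write a string to a tracefs control file */
    static int write_str(const char *path, const char *val)
    {
            FILE *f = fopen(path, "w");

            if (!f) {
                    perror(path);
                    return -1;
            }
            fputs(val, f);
            return fclose(f);
    }

    int main(void)
    {
            const char *ev =
                "/sys/kernel/tracing/events/block/block_rq_complete";
            char path[512];

            /* keep only completions that carry an error code; check the
             * event's format file for the exact field name */
            snprintf(path, sizeof(path), "%s/filter", ev);
            if (write_str(path, "error != 0\n"))
                    return EXIT_FAILURE;

            snprintf(path, sizeof(path), "%s/enable", ev);
            if (write_str(path, "1\n"))
                    return EXIT_FAILURE;

            return EXIT_SUCCESS;
    }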

With a relatively mature infrastructure, we don't have much trouble storing
logs from the event logger. Specifically, we use a daemon that collects
events and sends them to distributed storage (HDFS+HIVE). It might be
overkill for smaller deployments.
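
At its core, such a daemon is just a blocking reader of the trace buffer
that forwards each event line. A minimal sketch, with the actual forwarding
to HDFS elided (it just writes to stdout for a downstream shipper):

    #include <stdio.h>

    int main(void)
    {
            /* trace_pipe blocks until the kernel has events to hand out,
             * and reading it consumes the events from the ring buffer */
            FILE *pipe = fopen("/sys/kernel/tracing/trace_pipe", "r");
            char line[4096];

            if (!pipe) {
                    perror("trace_pipe");
                    return 1;
            }
            while (fgets(line, sizeof(line), pipe))
                    fputs(line, stdout);  /* forward to storage here */

            fclose(pipe);
            return 0;
    }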

We do use information from similar logs (not exactly the one above) to make
decisions about device handling. For example, if a drive throws too many
medium errors in a short period of time, we kick it out of production. I
don't think it is necessary to include this in the block layer.
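
The policy itself is simple enough to live entirely in the log consumer; a
sketch of that kind of check (the threshold and window are made-up numbers,
not our production values):

    #include <stdbool.h>
    #include <time.h>

    #define MAX_MEDIUM_ERRS 10    /* hypothetical threshold */
    #define WINDOW_SECS     3600  /* hypothetical window */

    struct drive_stats {
            time_t window_start;      /* start of current counting window */
            unsigned int medium_errs; /* medium errors seen in the window */
    };

    /* called once per medium-error event parsed from the log */
    static bool should_evict(struct drive_stats *d, time_t now)
    {
            if (now - d->window_start > WINDOW_SECS) {
                    d->window_start = now;  /* roll over to a new window */
                    d->medium_errs = 0;
            }
            return ++d->medium_errs > MAX_MEDIUM_ERRS;
    }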

Overall, I am hoping the kernel can generate accurate events with flexible
filter/mask support. There are different ways to store and consume this
data, and I expect most of them will be implemented in user space. Let's
discuss potential use cases and requirements; these discussions should help
us build the kernel part of the event log.

Thanks,
Song