On Fri, Feb 09, 2018 at 06:02:01PM +0100, Goffredo Baroncelli wrote: > On 02/08/2018 08:07 PM, Liu Bo wrote: > > On Thu, Feb 08, 2018 at 07:52:05PM +0100, Goffredo Baroncelli wrote: > >> On 02/06/2018 12:15 AM, Liu Bo wrote: > >> [...] > >>> One way to mitigate the data loss pain is to expose 'bad chunks', > >>> i.e. degraded chunks, to users, so that they can use 'btrfs balance' > >>> to relocate the whole chunk and get the full raid6 protection again > >>> (if the relocation works). > >> > >> [...] > >> [...] > >> > >>> + > >>> + /* read lock please */ > >>> + do { > >>> + seq = read_seqbegin(&fs_info->bc_lock); > >>> + list_for_each_entry(bc, &fs_info->bad_chunks, list) { > >>> + len += snprintf(buf + len, PAGE_SIZE - len, "%llu\n", > >>> + bc->chunk_offset); > >>> + /* chunk offset is u64 */ > >>> + if (len >= PAGE_SIZE) > >>> + break; > >>> + } > >>> + } while (read_seqretry(&fs_info->bc_lock, seq)); > >> > >> Using this interface, how many chunks can you list ? If I read the code > >> correctly, only up to fill a kernel page. > >> > >> If my math are correctly (PAGE_SIZE=4k, a u64 could require up to 19 > >> bytes) it is possible to list only few hundred of chunks (~200). Not more; > >> and the last one could be even listed incomplete. > >> > > > > That's true. > > > >> IIRC a chunk size is max 1GB; If you lost a 500GB of disks, the chunks to > >> list could be more than 200. > >> > >> My first suggestion is to limit the number of chunks to show to 200 (a > >> page should be big enough to contains all these chunks offset). If the > >> chunks number are greater, ends the list with a marker (something like > >> '[...]\n'). > >> This would solve the ambiguity about the fact that the list chunks are > >> complete or not. Anyway you cannot list all the chunks. > >> > > > > Good point, and I need to add one more field to each record to specify > > it's metadata or data. > > > > So what I have in my mind is that this kind of error is rare and > > reflects bad sectors on disk, but if there are really that many > > errors, I think we need to replace the disk without hesitation. > > What happens if you loose an entire disk ? If so, you fill "bad_sector" > > > > > >> However, my second suggestions is to ... change completely the interface. > >> What about adding a directory in sysfs, where each entry is a chunk ? > >> > >> Something like: > >> > >> /sys/fs/btrfs/<FS-UUID>/chunks/<chunks-offset>/type # > >> data/metadata/sys > >> /sys/fs/btrfs/<FS-UUID>/chunks/<chunks-offset>/profile # > >> dup/linear.... > >> /sys/fs/btrfs/<FS-UUID>/chunks/<chunks-offset>/size # size > >> /sys/fs/btrfs/<FS-UUID>/chunks/<chunks-offset>/devs # > >> chunks devs > >> > >> And so on. > >> > >> Checking "[...]<chunks-offset>/devs", it would be easy understand if the > >> chunk is in "degraded" mode or not. > > > > I'm afraid we cannot do that as it'll occupy too much memory for large > > filesystems given a typical chunk size is 1GB. > > > >> > >> However I have to admit that I don't know how feasible is iterate over a > >> sysfs directory which is a map of a kernel objects list. > >> > >> I think that if these interface would be good enough, we could get rid of > >> a lot of ioctl(TREE_SEARCH) from btrfs-progs. > >> > > > > TREE_SEARCH is faster than iterating sysfs (if we could), isn't it? > > Likely yes. However TREE_SEARCH is bad because it is hard linked to the > filesystem structure. I.e. if for some reason BTRFS change the on disk > layout, what happens for the old application which expect the previous disk > format ? And for the same reason, it is root-only (UID==0) > > Let me to give another idea: what about exporting these information via an > ioctl ? It could be extended to query all information about (all) the > chunks... >
It's all about choice, by using sysfs one can check it at the least cost. But as mentioned in the thread, I'll go and try a new ioctl way. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html