On 13/02/17 13:32, Maxime de Roucy wrote:
> On Monday, 6 February 2017 at 20:19 -0800, Pádraig Brady wrote:
>> On 03/02/17 04:58, [email protected] wrote:
>>> I sometimes face machines with a big log file that takes 90% of the
>>> partition space.
>>> If those logs are important I can't just remove them to free space,
>>> and have to archive them (usually with gzip).
>>> But the log file plus its archive doesn't fit in the partition, so
>>> I can't just `gzip my.log`.
>>> In situations like this I usually do:
>>>
>>> $ gzip -c my.log | dd of=my.log conv=notrunc
>>> …
>>> X bytes (…) copied, …
>>> $ truncate -s X my.log
>>>
>>> But when my.log is open in another process this is not recommended,
>>> as I would end up with my.log containing a gzip stream with new
>>> (uncompressed) logs appended to it.
>>>
>>> I ended up developing https://github.com/tchernomax/dump-deallocate
>>> a small utility that outputs and deallocates (fallocate punch-hole)
>>> a file at the same time.
>>>
>>> I think it would be interesting to include this feature in dd, so
>>> it would be possible to do:
>>>
>>> $ dd if=my.log conv=punchhole | gzip > my.log.gzip
>>
>> That's not a robust operation: if gzip fails for any reason,
>> like disk full etc., some data will be lost.
>
> Indeed. I didn't think it was a problem, as dd is a tool to use with
> care.
> I will add a warning to the man page.
>
>> So while punchhole functionality might be useful,
>> I'm not so sure about coupling it just with read()?
>> BTW there is already a punch_hole() function in copy.c
>> that should be reused if we were to add this.
>
> I will use this function.
>
>> The reason we haven't added just punchhole functionality to dd
>> is that it's already available from fallocate(1).
>
> But fallocate can't output the data it erases.
>
>> It seems a specialized tool would be required to couple the
>> following ops:
>>
>>   while (read(chunk))
>>     compress
>>     write
>>     if (sync())
>>       collapse_range(chunk)
>>
>> Note I used collapse_range rather than punch_hole there,
>> as that would probably simplify restarts after partial completions,
>> since only the unprocessed data would be left in the file.
>
> It would be the safest, but it means compressing the file in dd,
> which is not what this tool is for (AFAIK).
Right.  I mentioned that would be the flow of a "specialized tool",
like the mooted `inplace` command, where the "compress" functionality
would be pluggable by specifying other commands.

> Also I think using collapse_range isn't a good idea. It becomes
> difficult to handle when the input file is open for writing by
> another process.

Hmm, maybe.  I'm not sure how offsets would be handled.
For an O_APPEND log file there wouldn't be an issue.
For random access it wouldn't be any worse than punching a hole in
the data.  (Rough sketches of both variants follow below, for the
archives.)

Anyway, thanks for the patch.
I'm still slightly against merging, as it's a guaranteed way to lose
data if you ctrl-c the command or whatever.
I'll let others weigh in at this point.

cheers,
Pádraig
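
P.S. For concreteness, the collapse_range loop sketched above might
look roughly like the following.  This is an untested illustration,
not coreutils code: it assumes Linux fallocate(2) with
FALLOC_FL_COLLAPSE_RANGE (ext4/XFS; glibc >= 2.18 exposes the flag via
fcntl.h), it assumes CHUNK is a multiple of the filesystem block size,
and it stands in for "compress" with a plain write to stdout (e.g.
piped to gzip).  A real tool would also confirm the output reached
stable storage before discarding input.

/* Untested sketch: stream a file to stdout while releasing each
   chunk from the front with FALLOC_FL_COLLAPSE_RANGE.  */
#define _GNU_SOURCE
#include <fcntl.h>      /* open, fallocate, FALLOC_FL_* */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK (1 << 20)  /* assumed multiple of the fs block size */

int
main (int argc, char **argv)
{
  if (argc != 2)
    return fprintf (stderr, "usage: %s FILE\n", argv[0]), 1;

  int fd = open (argv[1], O_RDWR);
  if (fd < 0)
    return perror ("open"), 1;

  static char buf[CHUNK];
  ssize_t n;
  while ((n = pread (fd, buf, CHUNK, 0)) > 0)
    {
      /* Hand the chunk to the consumer (e.g. gzip on the other end
         of a pipe); a real tool would verify it reached stable
         storage before deallocating below.  */
      if (write (STDOUT_FILENO, buf, n) != n)
        return perror ("write"), 1;

      struct stat st;
      if (fstat (fd, &st) < 0)
        return perror ("fstat"), 1;

      if (st.st_size > CHUNK)
        {
          /* Drop the emitted chunk from the front of the file;
             the remaining data shifts down to offset 0.  */
          if (fallocate (fd, FALLOC_FL_COLLAPSE_RANGE, 0, CHUNK) < 0)
            return perror ("fallocate"), 1;
        }
      else
        {
          /* COLLAPSE_RANGE may not reach end of file; drop the
             final partial chunk with ftruncate instead.  */
          if (ftruncate (fd, 0) < 0)
            return perror ("ftruncate"), 1;
          break;
        }
    }
  return n < 0 ? (perror ("pread"), 1) : 0;
}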

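For contrast, the punch_hole variant the patch implements keeps the
file size fixed, so a concurrent O_APPEND writer's notion of end of
file is undisturbed.  Only the flag and the offset bookkeeping differ;
this fragment reuses fd, buf, n and CHUNK from the sketch above, with
the same caveats (and note punch-hole offsets need not be block
aligned: unaligned edges are zeroed rather than deallocated):

/* Inside the same read loop: punch a hole over the bytes just
   emitted instead of collapsing from the front.  The file size is
   untouched, so appenders keep writing at the (growing) end.  */
off_t off = 0;
while ((n = pread (fd, buf, CHUNK, off)) > 0)
  {
    if (write (STDOUT_FILENO, buf, n) != n)
      return perror ("write"), 1;
    /* PUNCH_HOLE must be combined with KEEP_SIZE.  */
    if (fallocate (fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                   off, n) < 0)
      return perror ("fallocate"), 1;
    off += n;
  }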