Hi,

On Thu, May 07, 2026 at 01:32:35PM +0200, Hans wrote:
> I would like to tell, why I asked.
It is usually a good idea to do this from the start, because it may be that your use case is solvable in other ways. This time though, the easier solutions have major deficiencies while the better solutions are really complex and/or expensive.

TL;DR: If you can put in time but not much money, I recommend focusing on backups, config management, and use of both to get a good mean time to recovery.

> So my solution would be, having drive images, restore them onto a new server
> and of we go.

Making a "golden image" of a server, taken after it is installed by booting into a live environment or some other way to read the disk without having it mounted, is a tried and tested, decades-old way of being able to quickly return a server to service.

The hard part is making up-to-date images of servers while they are operational in order to capture application data. It's understandable why you desire to make such an image while the server is running. People in this thread have correctly explained to you why that's not possible.

> But as I do not know, if this is possible at all (due to possible changes
> while the drives are mounted), and could not find a way searching the web, I
> allowed me, to ask here some other experts. Maybe they might know, if this
> issue can be solved or not.
>
> But it appears, it is not, and so my question was fullly answered.

There isn't an easy answer, but there are answers. This is a problem that everyone managing computers has. Some ignore it; the rest of us come up with solutions that can never be perfect but involve trade-offs we tolerate.

Firstly I would say, step back and consider your backup strategy. Even if we were to suppose that it was possible to take an instant image of a running server, THAT WOULD STILL NOT BE A BACKUP. So, independently of this issue, you need to have backups of application data.

You need that because "server going on fire" is not all or even most of the ways you lose data. Human error is more common.
People delete stuff and mess it up. You as the operator get called upon to restore application data from a week or a month ago.

So one way to look at this is, there have to be backups in place anyway, so work out how to restore your base server image and then replay the application data from backups onto it.

The correct way to take consistent backups of application data will depend upon the application. For example, a database server like Postgres or MariaDB has commands you can issue to take consistent database-level locks and dump out the entire content of the chosen tables.

Maybe as the operator of the service you consider application-level backups to be the users' concerns. So then your problem space is just the base server image plus any custom configuration you did since. There are many ways to solve that. A popular one is configuration management: your configuration is stored like code and the config management software can quickly and easily apply it to a base server image, bringing the service live in a short period of time. It is a big investment of time to set up and requires ongoing discipline to make future changes in config management, not on the live servers.

The issue of keeping the service running in the face of hardware failure is resilience. If you have enough resources then you can design something that doesn't have data consistency issues. For example, you can have a Ceph storage cluster redundant at every level and maybe even multi-site. If something breaks you fail over to different servers and all the data is still there. Very few of us can justify that sort of spend, so no more on that.

Your options without application-specific facilities are basically different forms of filesystem snapshot. People already went through options like btrfs, zfs, or LVM underneath other filesystems. btrfs and zfs will take a (filesystem-level) consistent snapshot while an LVM snapshot will appear like a power loss event to filesystems on top of it.
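
To make the LVM variant a bit more concrete, here is a minimal Python sketch of the usual sequence: take a snapshot, mount it read-only, archive it, clean up. Treat it as an illustration rather than something tested on real servers. The volume group, volume names, snapshot size, mount point and archive path are all made-up placeholders, and it assumes an ext4-style filesystem (XFS would additionally need the nouuid mount option). By default it only prints the commands it would run.

```python
import subprocess


def lvm_snapshot_plan(vg, lv, snap_name, snap_size, mount_point, archive_path):
    """Build the command list: snapshot -> read-only mount -> tar -> cleanup.

    All arguments are placeholders chosen by the caller; nothing is run here.
    """
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    return [
        # Point-in-time snapshot; snap_size is room for changes made to the
        # origin volume while the snapshot exists.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin],
        # The filesystem inside looks as it would after a sudden power loss.
        ["mount", "-o", "ro", snap, mount_point],
        # Copy the frozen view somewhere else.
        ["tar", "-C", mount_point, "-czf", archive_path, "."],
        ["umount", mount_point],
        ["lvremove", "--yes", snap],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root).

    A failure part-way through leaves the snapshot behind for manual cleanup.
    """
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; adjust to your own volume group and paths.
    plan = lvm_snapshot_plan("vg0", "root", "root_snap", "5G",
                             "/mnt/snap", "/backup/root.tar.gz")
    run_plan(plan, dry_run=True)
```

The archive you get is only as consistent as the snapshot itself, which is to say it is the state the disk would be in after someone pulled the plug.
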
Modern filesystems are pretty robust against this, and applications that care about data safety take steps to be consistent with power failure also: you lose some data that didn't get committed in time but nothing should be corrupted or half-committed. Of course, applications can be buggy, database clients can neglect to use transactions properly, etc. That's why you need backups.

Instead of filesystem snapshots you could look at disk mirrors. For example, if you installed a server with two identical drives and made sure that everything was in mdadm RAID-1 then every so often you could pull out one drive and insert a blank one. The array(s) should sync onto the blank drive and the drive you removed becomes your disaster recovery image: you could insert it in a new server and boot it and it would seem like a power loss event at the point you took the drive out.

There are lots of variations on this. You might not like that the server has no redundancy while the mirror is broken, so maybe you run it as a 3-way mirror. Maybe you don't like that you have to physically go to each server and swap a drive. In that case you could use DRBD to have something like a RAID-1 but over the network to another server.

As we pursue these ideas further it gets more and more expensive.

Running applications inside virtual machines and containers helps to confine their data into something that is easier to manage from the outside.

For small sites I think there is a lot to recommend configuration management, possibly with "golden images", to quickly get a base system up, and a way to restore application data from backups - since either you need that anyway or else it is purely the users' problem!

No easy answers, sorry!

> Hope, I made not too much noise!

The question was welcome, it's just that it merely opens an area of discussion that can easily be the focus of a person's entire working life. 😀

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

