Hi,

Jeffrey J. Kosowsky wrote on 2011-10-02 23:37:07 -0400 [Re: [BackupPC-users]
Fairly large backuppc pool (4TB) moved with(TAB)backuppc_tarpccopy]:
> > My method has the downside that you need to sort a huge file (but the
> > 'sort' command handles huge files rather well). Jeffrey's method has the
> > downside that you have an individual file per inode with typically
> > probably only a few hundred bytes of content, which might end up
> > occupying 4K each - depending on file system. Also, traversing the tree
> > should take longer, because each file is opened and closed multiple
> > times - once per link to the inode it describes.
> >
> > Actually, a single big file has a further advantage. It's rather fast to
> > look for something (like all non-pooled files) with a Perl script.
> > Traversing an "ipool" is bound to take a similar amount of time as
> > traversing pool or cpool will.
>
> Holger, have you ever compared the time on actual data?
No. Actually, I don't consider my script finished yet (and I haven't tried
yours). I've been developing mine out of interest in the matter, and because
I'd like to ultimately fix some things with one BackupPC installation I'm
responsible for. The problem there is that we needed to start over due to
file system problems once or twice, and I'd like to merge back the former
backup history I've kept for that purpose (which is more a proof-of-concept
sort of idea; we don't actually anticipate ever needing to *restore data*
from the old backups, thus it doesn't have high priority). The two goals I
haven't found the time to solve yet are:

1.) Merging of pools. Actually quite easy. When writing a pool file, use the
    logic from PoolWrite::write to match existing pool content or insert new
    pool files as appropriate. I'm saying "the logic of", because after each
    pool file I need to create the pc/ links, which means I can't wait for a
    link phase to create the actual pool link, because by then it might be
    necessary to replace the new file by a link to a pool file created in
    the meantime. For my special case, I also need to handle merging
    "backups" files and possibly renaming backups.

2.) Network capability. I'd like to generate one single data stream of some
    format (tar?), so I can pass that over an arbitrary network transport
    (ssh, netcat, ...) for a remote copy operation. Currently, I use
    File::Copy, which, of course, limits it to local copies.

Together, these goals suggest that having an "incremental mode" would make
this a solution for offsite copies of BackupPC pools :-). I was almost
surprised when I looked at my code yesterday to find out that local copies
should actually work. I didn't remember finishing that part :).

> Just one nit, I do allow for caching the inode pool so frequently
> referenced pool files do not require the corresponding inode pool file
> to be opened repeatedly.

Well, ok, but how well can that work? You're limited to something like 1024
open files per process, I think. Can you do better than LRU? Depending on
how you iterate over the pool, that would tend to give you a low cache hit
rate (files tend to repeat in consecutive backups rather than within a
single backup, I'd guess, and single backups will easily have more than
1024 files).

> Also, I would think that with a large pool, that the file you
> construct would take up a fair bit of memory

Correct. I did try out the index generating phase (which can still be run on
its own via option switches, just as copying can be run without regenerating
the index file), and I got something like 2 GB of data. I'd say something
like 100 bytes per directory entry on the pool FS, as a *rough* estimate.
What does 'du -s' of your ipool give you?

> and that the O(n log n) to search it might take more time then referencing
> a hierarchically structured pool, especially if the file is paged to disk.

Also correct. But I don't *need* to search it. I construct it in a manner
that the sort operation will put the lines ("records") in an order where I
just have to read the file linearly and act on the lines one at a time. The
only information I need to remember between lines is the inode number and
path of *one single* pool file (the last one encountered). Sorting the 2 GB
file took a matter of minutes. (Rough sketches of both steps follow below.)

> Of course, the above would depend on things like size of your memory,
> efficiency of file system access, cpu speed vs. disk access. Still would be
> curious though...

That's what I like about the solution.
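To make that a bit more concrete - this is *not* my actual script, just a
sketch typed for this mail, with the record format ("inode<TAB>tag<TAB>path")
and the cpool/pool/pc layout as simplifying assumptions - the index
generation amounts to walking the trees once and emitting one record per
regular file, keyed on the inode number:

#!/usr/bin/perl
# Sketch only: emit one "inode<TAB>tag<TAB>path" record per regular file
# under cpool/, pool/ and pc/.  The tag (0 for the pool trees, 1 for pc/)
# ensures that, for a given inode, the pool path sorts before its pc/ paths.
use strict;
use warnings;
use File::Find;

my $topdir = shift @ARGV
    or die "usage: $0 /path/to/TopDir > index.unsorted\n";

for my $tree (qw(cpool pool pc)) {
    next unless -d "$topdir/$tree";
    my $tag = ($tree eq 'pc') ? 1 : 0;
    find(sub {
        my @st = lstat $_;
        return unless @st && -f _;          # regular files only
        print "$st[1]\t$tag\t$File::Find::name\n";
    }, "$topdir/$tree");
}

A plain 'sort -n index.unsorted > index.sorted' then groups every pool file
with the pc/ paths linking to it (with GNU sort and -n, ties on the inode
fall back to byte order, so tag 0 ends up in front of tag 1).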
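The linear pass over the sorted file then only needs to remember the inode
and destination path of the last pool file it has seen. Again just a sketch
under the same assumptions - a real implementation also has to cope with
pc/ files that are hardlinked to each other but not to the pool, attrib
handling, ownership/permissions, compression settings and so on:

#!/usr/bin/perl
# Sketch only: read the sorted index ("inode<TAB>tag<TAB>path") from stdin
# and recreate the pool/pc hardlink structure below a destination TopDir.
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy     qw(copy);
use File::Path     qw(make_path);

my ($src, $dst) = @ARGV;
die "usage: $0 /src/TopDir /dst/TopDir < index.sorted\n"
    unless defined $src && defined $dst;

my ($cur_ino, $cur_dst) = (-1, undef);      # last pool inode/path seen

while (my $line = <STDIN>) {
    chomp $line;
    my ($ino, $tag, $path) = split /\t/, $line, 3;
    (my $rel = $path) =~ s{^\Q$src\E/}{};
    my $target = "$dst/$rel";
    make_path(dirname($target));

    if ($tag == 0) {
        # pool file: copy it once and remember it for its pc/ links
        copy($path, $target) or die "copy $path: $!";
        ($cur_ino, $cur_dst) = ($ino, $target);
    } elsif ($ino == $cur_ino) {
        # pc/ file sharing the inode of the pool file just copied: link it
        link($cur_dst, $target) or die "link $target: $!";
    } else {
        # pc/ file not linked into the pool (logs, unpooled files, ...):
        # fall back to a plain copy
        copy($path, $target) or die "copy $path: $!";
    }
}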
The only step that is likely to depend on memory size is the sorting. Disk
access is the key, as always when processing a BackupPC pool. I don't see a
way around that. But I *can* easily keep the file off both the source and
destination pool disks if I want to. The default is below the destination
pool's TopDir, because that is the place where I can most safely assume a
large amount of free space. I could add logic to check, but I believe this
is really best specified manually.

> Finally, did you ever post a working version of your script.

No. I don't consider it tested, really, so I wouldn't want the integrity of
someone's pool copy to depend on whether I had a good day or not :-). For
the relevant question *in this thread*, that doesn't seem to be a problem.
Collecting and sorting the data is straightforward enough, and if the
results turn out to be incorrect, it will only waste a small amount of time
(mine, probably :). Furthermore, it's not really commented or copyrighted,
the code isn't cleaned up ... I can't even give you a synopsis or an option
description without looking closely at the code right now :-). I'll send you
a copy off-list, likewise to anyone else really interested, but I'm not
prepared to say "entrust all your pool data to this script" yet, not even
implicitly :-).

Regards,
Holger