We worked on this problem in order to offer a remote backup of a
backuppc server. The idea is: we use rsync to back up the main file pool
of backuppc, and we create a script that reconstructs the hard links for
each pc. Then we just send these reconstruction files to the remote
location.
The same approach is also efficient for backing up backuppc to tape or
to removable media.
Before writing the attached script, we first rsynced the cpool, then
ran a second rsync over the cpool and a pc directory with the rsync
options that preserve hard links (such as -H). However, one of our
customers has a huge number of (important) small files. In that case
rsync needs an enormous amount of memory to keep track of the hard
links.
I then wrote some very small programs that rely neither on backuppc nor
on sophisticated rsync behaviour, but only on the file structure used by
backuppc.
The idea is the following:
* I walk the cpool directory,
* I create files from 00 to 99 (named 00.inode, 01.inode, ... 99.inode),
* for each file in each directory, I get the inode number. I take its
last two digits (XY) and write the inode number and the filename as one
line of the file named XY.inode,
* these files are stored in /var/lib/backuppc/link/cpool,
* I do the same operation for each pc directory
(/var/lib/backuppc/link/pc/the_name/00.inode, 01.inode, ...),
* I use the external sort command to sort all these XY.inode files
numerically by inode,
* I then read each pc/the_name/XY.inode file in parallel with the
corresponding XY.inode file in the link/cpool directory. Because both
files are sorted, no search is needed: I just advance through the two
files together. This makes it easy to find the cpool file corresponding
to the current file in the pc pool, and to write a shell script that
exactly recreates the pc pool hard links. Add the creation of empty
directories, and that's it.
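As an illustration only, here is a minimal Python sketch of the bucketing
and parallel read described in the list above. It is not the attached Tcl
script: the function names inode_buckets and merge_links are made up for
this example, and it keeps the lists in memory instead of writing XY.inode
files and sorting them with the external sort command. It therefore shows
the matching logic, not the memory savings.

```python
import os
from collections import defaultdict


def inode_buckets(root):
    """Group (inode, path) pairs found under root into buckets keyed by
    the last two decimal digits of the inode number, each bucket sorted
    by inode. (Hypothetical helper, in-memory stand-in for XY.inode.)"""
    buckets = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            inode = os.lstat(path).st_ino
            buckets[inode % 100].append((inode, path))
    for entries in buckets.values():
        entries.sort()
    return buckets


def merge_links(pool_entries, pc_entries):
    """Walk two inode-sorted lists in step, without any search.

    Returns (links, orphans): links are (pool_path, pc_path) pairs that
    share an inode, orphans are pc paths whose inode is not in the pool.
    """
    links, orphans = [], []
    i = 0
    for inode, pc_path in pc_entries:
        # Advance in the pool list while its inode is still smaller.
        while i < len(pool_entries) and pool_entries[i][0] < inode:
            i += 1
        if i < len(pool_entries) and pool_entries[i][0] == inode:
            # Do not advance i: several pc files may share one pool inode.
            links.append((pool_entries[i][1], pc_path))
        else:
            orphans.append(pc_path)
    return links, orphans
```

To use it, one would call inode_buckets on the cpool and on the pc
directory, then call merge_links once per bucket key and turn each
returned pair into an "ln" line of the restore script.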
I wrote these files a while ago. The fact is that it works (because it
is simple). It lets you rsync only the main cpool, and then simply copy
the script that reconstitutes the pc pools in case of problems. You do
not even need to reconstruct the remote pc pools: just keep the script
in case you need them one day. As you do not want to access this remote
backup all the time, this is not a problem.
Obviously, the intelligent way to do it would be to create the shell
scripts that reconstruct the pc pools while the backup itself is
running, which would avoid rebuilding all the lists.
Finally, to use the script, we have the following two entries in a
system crontab (with a user field, each entry on a single line):
00 04 * * 6 root mv /var/lib/backuppc/link /var/lib/backuppc/link.old && rm -fr /var/lib/backuppc/link.old
01 04 * * 6 root cd /var/lib/backuppc && ./create_restore_script.tcl -i -l -b /var/lib/backuppc
This is for debian, where the pool is at /var/lib/backuppc. Adapt it to
your distribution. The scripts have been in production for many months
at some of our customers (we provide backuppc in our Linbox Rescue
Server product, as well as hosting services and remote backup services).
You also need to put the rsync of the cpool in your cron.
The script is written in Tcl, for a change.
I hope that this will be a nice Christmas present for all backuppc users.
Best regards,
Arnaud LAPREVOTE
#!/usr/bin/tclsh
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# Copyright Arnaud LAPREVOTE and Linbox/Free&ALter Soft 2005
# TODO : optimize speed, maybe switch to C or another language
set BACKUPPC_DATA_ROOT /tftpboot/backuppc
proc create_inode_files { storage_dir current_fid_array current_name_array } {
    upvar $current_fid_array fid_array
    upvar $current_name_array name_array
    # Create the storage directory if it does not exist yet
    if { ![file isdirectory $storage_dir] } {
        if { [catch { file mkdir $storage_dir } ] } {
            puts "# Could not create the directory $storage_dir"
        }
    }
    # Open one file per value of the last two inode digits (00..99)
    for { set i 0 } { $i < 100 } { incr i } {
        set string_i [format "%02d" $i]
        set current_inode_filename [file join $storage_dir "${string_i}.inode"]
        puts "# Opening $current_inode_filename"
        set name_array($string_i) "$current_inode_filename"
        if { [catch {set fid_array($string_i) [open $current_inode_filename w]} error] } {
            error "Problem opening $current_inode_filename: $error"
        }
    }
}
proc get_inode_file_list { inode_dir } {
    set result_list [list]
    for { set i 0 } { $i < 100 } { incr i } {
        set string_i [format "%02d" $i]
        set current_inode_filename [file join $inode_dir "${string_i}.inode_ordered"]
        lappend result_list $current_inode_filename
    }
    return $result_list
}
proc create_inode_lists { directory current_fid_array } {
    upvar $current_fid_array fid_array
    set dir_list [glob -nocomplain -type d -directory $directory *]
    set file_list [glob -nocomplain -type [list f l] -directory $directory *]
    foreach file $file_list {
        file stat $file file_attrib
        #set current_list [list $file_attrib(ino) $file]
        # Select the output file from the last two digits of the inode
        set inode_end [format "%02d" [expr {$file_attrib(ino) % 100}]]
        if { [catch {puts $fid_array($inode_end) "$file_attrib(ino) $file"} ] } {
            puts "Error while writing to ${inode_end}.inode"
        }
    }
    foreach dir $dir_list {
        create_inode_lists $dir fid_array
    }
}
proc close_inode_files { current_fid_array } {
    upvar $current_fid_array fid_array
    for { set i 0 } { $i < 100 } { incr i } {
        set string_i [format "%02d" $i]
        close $fid_array($string_i)
    }
}
proc sort_inode_files { current_name_array } {
    upvar $current_name_array name_array
    foreach {inode file} [array get name_array] {
        set order_file "${file}_ordered"
        exec sort -o $order_file -n $file
        file delete $file
    }
}
proc generate_inode_files { original_directory inode_file_directory } {
    create_inode_files $inode_file_directory file_array name_array
    create_inode_lists $original_directory file_array
    close_inode_files file_array
    sort_inode_files name_array
}
# Empty directories are not recreated by the hard-link commands, so
# they are collected here and written to the restore script separately.
# Returns a list: the number of files found and the empty directories.
proc get_empty_directory_list { root_directory } {
    set file_number 0
    set directory_list [glob -nocomplain -directory $root_directory -type d * .*]
    set file_list [glob -nocomplain -directory $root_directory -type f * .*]
    incr file_number [llength $file_list]
    set empty_directory_list [list]
    #puts "directory_list -> $directory_list"
    #puts "file_list -> $file_list"
    #puts "file_number -> $file_number"
    # Are directories whose name starts with . taken into account? To be tested
    foreach dir $directory_list {
        # Ignore . and .., otherwise the recursion never ends
        if { [file tail $dir] != "." && [file tail $dir] != ".." } {
            #puts "LOOKING at $dir"
            set current_directory_list [get_empty_directory_list $dir]
            set current_directory_file_nber [lindex $current_directory_list 0]
            set current_empty_directory [lindex $current_directory_list 1]
            #puts " file_nber -> $current_directory_file_nber"
            #puts " empty_dir -> $current_empty_directory"
            if { [llength $current_empty_directory] != 0 } {
                eval lappend empty_directory_list $current_empty_directory
            }
            incr file_number $current_directory_file_nber
        }
    }
    if { $file_number == 0 && [llength $empty_directory_list] == 0 } {
        lappend empty_directory_list $root_directory
    }
    #puts "RETURNING $file_number -- $empty_directory_list"
    return [list $file_number $empty_directory_list]
}
proc create_empty_directory { empty_directory_list fid } {
    foreach dir $empty_directory_list {
        puts $fid "mkdirhier \"[clean_shell_name $dir]\""
    }
}
proc compare_inode_files { first_pool second_pool script_name pc_link_dir } {
    set first_pool_list [get_inode_file_list $first_pool]
    set second_pool_list [get_inode_file_list $second_pool]
    create_secure_ln_file [file join $second_pool .. secure_ln]
    if { [catch {set script_fid [open $script_name w]}] } {
        puts "# compare_inode_files: could not open $script_name for writing"
        exit 1
    }
    puts $script_fid "chmod u+rx secure_ln"
    foreach first_pool_inode_file $first_pool_list second_pool_inode_file $second_pool_list {
        # Read both sorted files line by line and compare the inode at
        # the start of the current line of each file:
        # - if the inodes are equal, write a link command and read the
        #   next line of the second file;
        # - if the first inode is smaller, read the next line of the
        #   first file;
        # - otherwise the second file has no match in the first one:
        #   save it as an orphan and read the next line of the second file.
        # puts "------------- first_pool_inode_file => $first_pool_inode_file"
        if { [catch {set first_fid [open $first_pool_inode_file]}] } {
            puts "# compare_inode_files: could not read $first_pool_inode_file"
            exit 1
        }
        if { [catch {set second_fid [open $second_pool_inode_file]}] } {
            puts "# compare_inode_files: could not read $second_pool_inode_file"
            exit 1
        }
        set first_file_line ""
        set first_file_line_car [gets $first_fid first_file_line]
        set second_file_line ""
        set second_file_line_car [gets $second_fid second_file_line]
        #puts "READING first_file_line => $first_file_line"
        #puts "READING second_file_line => $second_file_line"
        while { $second_file_line_car != -1 } {
            set first_inode [extract_inode $first_file_line]
            set second_inode [extract_inode $second_file_line]
            set first_name [extract_name $first_file_line]
            set second_name [extract_name $second_file_line]
            #puts "fi -> $first_inode , si -> $second_inode , fn -> $first_name , sn -> $second_name"
            # The first line may be empty (nothing on the line, or the
            # end of the first file has been reached)
            if { $first_inode == "" } {
                set first_file_line ""
                catch { set first_file_line_car [gets $first_fid first_file_line] }
                # In that case the second file is assumed to be orphaned
                if { $second_inode != "" } {
                    puts $script_fid "# Orphan file (empty first file ?)"
                    set target_copy_file [file join [file dirname $second_pool_inode_file] [string trim $second_name "/"] ]
                    puts $script_fid "# DEBUG - copying orphan file $second_name to $target_copy_file"
                    # Critical characters in the names are escaped by clean_shell_name
                    puts $script_fid "mkdirhier \"[clean_shell_name [file dirname $second_name]]\""
                    puts $script_fid "cp \"[clean_shell_name $target_copy_file]\" \"[clean_shell_name $second_name]\""
                    file mkdir [file dirname $target_copy_file]
                    file delete $target_copy_file
                    file copy $second_name $target_copy_file
                }
                catch { set second_file_line_car [gets $second_fid second_file_line] }
            } elseif { $second_inode == "" } {
                set second_file_line ""
                catch { set second_file_line_car [gets $second_fid second_file_line] }
            } elseif { $first_inode == $second_inode } {
                #puts "FIND EQUALITY : $first_inode $second_inode"
                puts $script_fid "./secure_ln \"[clean_shell_name $first_name]\" \"[clean_shell_name $second_name]\""
                #puts "./secure_ln $first_name $second_name"
                # Several pc files may share this inode, so only the
                # second file moves on to its next line
                set second_file_line ""
                catch { set second_file_line_car [gets $second_fid second_file_line] }
            } elseif { $first_inode < $second_inode } {
                puts $script_fid "# DEBUG - $first_inode < $second_inode"
                #puts "# DEBUG - $first_inode < $second_inode"
                set first_file_line ""
                catch { set first_file_line_car [gets $first_fid first_file_line] }
            } else {
                puts $script_fid "# DEBUG - $first_inode > $second_inode"
                set target_copy_file [file join [file dirname $second_pool_inode_file] [string trim $second_name "/"] ]
                puts $script_fid "# DEBUG - copying orphan file $second_name to $target_copy_file"
                puts $script_fid "mkdirhier \"[clean_shell_name [file dirname $second_name]]\""
                puts $script_fid "cp \"[clean_shell_name $target_copy_file]\" \"[clean_shell_name $second_name]\""
                file mkdir [file dirname $target_copy_file]
                file delete $target_copy_file
                file copy $second_name $target_copy_file
                #puts "# DEBUG - $first_inode > $second_inode"
                set second_file_line ""
                catch { set second_file_line_car [gets $second_fid second_file_line] }
            }
        }
        close $first_fid
        close $second_fid
    }
    # Now create the list of empty directories
    set empty_dir_result [get_empty_directory_list $pc_link_dir]
    create_empty_directory [lindex $empty_dir_result 1] $script_fid
    # At the end of the script, change the ownership of the whole directory
    puts $script_fid "chown -R backuppc \"[clean_shell_name $pc_link_dir]\""
    puts $script_fid "chmod -R 750 \"[clean_shell_name $pc_link_dir]\""
    close $script_fid
}
proc extract_inode { line } {
    set inode ""
    regexp {^([^ ]*) } $line match inode
    return $inode
}
proc extract_name { line } {
    set name ""
    regexp {^[^ ]* (.*)$} $line match name
    return $name
}
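proc clean_shell_name { name } {
    # Backslash-escape the characters that stay special inside a
    # double-quoted shell string: backslash, double quote, dollar, backquote
    regsub -all {[\\"$`]} $name {\\&} clean_name
    return $clean_name
}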
proc create_secure_ln_file { filename } {
    set fid [open $filename w]
    puts $fid {#!/bin/bash
pc_file_dir=`dirname "$2"`
mkdirhier "$pc_file_dir"
ln "$1" "$2"
}
    close $fid
}
set CREATE_INODE_FILE 0
set GENERATE_LINK_SCRIPT 0
set state get_options
foreach arg $argv {
    switch -exact -- $state {
        get_options {
            switch -glob -- $arg {
                -b* {
                    set state get_backuppc_dir
                }
                -i* {
                    set CREATE_INODE_FILE 1
                }
                -l* {
                    set GENERATE_LINK_SCRIPT 1
                }
                -h* {
                    puts "$argv0 : build files with inodes from a backuppc cpool and pc directory"
                    puts " to allow the efficient backup of the cpool directory."
                    puts " -b backuppc_data_dir : set the root of the backuppc data directory"
                    puts " -i : only create the inode files."
                    puts " -l : generate a shell script for recreating the pc dir. Beware: only link files are created. The conf and log files of the directory are not taken into account."
                    puts " -h : this message"
                    puts "Normal usage should be :"
                    puts "$argv0 -i -l -b /tftpboot/backuppc"
                }
            }
        }
        get_backuppc_dir {
            set BACKUPPC_DATA_ROOT $arg
            set state get_options
        }
    }
}
# test
set pool_dir "${BACKUPPC_DATA_ROOT}/cpool"
#set pool_dir "/usr/local"
set link_dir "${BACKUPPC_DATA_ROOT}/pc"
set storage_pool_dir "${BACKUPPC_DATA_ROOT}/link/cpool"
set storage_link_dir "${BACKUPPC_DATA_ROOT}/link/pc"
if { $CREATE_INODE_FILE } {
    generate_inode_files $pool_dir $storage_pool_dir
    generate_inode_files $link_dir $storage_link_dir
}
if { $GENERATE_LINK_SCRIPT } {
    # Now match the entries of the two sets of inode files
    # and write the result to a target script
    set restore_script "${BACKUPPC_DATA_ROOT}/link/restore.sh"
    compare_inode_files $storage_pool_dir $storage_link_dir $restore_script $link_dir
}