I have added checksumming to the current "chunked" implementation of
copyfile. If the source file on the Overlord and target file on the Minion
have the same checksum, the target is not overwritten.
My understanding is that copyfile originally calculated a checksum for
the source and target files and only overwrote the target if the checksums
differed. The "copyfile" method of the CopyFile object on the Minion side
handled this.
Currently copyfile always overwrites the target file. It appears that
when copyfile was modified to write files in chunks (avoiding out-of-memory
situations when copying large files) this feature vanished. Instead of
calling a single "copyfile" method there are two methods: "open" to truncate
(or create) the target file and "append" to keep adding data to it. Both of
these methods are always applied.
I have changed the "open" method so that it compares the checksums of
the source and target. It just returns zero and does not truncate the target
file if the checksums match. If they do not match it still truncates the
target and returns the file path, as before.
I have changed the "append" method to look up the return code from
"open" each time it is called. If the return code was zero or any number
instead of the file path, it does not append anything to the target file.
Note that "append" is still called repeatedly even on a Minion which is not
going to do any appending to that particular file. This is inefficient but it
works.
There should be a better way of adding the checksumming code to both
the Overlord and Minion copyfile.py files. I'll look into that next. Also,
it should now be relatively easy to re-implement the --force argument (skip
checksumming) and to create a new --backup argument (backup the target before
overwriting like the original copyfile method did).
Based on the experience of writing this code I would like to make a
feature suggestion. It would be useful if the "run" method in the "Overlord"
class could take an optional "mask" argument which could instruct it to skip
certain Minions. The mask could be a dictionary with minion names as the keys
and something like True/False as the data. Coincidentally this is almost
exactly the same structure which is returned by the "run" method in the
"Overlord" class itself. This feature would make it easy and efficient to
write functions which have multiple "run" calls which are dependent upon one
another.
--
Marcus
diff -rupN func/func//minion/modules/copyfile.py func.new/func//minion/modules/copyfile.py
--- func/func//minion/modules/copyfile.py 2010-11-25 20:16:50.000000000 -0500
+++ func.new/func//minion/modules/copyfile.py 2010-11-25 19:55:28.000000000 -0500
@@ -26,6 +26,7 @@ import time
import shutil
import func_module
+from func import utils as func_utils
class CopyFile(func_module.FuncModule):
@@ -40,28 +41,35 @@ class CopyFile(func_module.FuncModule):
return thissum.hexdigest()
def checksum(self, thing):
-
CHUNK=2**16
thissum = hashlib.new('sha1')
if os.path.exists(thing):
- fo = open(thing, 'r', CHUNK)
- chunk = fo.read
- while chunk:
+ fo = open(thing, 'r')
+ while True:
chunk = fo.read(CHUNK)
+ if not chunk:
+ break
thissum.update(chunk)
fo.close()
del fo
else:
- # assuming it's a string of some kind
+ # assuming it's a string of some kind
thissum.update(thing)
- return thissum.hexdigest()
+ hexdig = thissum.hexdigest()
+ return hexdig
- def open(self, filepath, mode=None, uid=-1, gid=-1):
+ def open(self, filepath, remote_sum, mode=None, uid=-1, gid=-1):
dirpath = os.path.dirname(filepath)
if not os.path.exists(dirpath):
os.makedirs(dirpath)
+ if os.path.exists(filepath):
+ local_sum = self.checksum(filepath)
+
+ if remote_sum == local_sum:
+ return 0
+
# Create empty file
try:
fo = open(filepath, 'w')
@@ -82,7 +90,14 @@ class CopyFile(func_module.FuncModule):
return filepath
- def append(self, filepath, filebuf):
+ def append(self, open_result, filebuf):
+ hostname = func_utils.get_hostname_by_route()
+ filepath = open_result[hostname]
+
+ # If the result of the open function was 0 (or -1), do not append, return the same value.
+ if type(filepath) == type(0):
+ return filepath
+
if not os.path.exists(filepath):
# file disaperead
return -1
diff -rupN func/func//overlord/modules/copyfile.py func.new/func//overlord/modules/copyfile.py
--- func/func//overlord/modules/copyfile.py 2010-11-25 20:16:50.000000000 -0500
+++ func.new/func//overlord/modules/copyfile.py 2010-11-25 19:42:51.000000000 -0500
@@ -4,9 +4,40 @@ import stat
import sys
import xmlrpclib
+try:
+ import hashlib
+except ImportError:
+ # Python-2.4.z ... gah! (or even 2.3!)
+ import sha
+ class hashlib:
+ @staticmethod
+ def new(algo):
+ if algo == 'sha1':
+ return sha.new()
+ raise ValueError, "Bad checksum type"
+
from func.overlord import overlord_module
class copyfile(overlord_module.BaseModule):
+ def checksum(self, thing):
+ CHUNK=2**16
+ thissum = hashlib.new('sha1')
+ if os.path.exists(thing):
+ fo = open(thing, 'r')
+ while True:
+ chunk = fo.read(CHUNK)
+ if not chunk:
+ break
+ thissum.update(chunk)
+ fo.close()
+ del fo
+ else:
+ # assuming it's a string of some kind
+ thissum.update(thing)
+
+ hexdig = thissum.hexdigest()
+ return hexdig
+
def send(self, localpath, remotepath, bufsize=60000):
try:
f = open(localpath, "r")
@@ -18,13 +49,14 @@ class copyfile(overlord_module.BaseModul
mode = stat.S_IMODE(st.st_mode)
uid = st.st_uid
gid = st.st_gid
+ local_sum = self.checksum(localpath)
- self.parent.run("copyfile", "open", [remotepath, mode, uid, gid])
+ open_result = self.parent.run("copyfile", "open", [remotepath, local_sum, mode, uid, gid])
while True:
data=f.read(bufsize)
if data:
- self.parent.run("copyfile", "append", [remotepath, xmlrpclib.Binary(data)])
+ self.parent.run("copyfile", "append", [open_result, xmlrpclib.Binary(data)])
else:
break
_______________________________________________
Func-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/func-list