Python script for automatic synchronization based on inotify

René Mayrhofer Thu, 10 Mar 2011 06:18:59 -0800

Hi everybody,

So far, I have only been an avid lurker on this list and have not yet found
time to contribute myself. With this email, I hope to change this and attach a
preview of a Python script/daemon that I have been meaning to release for ages,
but haven't gotten around to doing so. In short, it does what Sparkleshare tries to 
do, but without any configuration GUI (just a dot-file for configuration and 
one Python script that runs the background thread). My home directory (or at 
least, the major part of it) has been under svn and nowadays git control for 
quite a while, and I have tried to make my life a little easier when it comes 
to synchronizing the clones between multiple machines for the non-source parts 
(i.e. normal documents). Early versions of this script were actually around 
before Sparkleshare was announced, so I was pleased to see others going in 
the same direction. If not for lack of time, I would have tried to 
contribute some ideas to Sparkleshare as well. As I don't see that happening 
anytime soon (time-wise...), I at least want to push out what I have so far.



What does it do?
------------------------
Automatically keep DVCS repositories in sync whenever changes happen by 
automatically committing and pushing/pulling.

How does it do it?
------------------------
0. Set up desktop notifications (for these nice bubble-style popups when 
anything happens) and log into a Jabber/XMPP account specified in the config 
file.

1. Monitor a specific path for changes with inotify.
At the moment, only one path is supported, and multiple script instances have 
to be run for multiple disjoint paths. This path is assumed to be (part of) a 
repository. Currently tested with git, but it should support most DVCSs (the 
config file allows specifying the DVCS commands called when interacting with it).

2. When changes are detected, check them into the repository that is being 
monitored (or delete, or move, etc.).
It automatically ignores any patterns listed in .gitignore, and the config file 
allows excluding other directories (e.g. repositories within the main 
repository).

3. Wait for a configurable time. When nothing else changes in between, commit.

4. Wait a few seconds longer (again configurable) and, if nothing else is 
committed, initiate a push.

5. After the push has finished, send an XMPP message to self (that is, to all 
clients logged in with the same account) to notify other accounts of the push.

[At any time in between]. When receiving a proper XMPP message, pull from the 
repository.
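
As a rough illustration of steps 3 and 4 (wait, commit, wait again, push), the
following is a minimal sketch and not the code from the attached script. It
uses threading.Timer instead of the ResettableTimer class in autosync.py; the
names Debouncer, committer, pusher and demo are made up for this example, the
delays are shortened to fractions of a second, and the DVCS calls are replaced
by appending to a list. It is written for Python 3.

```python
import threading
import time


class Debouncer(object):
    """Run `action` once trigger() has not been called for `delay` seconds."""

    def __init__(self, delay, action):
        self.delay = delay
        self.action = action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            # A new event cancels the pending run and restarts the countdown.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.daemon = True
            self._timer.start()


log = []  # stands in for the real DVCS commands in this sketch


def push():
    log.append('push')


pusher = Debouncer(0.2, push)


def commit():
    log.append('commit')
    pusher.trigger()  # every commit (re)starts the push countdown


committer = Debouncer(0.1, commit)


def demo():
    """Simulate three file changes in quick succession and return the log."""
    del log[:]
    for _ in range(3):
        committer.trigger()
    time.sleep(1.0)  # give both countdowns time to expire
    return list(log)


if __name__ == '__main__':
    print(demo())
```

The three simulated changes are folded into one commit, and that commit is
followed by one push once no further commit has arrived within the push delay.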


Thoughts that should be considered at some point but have not yet been 
implemented:
------------------------
- The XMPP push message already contains a parameter, namely the repository the 
push went to. Add another parameter to specify the repository in which the 
change happened so that others can try to pull directly from there, in case it 
is quicker. The main use case for this optimization is my standard one: the 
laptop sitting next to the desktop and both of them syncing each other's home 
directories. Going via the main, hosted server is quite a bit less efficient 
than pulling via the 1GB/s LAN.

- Pulls and pushes can and should be optimized. At the moment, I take a 
conservative locking approach whenever a conflict may occur and performance is 
reasonable on my main work tree with ca. 16GB (cloned git repo), but not 
stellar. Specifically, actually implement the "optimized" pull lock strategy 
already described in the example config file.

- Implement another option for synchronization besides XMPP (idea: a simple 
broadcast reflector on a single TCP port that could even run on e.g. OpenWRT, 
or re-use whatever the Sparkleshare server does).

- Automatically adding some context to each commit message besides the 
automatic date/time would be useful for finding out why a change happened. 
Nepomuk anybody (just kidding, maybe, for now...)?

- Allow specifying commit messages via popups. When the popup is ignored, use 
the default commit message.
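
To make the first idea in the list above a bit more concrete, here is a
hypothetical sketch of a "pushed" message that carries a second parameter. None
of this is implemented in the attached script: the function names, the
space-separated message layout and the example URLs are assumptions, and URLs
containing spaces would need a different encoding.

```python
def format_pushed(central_url, origin_url=None):
    """Build the text of a 'pushed' notification."""
    parts = ['pushed', central_url]
    if origin_url:
        parts.append(origin_url)
    return ' '.join(parts)


def parse_pushed(text):
    """Return (central_url, origin_url or None), or None for other messages."""
    parts = text.split()
    if len(parts) < 2 or parts[0] != 'pushed':
        return None
    origin = parts[2] if len(parts) > 2 else None
    return parts[1], origin


def pull_candidates(text):
    """Remotes to try in order: the originating clone first, then central."""
    parsed = parse_pushed(text)
    if parsed is None:
        return []
    central, origin = parsed
    return [url for url in (origin, central) if url]


if __name__ == '__main__':
    msg = format_pushed('ssh://server/home.git', 'ssh://laptop/home/user')
    print(msg)
    print(pull_candidates(msg))
```

A receiver would try the first candidate (the clone on the LAN) and fall back
to the central repository if that pull fails. Messages without the second
parameter still parse, so old and new instances could coexist.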

Installation
------------------------
Simple. Copy the attached .autosync-example config file to ~/.autosync, change 
to your needs (paths including ignores and XMPP id/password), then run the 
autosync.py script. Note that it currently needs a slightly extended version of 
jabberbot.py (e.g. in the same directory from which autosync.py is executed) to 
allow reception of messages from its own XMPP Id. I would like to push these 
minimal changes upstream, but haven't done that so far.
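
For orientation, the section and option names that autosync.py reads from the
config file can be sketched as follows. The names are taken from the
config.get() calls in the script; every value shown (paths, git commands, XMPP
account) is only an assumed placeholder and may differ from the attached
.autosync-example file. The snippet parses the text with Python 3's
configparser, whereas the script itself uses the Python 2 ConfigParser module.

```python
import configparser

# All values below are illustrative assumptions, not the shipped defaults.
EXAMPLE = """
[autosync]
path = ~/documents
pidfile = ~/.autosync.pid
ignorepath = .git
readfrequency = 5
syncmethod = xmpp
pulllock = conservative

[dcvs]
statuscmd = git status | grep -iq "nothing to commit"
startupcmd = git add -A
commitcmd = git commit -m Autocommit
pushcmd = git push
pullcmd = git pull
addcmd = git add %s
rmcmd = git rm %s
modifycmd = git add %s
movecmd = git mv %s %s
remoteurlcmd = git config --get remote.origin.url

[xmpp]
username = user@example.org
password = secret
"""

# RawConfigParser leaves the %s placeholders untouched (no interpolation).
config = configparser.RawConfigParser()
config.read_string(EXAMPLE)

if __name__ == '__main__':
    for section in config.sections():
        print(section, dict(config.items(section)))
```

Two details follow from how the script runs these commands: the status command
is expected to return 0 when nothing has changed, and all other commands are
split on single spaces, so an argument such as the commit message cannot
contain spaces.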

Disclaimer
------------------------
This is my first Python program that is longer than 100 lines. Please go easy 
on me with the patches, complaints and "what did you think, doing it this way?" 
messages. I have tried to comment wherever I found it necessary for my own 
understanding, but this is neither the best structured nor the most elegant 
program I have ever written. Any hints for improving it are very welcome, and 
interoperability patches to work with Sparkleshare even more so. In the future, 
the two projects should definitely interoperate, which will come down to 
implementing each other's notification mechanism. My autosync Python script 
could then be used wherever headless operation might be required and/or Mono is 
not installed.
I have tested it between three systems and, in this version, it works 
reasonably well. However, there does seem to be the occasional kink when 
editors go crazy on temporary file creation, renaming, deleting originals, etc. 
These might be races, but I don't know for certain yet. Additional test cases 
are more than welcome. This script should be fairly safe to try, considering 
that the worst it will do is add a few hundred commits to your DVCS repo and 
push them to the configured default remote. But, after all, what is the point 
in using a DVCS if you can't roll back any changes made by you or a buggy 
script (yes, I did have to do that a number of times while developing the 
manual inotify event coalescing to cooperate better with git add/remove/mv 
actions).
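
The event coalescing mentioned above can be condensed into a small standalone
function. This is a simplified sketch of the rules in
_filter_and_handle_actions in the attached script: it works on event names
only, leaves out the timers, locking and DVCS commands, and the function name
coalesce is made up for this example.

```python
def coalesce(events):
    """Reduce the inotify event names recorded for one path to a single one."""
    final = None
    for name in events:
        if final is None:
            # The first recorded event is the starting point.
            final = name
        elif final == 'IN_DELETE' and name == 'IN_CREATE':
            # Deleted and re-created right away (a common way for editors
            # to save): treat the whole sequence as a modification.
            return 'IN_MODIFY'
        elif final == 'IN_MODIFY' and name == 'IN_CREATE':
            final = name  # add outranks modify
        elif final == 'IN_DELETE' and name == 'IN_MODIFY':
            final = name  # modify outranks delete
    return final


if __name__ == '__main__':
    print(coalesce(['IN_DELETE', 'IN_CREATE']))
    print(coalesce(['IN_MODIFY', 'IN_CREATE', 'IN_MODIFY']))
```

Sequences not covered by one of these rules keep the first event, which is one
place where the "editors going crazy" cases could still slip through.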


If there are any questions, don't hesitate to drop me a line. However, I might 
be unable to answer quickly, as I am just in the middle of a big teaching 
block. 

best regards,
Rene

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Version 0.3
# TODO:
# * find out why the Jabber msg-to-self doesn't work in some cases
# * determine if pulling directly from those repositories which caused the changes is quicker than from central
# * optimize pulls and pushes during startup
# * implement optimistic pull lock for better performance
# TODO future versions:
# - automatically add some context to commit messages (e.g. location, applications open at the same time, etc.)
# - allow to specify a commit/change message via traybar icon/popup message, maybe even in retrospect (rewriting history before pushing with a longer push delay)
#
# Usage:
#   ./autosync.py [config file, default is ~/.autosync]
#
# Monitors |path| and its subdirectories in the background for modifications
# to files and automatically commits the changes to git. This script assumes
# that the configured directory is (a subdirectory of) a checked-out git tree.
# A PID file is written to [pidfile] for killing the daemon later on.
# Optionally, an [ignores] file is read with one exclusion pattern per line
# and files matching any of the patterns are ignored. This will typically be
# the .gitignore file already existing in the git tree.
#
# Note that for Jabber login, there probably needs to be a 
# _xmpp-client._tcp.<domain name of jabber account> SRV entry in DNS so that 
# the Python XMPP module can look up the server and port to use. Without such 
# an SRV entry, Jabber login may fail even if the account details are correct 
# and the server is reachable.
#
# Note, when there are errors 
#  ERROR:pyinotify:add_watch: cannot watch ...
# on startup, it will either be an invalid file or directory name which can 
# not be watched for changes, or the number of files a user may watch 
# concurrently using the kernel inotify interface has reached the set limit.
# In the latter case, the limit can be changed by modifying the sysctl variable
# fs.inotify.max_user_watches and increasing it to a sufficient value 
# (e.g. 500000).
#
# Dependencies:
#   Linux, Python 2.6, Pyinotify (better performance with version >= 0.9), JabberBot (>= 0.9)
# Recommended packages:
#   Pynotify for desktop notifications
#
# ============================================================================
# Copyright Rene Mayrhofer, 2010-2011
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2 or 3 of the License.
# ============================================================================

from __future__ import with_statement

import warnings, sys, signal, os, time, subprocess, threading, fnmatch, pyinotify, ConfigParser, logging

with warnings.catch_warnings():
    warnings.filterwarnings("ignore",category=DeprecationWarning)
    import jabberbot, xmpp

botcmd = jabberbot.botcmd

# some global variables, will be initialized in main
desktopnotifykde = False
desktopnotifygnome = False
knotify = None
notifier = None
bot = None

def printmsg(title, msg):
    try:
        if desktopnotifygnome:
            n = pynotify.Notification(title, msg)
            n.show()
        elif desktopnotifykde:
            knotify.event('info', 'kde', [], title, msg, [], [], 0, dbus_interface="org.kde.KNotify")
        else:
            print title + ': ' + msg
    except:
        print title + ': ' + msg


# this helper class has been shamelessly copied from http://socialwire.ca/2010/01/python-resettable-timer-example/
class ResettableTimer(threading.Thread):
    """
    The ResettableTimer class is a timer whose counting loop can be reset
    arbitrarily. Its duration is configurable. Commands can be specified
    for both expiration and update. Its update resolution can also be
    specified. Resettable timer keeps counting until the "run" method
    is explicitly killed with the "kill" method.
    """
    def __init__(self, maxtime, expire, inc=None, update=None, arg=None):
        """
        @param maxtime: time in seconds before expiration after
                        resetting
        @param expire: function called when timer expires
        @param inc: amount by which timer increments before
                    updating in seconds, default is maxtime/2
        @param update: function called when timer updates
        @param arg: arbitrary argument that will be passed to function expire when timer expires 
        """
        self.maxtime = maxtime
        self.expire = expire
        if inc:
            self.inc = inc
        else:
            self.inc = maxtime / 2
        if update:
            self.update = update
        else:
            self.update = lambda c : None

        self.arg = arg
        self.counter = 0
        self.active = True
        self.stop = False
        threading.Thread.__init__(self)
        self.setDaemon(True)
        
    def set_counter(self, t):
        """
        Set self.counter to t.

        @param t: new counter value
        """
        self.counter = t
        
    def deactivate(self):
        """
        Set self.active to False.
        """
        self.active = False
        
    def kill(self):
        """
        Will stop the counting loop before next update.
        """
        self.stop = True
        
    def reset(self):
        """
        Fully rewinds the timer and makes the timer active, such that
        the expire and update commands will be called when appropriate.
        """
        self.counter = 0
        self.active = True

    def run(self):
        """
        Run the timer loop.
        """
        while True:
            self.counter = 0
            while self.counter < self.maxtime:
                self.counter += self.inc
                time.sleep(self.inc)
                if self.stop:
                    return
                if self.active:
                    self.update(self.counter)
            if self.active:
                self.active = False
                self.expire(self.arg)


class AutosyncJabberBot(jabberbot.JabberBot):
    def __init__(self, username, password, res=None, debug=False, ignoreownmsg=True):
        self.__running = False
        jabberbot.JabberBot.__init__(self, username, password, res, debug, ignoreownmsg)

    def log( self, s):
        logging.debug('AutosyncJabberbot:' + s)

    def _process_thread(self):
        print 'Background Jabber bot thread starting'
        while self.__running:
            try:
                self.conn.Process(1)
                self.idle_proc()
            except IOError:
                print 'Received IOError while trying to handle incoming messages, trying to reconnect now'
                self.connect()

    def start_serving(self):
        self.connect()
        if self.conn:
            self.log('bot connected. serving forever.')
        else:
            self.log('could not connect to server - aborting.')
            return

        self.__running = True
        self.__thread = threading.Thread(target=self._process_thread)
        self.__thread.start()

        # this is a hack to get other bots to add this one to their "seen" lists
        # TODO: still doesn't work, figure out how to use JabberBot to get rid of
        # 'AutosyncJabberBot : Ignoring message from unseen guest: rene-s...@doc.to/AutosyncJabberBot on iss'
        self.conn.send(xmpp.Presence(to=username))

    def stop_serving(self):
        self.__running = False
        self.__thread.join()
	
    # override the send method so that connection errors can be handled by trying to reconnect
    # (defined at class level; nested inside stop_serving it would never override JabberBot.send)
    def send(self, user, text, in_reply_to=None, message_type='chat'):
        try:
            jabberbot.JabberBot.send(self, user, text, in_reply_to, message_type)
        except IOError:
            print 'Received IOError while trying to send message, trying to reconnect now'
            self.stop_serving()
            self.start_serving()
  
    @botcmd
    def whoami(self, mess, args):
        """Tells you your username"""
        return 'You are %s, I am %s/%s' % (mess.getFrom(), self.jid, self.res)

    @botcmd
    def ping(self, mess, args):
        print 'Received ping command over Jabber channel'
        return 'pong'
        
    @botcmd
    def pushed(self, mess, args):
        print 'Received pushed command over Jabber channel with args %s from %s' % (args, mess.getFrom())
        if mess.getFrom() == str(self.jid) + '/' + self.res:
            print 'Ignoring own pushed message looped back by server'
        else:
            print 'TRYING TO PULL FROM %s' % args
            with lock:
                handler.protected_pull()


class FileChangeHandler(pyinotify.ProcessEvent):
    def my_init(self, cwd, ignored):
        self.cwd = cwd
        self.ignored = ignored
        # singleton timer for delayed execution of push 
        self._push_timer = None
        # When set to true, then all events will be ignored.
        # This is used to temporarily disable file event handling when a local
        # pull operation is active.
        self._ignore_events = False
        # This is a dictionary of all events that occurred within _coalesce_time seconds.
        # Elements in the sets are tuples of FIFO lists of event types which were delivered
        # for the respective file path and timers for handling the file, indexed by the 
        # respective file path.
        self._file_events = dict()
        
    def _exec_cmd(self, commands, parms = None):
        for command in commands.split('\n'):
            cmdarray = command.split(' ')
            if parms:
                i = 0
                j = 0
                while i < len(cmdarray):
                    if cmdarray[i] == '%s':
                        logging.debug('Substituting cmd part %s with %s' % (cmdarray[i], parms[j]))
                        cmdarray[i] = parms[j]
                        j=j+1
                    i=i+1 
            subprocess.call(cmdarray, cwd=self.cwd)

    def _post_action_steps(self):
        with lock:
            # the status command should return 0 when nothing has changed
            retcode = subprocess.call(cmd_status, cwd=self.cwd, shell=True)
            if retcode != 0:
                self._exec_cmd(cmd_commit)
	  
        if retcode != 0:
            # reset the timer and start in case it is not yet running (start should be idempotent if it already is)
            # this has the effect that, when another change is committed within the timer period (readfrequency seconds),
            # then these changes will be pushed in one go
            if self._push_timer and self._push_timer.is_alive():
                print 'Resetting already active push timer to new timeout of %s seconds until push would occur' % readfrequency
                self._push_timer.reset()
            else:
                print 'Starting push timer with %s seconds until push would occur (if no other changes happen in between)' % readfrequency
                self._push_timer = ResettableTimer(maxtime=readfrequency, expire=self._real_push, inc=1, update=self.timer_tick)
                self._push_timer.start()
        else:
            print 'Git reported that there is nothing to commit, not touching commit timer'

    def _queue_action(self, event, action, parms, act_on_dirs=False):
        curpath = event.pathname
        if self._ignore_events:
            print 'Ignoring event %s to %s, it is most probably caused by a remote change being currently pulled' % (event.maskname, event.pathname)
            return
        if event.dir and not act_on_dirs:
            print 'Ignoring change to directory ' + curpath
            return
        if any(fnmatch.fnmatch(curpath, pattern) for pattern in self.ignored):
            print 'Ignoring change to file %s because it matches the ignored patterns from .gitignore' % curpath
            return

        # remember the event for this file, but don't act on it immediately
        # this allows e.g. a file that has just been removed and re-created
        # immediately afterwards (as many editors do) to be recorded just as
        # being modified
        with lock:
            # each entry in the dict is a tuple of the list of events and a timer
            if curpath not in self._file_events:
                self._file_events[curpath] = [list(), None]
            # and each entry in the list is a tuple of event name and associated action
            self._file_events[curpath][0].append((event.maskname, action))
            if self._file_events[curpath][1] and self._file_events[curpath][1].is_alive():
                print 'Resetting already active coalesce timer to new timeout of %s seconds until coalescing events for file %s would occur' % (coalesce_seconds, curpath)
                self._file_events[curpath][1].reset()
            else:
                print 'Starting coalesce timer with %s seconds until coalescing events for file %s would occur (if no other changes happen in between)' % (coalesce_seconds, curpath)
                self._file_events[curpath][1] = ResettableTimer(maxtime=coalesce_seconds, expire=self._filter_and_handle_actions, inc=1, arg=[curpath, parms])
                self._file_events[curpath][1].start()
            
    def _filter_and_handle_actions(self, args):
        curpath = args[0]
        parms = args[1]
            
        print 'Coalesce event triggered for file ' + curpath
        with lock:
            print 'Considering file %s, which has the following events recorded:' % curpath
            events, timer = self._file_events[curpath]
            lastevent = None
            lastaction = None
            for eventtype, action in events:
                print '   Event type=%s, action=%s' % (eventtype, action)
                
                if not lastevent:
                    lastevent = eventtype
                    lastaction = action
                
                # prio 1: add
                # prio 2: move
                # prio 3: modify
                # prio 4: rm
                # special case: rm then add --> modify
                if lastevent == 'IN_DELETE' and eventtype == 'IN_CREATE':
                    lastevent = 'IN_MODIFY'
                    lastaction = cmd_modify
                    break
                
                # priority ordering 
                if lastevent == 'IN_MODIFY' and eventtype == 'IN_CREATE':
                    lastevent = eventtype
                    lastaction = action
                if lastevent == 'IN_DELETE' and eventtype == 'IN_MODIFY':
                    lastevent = eventtype
                    lastaction = action

            print 'Final action for file %s: type=%s, action=%s' % (curpath, lastevent, lastaction)

            # and clear again for next events coalescing
            del self._file_events[curpath]
            
            printmsg('Local change', 'Committing changes in ' + curpath + " : " + lastaction)
            print 'Committing changes in ' + curpath + " : " + lastaction
    
            self._exec_cmd(lastaction, parms)
            self._post_action_steps()
            

    def process_IN_DELETE(self, event):
        # sanity check - don't remove file if it still exists in the file system!
        if os.path.exists(event.pathname):
            print 'Ignoring file delete event on %s, as it still exists - it was probably immediately re-created by the application' % event.pathname
            return
         
        self._queue_action(event, cmd_rm, [event.pathname])

    def process_IN_CREATE(self, event):
        self._queue_action(event, cmd_add, [event.pathname])

    def process_IN_MODIFY(self, event):
        self._queue_action(event, cmd_modify, [event.pathname])

    def process_IN_CLOSE_WRITE(self, event):
        self._queue_action(event, cmd_modify, [event.pathname])

    def process_IN_ATTRIB(self, event):
        self._queue_action(event, cmd_modify, [event.pathname])

    def process_IN_MOVED_TO(self, event):
        try:
            if event.src_pathname:
                print 'Detected moved file from %s to %s' % (event.src_pathname, event.pathname)
                self._queue_action(event, cmd_move, [event.src_pathname, event.pathname], act_on_dirs=True)
            else:
                print 'Moved file to %s, but unknown source, will simply add new file' % event.pathname
                self._queue_action(event, cmd_add, [event.pathname], act_on_dirs=True)
        except AttributeError:
            # we don't even have the attribute in the event, so also add
            print 'Moved file to %s, but unknown source, will simply add new file' % event.pathname
            self._queue_action(event, cmd_add, [event.pathname], act_on_dirs=True)
	    
    def timer_tick(self, counter):
        logging.debug('Tick %d / %d' % (counter, self._push_timer.maxtime))
	
    def startup(self):
        with lock:
            print 'Running startup command to check for local changes now: ' + cmd_startup
            self._exec_cmd(cmd_startup)
            self._post_action_steps()
	    
    def _real_push(self, arg):
        printmsg('Pushing changes', 'Pushing last local changes to remote repository')
        print 'Pushing last local changes to remote repository'
        with lock:
            # TODO: check if we actually need a pull or a check-for-pull here 
            # or if all race conditions were already ruled out
            # if we need a check-for-pull, then something like 
            #    git fetch --dry-run | grep "Unpacking objects:
            # might help
            #self.protected_pull()
            self._exec_cmd(cmd_push)
	
        # and try to notify other instances
        if bot:
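            # run the command inside the repository, like all other DVCS commands
            proc = subprocess.Popen(cmd_remoteurl.split(' '), stdout=subprocess.PIPE, cwd=self.cwd)
            (remoteurl, errors) = proc.communicate()
            # drop the trailing newline of the command output
            remoteurl = remoteurl.strip()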
            for sendto in [username, alsonotify]:
                if sendto:
                    bot.send(sendto, 'pushed %s' % remoteurl)

    def protected_pull(self):
        printmsg('Pulling changes', 'Pulling changes from remote repository')
        print 'Pulling changes from remote repository'
        # need to handle file change notification while applying remote
        # changes caused by the pull: either conservative (ignore all
        # file notifications while the pull is running) or optimized (replay the
        # file changes that were seen during the pull after it has finished)

        if conservative_pull_lock:
            # conservative strategy: ignore all events from now on
            self._ignore_events = True
	
        with lock:
            self._exec_cmd(cmd_pull)
	
        if conservative_pull_lock:
            # pull done, now start handling events again
            self._ignore_events = False
            # and handle those local changes that might have happened while the
            # pull ran and we weren't listening by simply doing the startup 
            # sequence again
            self.startup()


def signal_handler(signal, frame):
    print 'You pressed Ctrl+C, exiting gracefully!'
    if notifier:
        notifier.stop()
    if bot:
        bot.stop_serving()
    sys.exit(0)


if __name__ == '__main__':
    config = ConfigParser.RawConfigParser()
    defaultcfgpath = os.path.expanduser('~/.autosync')
    if len(sys.argv) >= 2:
        config.read([sys.argv[1], defaultcfgpath])
    else:
        config.read(defaultcfgpath)

    pathstr = config.get('autosync', 'path')
    path = os.path.normpath(os.path.expanduser(pathstr))
    if os.path.isdir(path):
        print 'Watching path ' + path
    else:
        print 'Error: path ' + path + ' (expanded from ' + pathstr + ') does not exist'
        sys.exit(100)
    
    pidfile = config.get('autosync', 'pidfile')
    ignorepaths = config.get('autosync', 'ignorepath')
    readfrequency = int(config.get('autosync', 'readfrequency'))
    coalesce_seconds = 2
    syncmethod = config.get('autosync', 'syncmethod')
    pulllock = config.get('autosync', 'pulllock')
    if pulllock == 'conservative':
        conservative_pull_lock = True
    elif pulllock == 'optimized':
        conservative_pull_lock = False
        print 'Error: optimized pull strategy not fully implemented yet (event replay queue missing)'
        sys.exit(101)
    else:
        print 'Error: unknown pull lock strategy %s, please use either conservative or optimized' % pulllock
        sys.exit(100)
    
    # Read required DCVS commands
    cmd_status = config.get('dcvs', 'statuscmd')
    cmd_startup = config.get('dcvs', 'startupcmd')
    cmd_commit = config.get('dcvs', 'commitcmd')
    cmd_push = config.get('dcvs', 'pushcmd')
    cmd_pull = config.get('dcvs', 'pullcmd')
    cmd_add = config.get('dcvs', 'addcmd')
    cmd_rm = config.get('dcvs', 'rmcmd')
    cmd_modify = config.get('dcvs', 'modifycmd')
    cmd_move = config.get('dcvs', 'movecmd')
    cmd_remoteurl = config.get('dcvs', 'remoteurlcmd')
    
    # TODO: this is currently git-specific, should be configurable
    ignorefile = os.path.join(path, '.gitignore')
    # load the patterns and match them internally with fnmatch
    if os.path.exists(ignorefile):
        with open(ignorefile, 'r') as f:
            ignorefilepatterns = [pat.strip() for pat in f.readlines()]
    else:
        ignorefilepatterns = []
    # (unfortunately, can't use pyinotify.ExcludeFilter, because this expects regexes (which .gitignore doesn't support))
    print 'Ignoring files matching any of the patterns ' + ' '.join(ignorefilepatterns)

    # but we can use the ignore filter with our own pathname excludes
    # However, need to prepend the watch path name, as the excludes need to be 
    # absolute path names.
    ignoreabsolutepaths = [os.path.normpath(path + os.sep + ignorepath) for ignorepath in ignorepaths.split()]
    print 'Adding list to inotify exclude filter: '
    print ignoreabsolutepaths
    excl = pyinotify.ExcludeFilter(ignoreabsolutepaths)

    signal.signal(signal.SIGINT, signal_handler)

    # try to set up desktop notification, first for KDE4, then for Gnome
    # the signature is not correct, so rely on pynotify only at the moment
    #try:
	#import dbus
	#knotify = dbus.SessionBus().get_object("org.kde.knotify", "/Notify")
	#knotify.event("warning", "autosync application", [],
	    #'KDE4 notification initialized', 'Initialized KDE4 desktop notification via DBUS', 
	    #[], [], 0, dbus_interface='org.kde.KNotify')
	#desktopnotifykde = True
    #except:
	#print 'KDE4 KNotify does not seem to run or dbus is not installed'
    
    try:
        import pynotify
        if pynotify.init('autosync application'):
            print 'pynotify initialized successfully, will use desktop notifications'
            desktopnotifygnome = True
        else:
            print 'there was a problem initializing the pynotify module'
    except:
        print 'pynotify does not seem to be installed'
	
    username = config.get('xmpp', 'username')
    password = config.get('xmpp', 'password')
    try:
        alsonotify = config.get('xmpp', 'alsonotify')
    except:
        alsonotify = None
    res = 'AutosyncJabberBot on %s' % os.uname()[1]
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore",category=DeprecationWarning)
            bot = AutosyncJabberBot(username, password, res=res, debug=False, ignoreownmsg=False)
            bot.start_serving()
        bot.send(username, 'login %s' % res)
        if alsonotify:
            bot.send(alsonotify, 'Autosync logged in with XMPP id %s' % username)
        printmsg('Autosync Jabber login successful', 'Successfully logged into Jabber account ' + username)
    except Exception as inst:
        print type(inst)
        print inst
        printmsg('Autosync Jabber login failed', 'Could not login to Jabber account ' + username + '. Will not announce pushes to other running autosync instances.')	

    wm = pyinotify.WatchManager()
    handler = FileChangeHandler(cwd=path, ignored=ignorefilepatterns)
    # TODO: frequency doesn't work....
    notifier = pyinotify.ThreadedNotifier(wm, handler, read_freq=readfrequency)
    #notifier = pyinotify.ThreadedNotifier(wm, handler)
    # coalescing events needs pyinotify >= 0.9, so make this optional
    try:
        notifier.coalesce_events()
    except AttributeError as inst:
        print 'Can not coalesce events, pyinotify does not seem to support it (maybe too old): %s' % inst
    mask = pyinotify.IN_DELETE | pyinotify.IN_CREATE | pyinotify.IN_CLOSE_WRITE | pyinotify.IN_ATTRIB | pyinotify.IN_MOVED_FROM | pyinotify.IN_MOVED_TO | pyinotify.IN_DONT_FOLLOW | pyinotify.IN_ONLYDIR
    try:
        print 'Adding recursive, auto-adding watch for path %s with event mask %d' % (path, mask)
        wd = wm.add_watch(path, mask, rec=True, auto_add=True, quiet=False, exclude_filter=excl)
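        # add_watch returns a dict mapping each path to its watch descriptor (negative on failure)
        if wd.get(path, -1) < 0:
            print 'Unable to add watch for path %s - this will not work' % path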
    except pyinotify.WatchManagerError as err:
        print err, err.wmd

    printmsg('autosync starting', 'Initialization of local file notifications and Jabber login done, starting main loop')
    
    # this is a central lock for guarding repository operations
    lock = threading.RLock()

    print '==> Start monitoring %s (type c^c to exit)' % path
    # TODO: daemonize
    # notifier.loop(daemonize=True, pid_file=pidfile, force_kill=True)
    notifier.start()
    print '=== Executing startup synchronization'
    handler.protected_pull()
    if not conservative_pull_lock:
        # only need to run the startup command here when not using conservative pull locking - otherwise the protected_pull will already do it
        handler.startup()
    
    print '----------------------------------------------------------------'

    while True:
        time.sleep(10)
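
The startup sequence above distinguishes two pull-lock strategies
(conservative and optimized). As a rough illustration of the difference, here
is a minimal sketch in plain Python 3 (the attached script itself is Python 2).
The class PullGuard and its method names are made up for this example and are
not the ones used in the script; the handle, pull and rescan callables are
assumed stand-ins for the per-file DVCS commands, the pull command and the
startup command.

```python
import threading


class PullGuard(object):
    """Hypothetical helper illustrating both pull-lock strategies."""

    def __init__(self, handle, pull, rescan, conservative=True):
        self.handle = handle          # called for each local file event
        self.pull = pull              # runs the DVCS pull
        self.rescan = rescan          # full re-scan (the startup command)
        self.conservative = conservative
        self.lock = threading.RLock()
        self.pulling = False
        self.queue = []               # events recorded during a pull

    def on_event(self, path):
        with self.lock:
            if self.pulling:
                # conservative: drop the event; optimized: remember it
                if not self.conservative:
                    self.queue.append(path)
                return
        self.handle(path)

    def protected_pull(self):
        with self.lock:
            self.pulling = True
        try:
            self.pull()
        finally:
            with self.lock:
                self.pulling = False
                pending, self.queue = self.queue, []
        if self.conservative:
            self.rescan()             # catches anything missed during the pull
        else:
            for path in pending:      # replay only what changed meanwhile
                self.handle(path)
```

With conservative=True every pull is followed by one full re-scan, no matter
whether anything happened locally; with conservative=False only the paths that
fired events during the pull are looked at again.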

#!/usr/bin/python
# -*- coding: utf-8 -*-

# JabberBot: A simple jabber/xmpp bot framework
# Copyright (c) 2007-2009 Thomas Perl <thpinfo.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#


import sys

try:
    import xmpp
except ImportError:
    print >>sys.stderr, 'You need to install xmpppy from http://xmpppy.sf.net/.'
    sys.exit(-1)
import inspect
import traceback

"""A simple jabber/xmpp bot framework"""

__author__ = 'Thomas Perl <t...@thpinfo.com>'
__version__ = '0.9'
__website__ = 'http://thpinfo.com/2007/python-jabberbot/'
__license__ = 'GPLv3 or later'

def botcmd(*args, **kwargs):
    """Decorator for bot command functions"""

    def decorate(func, hidden=False):
        setattr(func, '_jabberbot_command', True)
        setattr(func, '_jabberbot_hidden', hidden)
        return func

    if len(args):
        return decorate(args[0], **kwargs)
    else:
        return lambda func: decorate(func, **kwargs)


class JabberBot(object):
    AVAILABLE, AWAY, CHAT, DND, XA, OFFLINE = None, 'away', 'chat', 'dnd', 'xa', 'unavailable'

    MSG_AUTHORIZE_ME = 'Hey there. You are not yet on my roster. Authorize my request and I will do the same.'
    MSG_NOT_AUTHORIZED = 'You did not authorize my subscription request. Access denied.'

    def __init__(self, username, password, res=None, debug=False, ignoreownmsg=True):
        """Initializes the jabber bot and sets up commands."""
        self.__debug = debug
        self.__username = username
        self.__password = password
        self.jid = xmpp.JID(self.__username)
        self.res = (res or self.__class__.__name__)
        self.conn = None
        self.ignoreownmsg = ignoreownmsg
        self.__finished = False
        self.__show = None
        self.__status = None
        self.__seen = {}
        self.__threads = {}

        self.commands = {}
        for name, value in inspect.getmembers(self):
            if inspect.ismethod(value) and getattr(value, '_jabberbot_command', False):
                self.debug('Registered command: %s' % name)
                self.commands[name] = value

################################

    def _send_status(self):
        self.conn.send(xmpp.dispatcher.Presence(show=self.__show, status=self.__status))

    def __set_status(self, value):
        if self.__status != value:
            self.__status = value
            self._send_status()

    def __get_status(self):
        return self.__status

    status_message = property(fget=__get_status, fset=__set_status)

    def __set_show(self, value):
        if self.__show != value:
            self.__show = value
            self._send_status()

    def __get_show(self):
        return self.__show

    status_type = property(fget=__get_show, fset=__set_show)

################################

    def debug(self, s):
        if self.__debug: self.log(s)

    def log(self, s):
        """Logging facility, can be overridden in subclasses to log to file, etc."""
        print self.__class__.__name__, ':', s

    def connect(self):
        if not self.conn:
            if self.__debug:
                conn = xmpp.Client(self.jid.getDomain())
            else:
                conn = xmpp.Client(self.jid.getDomain(), debug = [])

            conres = conn.connect()
            if not conres:
                self.log( 'unable to connect to server %s.' % self.jid.getDomain())
                return None
            if conres != 'tls':
                self.log("Warning: unable to establish secure connection - TLS failed!")

            authres = conn.auth(self.jid.getNode(), self.__password, self.res)
            if not authres:
                self.log('unable to authorize with server.')
                return None
            if authres != 'sasl':
                self.log("Warning: unable to perform SASL auth on %s. Old authentication method used!" % self.jid.getDomain())

            conn.RegisterHandler('message', self.callback_message)
            conn.RegisterHandler('presence', self.callback_presence)
            conn.sendInitPresence()
            self.conn = conn
            self.roster = self.conn.Roster.getRoster()
            self.log('*** roster ***')
            for contact in self.roster.getItems():
                self.log('  ' + str(contact))
            self.log('*** roster ***')

        return self.conn

    def join_room(self, room):
        """Join the specified multi-user chat room"""
        my_room_JID = "%s/%s" % (room,self.__username)
        self.connect().send(xmpp.Presence(to=my_room_JID))

    def quit(self):
        """Stop serving messages and exit.

        I find it is handy for development to run the
        jabberbot in a 'while true' loop in the shell, so
        whenever I make a code change to the bot, I send
        the 'reload' command, which I have mapped to call
        self.quit(), and my shell script relaunches the
        new version.
        """
        self.__finished = True

    def send_message(self, mess):
        """Send an XMPP message"""
        self.connect().send(mess)

    def send(self, user, text, in_reply_to=None, message_type='chat'):
        """Sends a simple message to the specified user."""
        mess = xmpp.Message(user, text)

        if in_reply_to:
            mess.setThread(in_reply_to.getThread())
            mess.setType(in_reply_to.getType())
        else:
            mess.setThread(self.__threads.get(user, None))
            mess.setType(message_type)

        self.send_message(mess)

    def send_simple_reply(self, mess, text, private=False):
        """Send a simple response to a message"""
        self.send_message(self.build_reply(mess, text, private))

    def build_reply(self, mess, text=None, private=False):
        """Build a message for responding to another message.  Message is NOT sent"""
        if private: 
            to_user  = mess.getFrom()
            type = "chat"
        else:
            to_user  = mess.getFrom().getStripped()
            type = mess.getType()
        response = xmpp.Message(to_user, text)
        response.setThread(mess.getThread())
        response.setType(type)
        return response

    def get_sender_username(self, mess):
        """Extract the sender's user name from a message""" 
        type = mess.getType()
        jid  = mess.getFrom()
        if type == "groupchat":
            username = jid.getResource()
        elif type == "chat":
            username  = jid.getNode()
        else:
            username = ""
        return username

    def status_type_changed(self, jid, new_status_type):
        """Callback for tracking status types (available, away, offline, ...)"""
        self.debug('user %s changed status to %s' % (jid, new_status_type))

    def status_message_changed(self, jid, new_status_message):
        """Callback for tracking status messages (the free-form status text)"""
        self.debug('user %s updated text to %s' % (jid, new_status_message))

    def broadcast(self, message, only_available=False):
        """Broadcast a message to all users 'seen' by this bot.

        If the parameter 'only_available' is True, the broadcast
        will not go to users whose status is not 'Available'."""
        for jid, (show, status) in self.__seen.items():
            if not only_available or show is self.AVAILABLE:
                self.send(jid, message)

    def callback_presence(self, conn, presence):
        jid, type_, show, status = presence.getFrom(), \
                presence.getType(), presence.getShow(), \
                presence.getStatus()

        if self.jid.bareMatch(jid) and self.ignoreownmsg:
            # Ignore our own presence messages
            return

        if type_ is None:
            # Keep track of status message and type changes
            old_show, old_status = self.__seen.get(jid, (self.OFFLINE, None))
            if old_show != show:
                self.status_type_changed(jid, show)

            if old_status != status:
                self.status_message_changed(jid, status)
                
            self.__seen[jid] = (show, status)
        elif type_ == self.OFFLINE and jid in self.__seen:
            # Notify of user offline status change
            del self.__seen[jid]
            self.status_type_changed(jid, self.OFFLINE)

        try:
            subscription = self.roster.getSubscription(str(jid))
        except KeyError:
            # User not on our roster
            subscription = None

        if type_ == 'error':
            self.log(presence.getError())

        self.debug('Got presence: %s (type: %s, show: %s, status: %s, subscription: %s)' % (jid, type_, show, status, subscription))

        if type_ == 'subscribe':
            # Incoming presence subscription request
            if subscription in ('to', 'both', 'from'):
                self.roster.Authorize(jid)
                self._send_status()

            if subscription not in ('to', 'both'):
                self.roster.Subscribe(jid)

            if subscription in (None, 'none'):
                self.send(jid, self.MSG_AUTHORIZE_ME)
        elif type_ == 'subscribed':
            # Authorize any pending requests for that JID
            self.roster.Authorize(jid)
        elif type_ == 'unsubscribed':
            # Authorization was not granted
            self.send(jid, self.MSG_NOT_AUTHORIZED)
            self.roster.Unauthorize(jid)

    def callback_message(self, conn, mess):
        """Messages sent to the bot will arrive here. Command handling + routing is done in this function."""

        # Prepare to handle either private chats or group chats
        type     = mess.getType()
        jid      = mess.getFrom()
        props    = mess.getProperties()
        text     = mess.getBody()
        username = self.get_sender_username(mess)

        if type not in ("groupchat", "chat"):
            self.debug("unhandled message type: %s" % type)
            return

        self.debug("*** props = %s" % props)
        self.debug("*** jid = %s" % jid)
        self.debug("*** username = %s" % username)
        self.debug("*** type = %s" % type)
        self.debug("*** text = %s" % text)

        # Ignore messages from before we joined
        if xmpp.NS_DELAY in props: return

        # Ignore messages from myself
        if username == self.__username: return

        # If a message format is not supported (e.g. encrypted), text will be None
        if not text: return

        # Ignore messages from users not seen by this bot
        if jid not in self.__seen:
            self.log('Ignoring message from unseen guest: %s' % jid)
            self.debug("I've seen: %s" % ["%s" % x for x in self.__seen.keys()])
            return

        # Remember the last-talked-in thread for replies
        self.__threads[jid] = mess.getThread()

        if ' ' in text:
            command, args = text.split(' ', 1)
        else:
            command, args = text, ''
        cmd = command.lower()
        self.debug("*** cmd = %s" % cmd)

        if cmd in self.commands:
            try:
                reply = self.commands[cmd](mess, args)
            except Exception:
                # format_exc() takes no exception argument, it formats the
                # exception that is currently being handled
                reply = traceback.format_exc()
                self.log('An error happened while processing a message ("%s") from %s: %s' % (text, jid, reply))
                print reply
        else:
            # In private chat, it's okay for the bot to always respond.
            # In group chat, the bot should silently ignore commands that it
            # doesn't understand and that aren't handled by unknown_command().
            default_reply = 'Unknown command: "%s". Type "help" for available commands.' % cmd
            if type == "groupchat": default_reply = None
            reply = self.unknown_command(mess, cmd, args) or default_reply
        if reply:
            self.send_simple_reply(mess,reply)

    def unknown_command(self, mess, cmd, args):
        """Default handler for unknown commands

        Override this method in derived class if you
        want to trap some unrecognized commands.  If
        'cmd' is handled, you must return some non-false
        value, else some helpful text will be sent back
        to the sender.
        """
        return None

    def top_of_help_message(self):
        """Returns a string that forms the top of the help message

        Override this method in derived class if you
        want to add additional help text at the
        beginning of the help message.
        """
        return ""

    def bottom_of_help_message(self):
        """Returns a string that forms the bottom of the help message

        Override this method in derived class if you
        want to add additional help text at the end
        of the help message.
        """
        return ""

    @botcmd
    def help(self, mess, args):
        """Returns a help string listing available options.

        Automatically assigned to the "help" command."""
        if not args:
            if self.__doc__:
                description = self.__doc__.strip()
            else:
                description = 'Available commands:'

            usage = '\n'.join(sorted(['%s: %s' % (name, (command.__doc__ or '(undocumented)').split('\n', 1)[0]) for (name, command) in self.commands.items() if name != 'help' and not command._jabberbot_hidden]))
            usage = usage + '\n\nType help <command name> to get more info about that specific command.'
        else:
            description = ''
            if args in self.commands:
                usage = self.commands[args].__doc__ or 'undocumented'
            else:
                usage = 'That command is not defined.'

        top    = self.top_of_help_message()
        bottom = self.bottom_of_help_message()
        if top   : top    = "%s\n\n" % top
        if bottom: bottom = "\n\n%s" % bottom

        return '%s%s\n\n%s%s' % ( top, description, usage, bottom )

    def idle_proc( self):
        """This function will be called in the main loop."""
        pass

    def shutdown(self):
        """This function will be called when we're done serving

        Override this method in derived class if you
        want to do anything special at shutdown.
        """
        pass

    def serve_forever( self, connect_callback = None, disconnect_callback = None):
        """Connects to the server and handles messages."""
        conn = self.connect()
        if conn:
            self.log('bot connected. serving forever.')
        else:
            self.log('could not connect to server - aborting.')
            return

        if connect_callback:
            connect_callback()

        while not self.__finished:
            try:
                conn.Process(1)
                self.idle_proc()
            except KeyboardInterrupt:
                self.log('bot stopped by user request. shutting down.')
                break

        self.shutdown()

        if disconnect_callback:
            disconnect_callback()
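
The command registration and dispatch done by JabberBot above can be tried
out without any XMPP connection. The following is only a sketch in Python 3:
MiniBot, its dispatch method and the two example commands are hypothetical,
and the command handlers here take just the argument string, whereas the real
JabberBot handlers receive (mess, args).

```python
import inspect


def botcmd(*args, **kwargs):
    """Same idea as the decorator in jabberbot.py: mark a method as command."""
    def decorate(func, hidden=False):
        func._jabberbot_command = True
        func._jabberbot_hidden = hidden
        return func
    if args:
        return decorate(args[0], **kwargs)
    return lambda func: decorate(func, **kwargs)


class MiniBot(object):
    """Hypothetical stand-in for JabberBot without any network code."""

    def __init__(self):
        # collect every method that carries the marker attribute
        self.commands = {}
        for name, value in inspect.getmembers(self):
            if inspect.ismethod(value) and getattr(value, '_jabberbot_command', False):
                self.commands[name] = value

    def dispatch(self, text):
        # first word is the command, the rest is passed on as argument string
        command, _, args = text.partition(' ')
        cmd = command.lower()
        if cmd in self.commands:
            return self.commands[cmd](args)
        return 'Unknown command: "%s"' % cmd

    @botcmd
    def ping(self, args):
        """Reply with pong"""
        return 'pong ' + args

    @botcmd(hidden=True)
    def secret(self, args):
        """Not listed in help"""
        return 'ok'
```

Both forms of the decorator (bare @botcmd and @botcmd(hidden=True)) end up in
the commands dict; the hidden flag only matters when building the help text.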

[autosync]
path = ~/amw
pidfile = ~/.autosync.pid
syncmethod = xmpp
#syncmethod = autosync-server

# There are currently two options for handling file notifications, as neither 
# one is perfect. You can choose between the 'conservative' option, which is
# slower but should work in every corner case, and the 'optimized' option, 
# which will consume less CPU and I/O resources on a remotely-triggered pull,
# but may miss local changes until the next time autosync is restarted or a
# manual commit is done on the repository.
#
# The problem is that during a pull from the remote repository, changes are
# applied to the local file system and consequently generate file-changed
# events. These events would in turn be translated to add/remove/move commands
# for the DVCS, which would duplicate the remote changes in the local history
# and obviously does not work for e.g. file removes. Therefore, the file/dir
# changes caused by a remote pull must not be translated to local DVCS changes.
#
# The conservative strategy solves this problem by completely suspending event
# handling while the pull is active. Because _real_ local changes may occur
# concurrently with the pull, the startup command is run after the pull has
# finished and event processing has resumed. This is a safe option, as all
# local changes that occurred before or during the pull will be picked up by
# the DVCS client. However, when no local changes occurred (which is the more
# probable case), this strategy causes unnecessary I/O overhead.
#
# The optimized strategy also suspends the execution of local DVCS actions
# triggered by file/directory events during the pull, but does not discard
# them. Instead, all events that occurred during the pull are recorded in an
# event queue, which is replayed after the pull has finished. The advantage
# is that a complete re-scan of the local repository is avoided and only
# those files/directories that saw some modification are re-checked for
# local changes. The disadvantage is that this depends more strongly on the
# change detection capabilities (trivial ones done by autosync-dcvs and more
# complex ones done by the respective DVCS client), and it is therefore not
# guaranteed that all local, concurrent changes are detected. This option is
# still being evaluated for corner cases where it doesn't work, and is
# therefore not yet the default strategy.
pulllock = conservative
#pulllock = optimized

# The number of seconds to wait for additional events before acting. Setting 
# this lower will increase the synchronization speed at the cost of CPU and
# transfer resources.
readfrequency = 5
ignorepath = .git .svn .hg src/packages src/java/openuat 
    src/csharp/sparkleshare src/cpp/cross/keepassx src/android/ipv6config 

# Note: addcmd, rmcmd, and modifycmd take one argument, movecmd takes two
# (first the source, then the destination).
# Note: statuscmd should return with code 0 when nothing has changed in the 
# local checked-out tree that needs to be committed and non-zero when a commit
# is required.
[dcvs]
# for git
statuscmd = git status | grep -iq "nothing to commit"
addcmd = git add %s
rmcmd = git rm %s
modifycmd = git add %s
# doesn't work when the source file no longer exists, git expects to move it
# itself
#movecmd = git mv %s %s
# use this instead, git will figure out that it was a move because the file is
# similar
movecmd = git rm %s 
    git add %s
startupcmd = git add -A
commitcmd = git commit -m "Autocommit"
pushcmd = git push
pullcmd = git pull
remoteurlcmd = git config --get remote.origin.url

# for mercurial
#statuscmd = hg status
#addcmd = hg add
#rmcmd = hg remove
#modifycmd = 
#movecmd = hg mv %s %s
#startupcmd = hg addremove
#commitcmd = hg commit -m "Autocommit"
#pushcmd = hg push
#pullcmd = hg pull -u

[xmpp]
username = your XMPP id here
password = your XMPP password here
alsonotify = if set, another XMPP id that will get notified when something
    happens

[autosync-server]
server = http://whatever.sync.server
username = your-username
password = your-password
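
A note on reading such a file: the sketch below shows one way to parse it
with the configparser module of Python 3. This is an assumption for
illustration and not necessarily how the script itself reads its config. The
two points worth noting are that interpolation has to be switched off because
of the literal %s placeholders, and that indented continuation lines (as used
for ignorepath and movecmd) become part of the value. The SAMPLE text is a
shortened copy of the file above and the file names are made up.

```python
import configparser
import os

SAMPLE = """\
[autosync]
path = ~/amw
pulllock = conservative
readfrequency = 5
ignorepath = .git .svn .hg src/packages
    src/java/openuat

[dcvs]
addcmd = git add %s
movecmd = git rm %s
    git add %s
"""

# interpolation=None keeps the literal %s placeholders intact
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)

path = os.path.expanduser(config.get('autosync', 'path'))
readfrequency = config.getint('autosync', 'readfrequency')
conservative = config.get('autosync', 'pulllock') == 'conservative'

# indented continuation lines are joined to the value with newlines
ignorepaths = config.get('autosync', 'ignorepath').split()

# movecmd takes two arguments: first the source, then the destination
src, dst = 'old.txt', 'new.txt'
commands = (config.get('dcvs', 'movecmd') % (src, dst)).splitlines()
print(commands)
```

Note that this sketch does no shell quoting of file names; paths containing
spaces or quotes would need extra care before being handed to a shell.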

