I thought I would share this with the GNHLUG. When I was subscribed to the Python tutor list a little while back, this simple gem came along, and I saved it as a reminder to rely on Python for tasks where I might initially jump to Perl or bash. Someone asked earlier this morning about doing system administration tasks in Python: yes, you can do it; you simply take a different approach.
There are three things about this script:

(1) It doesn't use regexes. While I LOVE regexes and use them often, they carry a performance penalty (depending on what you're doing), and it's not a good idea to rely on them exclusively (compare using HTML::TokeParser instead of regexes for screen scraping in Perl).

(2) It defines a class (Package) with no attributes or methods, then fills in each instance's attributes dynamically as the file is parsed. This is pretty unusual; in fact I've never seen it done anywhere else. It's also a little unreadable. I'm not sure I would have done it this way myself, but I'm a clean-code lover.

(3) It ties this thread back into Linux, since it is a simplification of a script that Sean 'Shaleh' Perry (a Debian contributor) wrote to help manage the Debian "available" file.

For those who were curious about Python, spend ten to fifteen minutes figuring out how this script works (it's pretty simple: it just parses lines and adds key-value pairs to dictionaries [Python's associatively-indexed collection type, like a Perl hash]) and you will have a better understanding of how Python is used in real life.

-- Erik

Begin forwarded message:

From: "Sean 'Shaleh' Perry" <[EMAIL PROTECTED]>
Date: Fri Jul 19, 2002 02:49:55 PM US/Eastern
To: [EMAIL PROTECTED]
Subject: [Tutor] little something in the way of file parsing

So in Debian we have a file called the 'available' file. It lists the packages that are in a particular Debian release and looks like this:

<snip>
Package: telnet
Priority: standard
Section: net
Installed-Size: 208
Maintainer: Herbert Xu <[EMAIL PROTECTED]>
Architecture: i386
Source: netkit-telnet
Version: 0.17-18
Replaces: netstd
Provides: telnet-client
Depends: libc6 (>= 2.2.4-4), libncurses5 (>= 5.2.20020112a-1)
Filename: pool/main/n/netkit-telnet/telnet_0.17-18_i386.deb
Size: 70736
MD5sum: 7eb82b4facdabe95a8235993abe210f6
Description: The telnet client.
 The telnet command is used for interactive communication with another host using the TELNET protocol.
Task: unix-server

Package: gnushogi
Priority: optional
Section: games
Installed-Size: 402
Maintainer: Brian Mays <[EMAIL PROTECTED]>
Architecture: i386
Version: 1.3-3
Depends: libc6 (>= 2.2.4-4), libncurses5 (>= 5.2.20020112a-1)
Suggests: xshogi
Filename: pool/main/g/gnushogi/gnushogi_1.3-3_i386.deb
Size: 228332
MD5sum: 4a7bf0a6cce8436c6d74438a2d613152
Description: A program to play shogi, the Japanese version of chess.
 Gnushogi plays a game of Japanese chess (shogi) against the user or it plays against itself. Gnushogi is an modified version of the gnuchess program. It has a simple alpha-numeric board display, or it can use the xshogi program under the X Window System.
</snip>

One stanza after another. As preparation for a tool that would let me do better things with it than grep, I wrote the following bit of Python. I am posting this because it shows some of the power Python has for rapid coding and parsing. Thought some of the lurkers might enjoy it. Hope someone learns something from it.

<code>
#!/usr/bin/env python3

class Package:
    pass

availfile = '/var/lib/dpkg/available'
package_list = []
package = None   # the Package currently being filled in

# note I read the file one line at a time instead of calling readlines()
# because there are currently 10 thousand plus entries in the file, which
# equates to some 90,000 lines.
with open(availfile) as fd:
    for line in fd:
        line = line.rstrip()
        if not line:
            # end of package stanza (repeated blank lines are ignored)
            if package is not None:
                package_list.append(package)
                package = None
            continue
        if line[0] in ' \t':
            # continuation line: part of the long description
            if not hasattr(package, 'description'):
                package.description = ''
            package.description += line[1:] + '\n'
            continue
        # the depends line occasionally has a line like
        # Depends: zlib1g (>= 1:1.1.3) which would break the split() so I use
        # the optional maxsplit argument to split on the first colon only
        tag, value = line.split(':', 1)
        value = value.strip()
        tag = tag.lower()
        if tag == 'package':  # start a new package
            package = Package()
        # in the Description field, the first line (after 'Description:') is
        # the short 'synopsis'; the following indented lines are the
        # paragraphs of a longer description. Paragraphs are separated by a
        # ' .' line to make parsing easier.
        if tag == 'description':
            tag = 'short'  # rename tag to allow description as long
        setattr(package, tag, value)

# the last stanza may not be followed by a blank line
if package is not None:
    package_list.append(package)

priorities = {}
sections = {}
maintainers = {}
sources = {}
tasks = {}
for package in package_list:
    priorities.setdefault(package.priority, []).append(package)
    sections.setdefault(package.section, []).append(package)
    maintainers.setdefault(package.maintainer, []).append(package)
    if hasattr(package, 'source'):
        sources.setdefault(package.source, []).append(package)
    if hasattr(package, 'task'):
        tasks.setdefault(package.task, []).append(package)

print('Summary:')
print('%d packages' % len(package_list))
print('%d sources' % len(sources))
print('%d priorities' % len(priorities))
print('%d sections' % len(sections))
print('%d maintainers' % len(maintainers))
print('%d tasks' % len(tasks))
</code>

At this point I have a list of Package objects and several dictionaries holding lists of these packages. There is only one instance of each package in memory, though; the dictionaries just hold references to it, and Python's garbage collector takes care of them. Most handy. I could now add a GUI to this which would show a tree of maintainers, sections, tasks, whatever.
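The point about there being only one copy of each package can be checked directly with Python's is operator. The following is a minimal, self-contained sketch, not Sean's code: it assumes a tiny hand-written stanza string (SAMPLE) in place of /var/lib/dpkg/available, and parse_stanzas is a hypothetical, simplified version of the parsing loop that ignores description continuation lines.

```python
class Package:
    """Empty class; attributes are attached while parsing."""
    pass


# Hypothetical sample input standing in for /var/lib/dpkg/available.
SAMPLE = """\
Package: telnet
Priority: standard
Section: net

Package: gnushogi
Priority: optional
Section: games
"""


def parse_stanzas(text):
    """Hypothetical helper: turn blank-line-separated stanzas into Packages."""
    packages = []
    package = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            # a blank line closes the current stanza
            if package is not None:
                packages.append(package)
                package = None
            continue
        tag, value = line.split(':', 1)
        if package is None:
            package = Package()
        setattr(package, tag.lower(), value.strip())
    if package is not None:
        packages.append(package)
    return packages


package_list = parse_stanzas(SAMPLE)
sections = {}
priorities = {}
for package in package_list:
    sections.setdefault(package.section, []).append(package)
    priorities.setdefault(package.priority, []).append(package)

# Both dictionaries hold the very same object, not a copy.
telnet = package_list[0]
print(sections['net'][0] is telnet)          # True
print(priorities['standard'][0] is telnet)   # True

# A change made through one dictionary is visible through the other.
sections['net'][0].short = 'The telnet client.'
print(priorities['standard'][0].short)       # The telnet client.
```

Because each dictionary value is a list of references, grouping the same packages several ways costs only the lists themselves; the Package objects are never duplicated.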
Or I could simply walk the package list and display the synopsis. Or do fun things like finding who maintains the most packages, which section has the fewest packages, or which maintainer(s) are most important to Debian (the one with the most packages in the most critical section).

What I like about this solution is the empty Package class, which gets filled in as we parse. This makes it easy for the program to grow and change as the file format changes (if that is needed).

All told, this is about 30 minutes of work.

_______________________________________________
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor

--
Erik Price

email: [EMAIL PROTECTED]
jabber: [EMAIL PROTECTED]

_______________________________________________
gnhlug-discuss mailing list
[EMAIL PROTECTED]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
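The queries Sean lists (who maintains the most packages, which section has the fewest packages, which maintainer matters most) fall straight out of the grouping dictionaries. Here is a minimal sketch with made-up package and maintainer names. The make_package and group_by helpers are hypothetical, and reading "most critical section" as the highest Debian priority present (required, important, standard, optional, extra) is an assumption, not something the original post specifies.

```python
from collections import Counter


class Package:
    """Empty class; attributes are attached at run time, as in the script."""
    pass


def make_package(**fields):
    """Hypothetical stand-in for the parser: build one filled-in Package."""
    package = Package()
    for tag, value in fields.items():
        setattr(package, tag, value)
    return package


def group_by(packages, tag):
    """The script's setdefault grouping, generalized to any field name."""
    groups = {}
    for package in packages:
        if hasattr(package, tag):
            groups.setdefault(getattr(package, tag), []).append(package)
    return groups


# Made-up sample data; the real script would use its parsed package_list.
package_list = [
    make_package(package='alpha', priority='required', section='base', maintainer='Alice'),
    make_package(package='beta', priority='required', section='base', maintainer='Alice'),
    make_package(package='gamma', priority='standard', section='net', maintainer='Bob'),
    make_package(package='delta', priority='optional', section='net', maintainer='Bob'),
    make_package(package='epsilon', priority='optional', section='net', maintainer='Bob'),
    make_package(package='zeta', priority='optional', section='games', maintainer='Carol'),
]

maintainers = group_by(package_list, 'maintainer')
sections = group_by(package_list, 'section')
priorities = group_by(package_list, 'priority')

# Who maintains the most packages?
top_maintainer = max(maintainers, key=lambda name: len(maintainers[name]))

# Which section has the fewest packages?
smallest_section = min(sections, key=lambda name: len(sections[name]))

# Assumption: "most critical" means the highest Debian priority present,
# using the order required > important > standard > optional > extra.
PRIORITY_ORDER = ['required', 'important', 'standard', 'optional', 'extra']
most_critical = next(p for p in PRIORITY_ORDER if p in priorities)
counts = Counter(package.maintainer for package in priorities[most_critical])
key_maintainer = counts.most_common(1)[0][0]

print('Most packages:', top_maintainer)                          # Bob
print('Fewest packages:', smallest_section)                      # games
print('Most important:', key_maintainer, '(' + most_critical + ')')  # Alice (required)
```

Because group_by takes the field name as a string, any tag the parser attaches to a Package, including one that appears in a future version of the file format, can be grouped on without touching the Package class, which is the flexibility Sean points to.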