I thought that I would share this with the GNHLUG.
When I was subscribed to the Python tutor list a little while back, this 
simple gem came along and I saved it as inspiration to use Python for 
tasks where I might otherwise reach for Perl or bash first.  Someone 
asked earlier this morning about doing system administration tasks in 
Python -- yes, you can do it; you simply take a different approach.

Three things are worth noting about this script:
(1) It doesn't use regexes.  While I LOVE regexes and use them often, 
there is a performance penalty involved (depending on what you're 
doing), and it's not a good idea to rely on them exclusively (in Perl, 
for example, I'd use HTML::TokeParser instead of regexes for screen 
scraping).
(2) It defines a class ("Package") with no attributes or methods, then 
populates each instance of it dynamically as the script runs.  This is 
pretty unusual; in fact, I've never seen it done anywhere else.  It's 
also a little unreadable.  I'm not sure I would have done it this way 
myself, but I'm a clean-code lover.
(3) It ties this thread back into Linux, since this is a simplification 
of a script that Sean 'Shaleh' Perry wrote to help manage the Debian 
"available" file (he is a Debian contributor).

For those who were curious about Python, spend about ten to fifteen 
minutes figuring out how this script works (it's pretty simple; it just 
parses lines and adds key-value pairs to dictionaries [Python's 
associatively-indexed collection class, like a hash]) and you will have 
a better understanding of how Python is used in real life.
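
To make point (2) concrete, here is a minimal sketch of filling an empty 
class instance with setattr().  It assumes Python 3 syntax (the forwarded 
script is Python 2), and the three sample lines are made up for 
illustration rather than read from a real available file.

```python
class Package:
    """Deliberately empty: attributes are attached while parsing."""
    pass


# Hypothetical sample lines in the "Tag: value" style of the available file.
lines = ["Package: telnet", "Priority: standard", "Section: net"]

pkg = Package()
for line in lines:
    tag, value = line.split(":", 1)           # split on the first colon only
    setattr(pkg, tag.lower(), value.strip())  # attribute name comes from the data

print(pkg.package, pkg.priority, pkg.section)
print(hasattr(pkg, "source"))   # no "Source:" line was seen for this package
print(vars(pkg))                # the attributes live in an ordinary dictionary
```

Because the attribute names come from the data, hasattr() is how the 
script checks whether an optional tag was present.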


-- Erik



Begin forwarded message:

From: "Sean 'Shaleh' Perry" <[EMAIL PROTECTED]>
Date: Fri Jul 19, 2002  02:49:55  PM US/Eastern
To: [EMAIL PROTECTED]
Subject: [Tutor] little something in the way of file parsing

So in Debian we have a file called the 'available' file.  It lists the
packages that are in a particular Debian release and looks like this:

<snip>
Package: telnet
Priority: standard
Section: net
Installed-Size: 208
Maintainer: Herbert Xu <[EMAIL PROTECTED]>
Architecture: i386
Source: netkit-telnet
Version: 0.17-18
Replaces: netstd
Provides: telnet-client
Depends: libc6 (>= 2.2.4-4), libncurses5 (>= 5.2.20020112a-1)
Filename: pool/main/n/netkit-telnet/telnet_0.17-18_i386.deb
Size: 70736
MD5sum: 7eb82b4facdabe95a8235993abe210f6
Description: The telnet client.
  The telnet command is used for interactive communication with another host
  using the TELNET protocol.
Task: unix-server

Package: gnushogi
Priority: optional
Section: games
Installed-Size: 402
Maintainer: Brian Mays <[EMAIL PROTECTED]>
Architecture: i386
Version: 1.3-3
Depends: libc6 (>= 2.2.4-4), libncurses5 (>= 5.2.20020112a-1)
Suggests: xshogi
Filename: pool/main/g/gnushogi/gnushogi_1.3-3_i386.deb
Size: 228332
MD5sum: 4a7bf0a6cce8436c6d74438a2d613152
Description: A program to play shogi, the Japanese version of chess.
  Gnushogi plays a game of Japanese chess (shogi) against the user or it
  plays against itself.  Gnushogi is an modified version of the gnuchess
  program.  It has a simple alpha-numeric board display, or it can use
  the xshogi program under the X Window System.
</snip>

One stanza after another.  As preparation for a tool that would allow me
to do better things than grep on it, I wrote the following bit of Python.

I am posting this because it shows some of the powers Python has for
rapid coding and parsing.  Thought some of the lurkers might enjoy it.
Hope someone learns something from it.

<code>

#!/usr/bin/python

import string

class Package:
     pass

availfile = '/var/lib/dpkg/available'

fd = open(availfile)

package_list = []
package = ''

# note I use the readline() idiom because there are currently more than
# 10 thousand entries in the file, which equates to some 90,000 lines.

while 1:
     line = fd.readline()
     if not line: break

     line = string.rstrip(line)
     if not line:
         package_list.append(package)
         continue                     # end of package stanza

     if line[0] == ' ':
         if not hasattr(package, 'description'):
             setattr(package, 'description', '')
         package.description += line[1:]
         continue

     # the depends line occasionally has a line like
     # Depends: zlib1g (>= 1:1.1.3) which would break the split(), so I
     # use the optional maxsplit argument to split on only the first colon
     tag, value = string.split(line, ':', 1)
     value = value[1:]

     tag = string.lower(tag)
     if tag == 'package':             # start a new package
         package = Package()

     # the Description format: the first line, the one with Description:,
     # is the short 'synopsis'; the following lines are read as paragraphs
     # of a longer description.  paragraphs are separated with '.' to make
     # parsing easier
     if tag == 'description':
         # rename the tag so that 'description' can hold the long text
         tag = 'short'

     setattr(package, tag, value)

priorities = {}
sections = {}
maintainers = {}
sources = {}
tasks = {}

for package in package_list:
     priorities.setdefault(package.priority, []).append(package)
     sections.setdefault(package.section, []).append(package)
     maintainers.setdefault(package.maintainer, []).append(package)
     if hasattr(package, 'source'):
         sources.setdefault(package.source, []).append(package)
     if hasattr(package, 'task'):
         tasks.setdefault(package.task, []).append(package)

print 'Summary:'
print '%d packages' % len(package_list)
print '%d sources' % len(sources)
print '%d priorities' % len(priorities)
print '%d sections' % len(sections)
print '%d maintainers' % len(maintainers)
print '%d tasks' % len(tasks)
</code>
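
For readers on a current interpreter, the following is a rough Python 3 
sketch of the same parsing loop, not the author's code.  The function 
name parse_available() and the sample text are hypothetical.  The sketch 
also makes three changes the original does not: it appends a final 
stanza that has no trailing blank line, it ignores repeated blank lines, 
and it keeps a newline between long-description lines.

```python
class Package:
    """Empty on purpose; attributes are attached as tags are parsed."""


def parse_available(lines):
    """Turn 'Tag: value' stanzas separated by blank lines into Package objects."""
    packages = []
    package = None
    for raw in lines:
        line = raw.rstrip()
        if not line:                         # blank line ends the stanza
            if package is not None:
                packages.append(package)
                package = None
            continue
        if line[0] == ' ':                   # continuation of the long description
            if package is not None:
                so_far = getattr(package, 'description', '')
                package.description = so_far + line[1:] + '\n'
            continue
        tag, value = line.split(':', 1)      # maxsplit=1 keeps later colons intact
        tag = tag.lower()
        if tag == 'package':                 # start a new package
            package = Package()
        if package is None:                  # tag seen before any Package: line
            continue
        if tag == 'description':
            tag = 'short'                    # keep 'description' for the long text
        setattr(package, tag, value.strip())
    if package is not None:                  # input did not end with a blank line
        packages.append(package)
    return packages


# Made-up sample text in the same format, for illustration only.
sample = """\
Package: telnet
Priority: standard
Depends: zlib1g (>= 1:1.1.3)
Description: The telnet client.
 The telnet command is used for interactive communication.
 It uses the TELNET protocol.

Package: gnushogi
Priority: optional
Description: A program to play shogi.
"""

pkgs = parse_available(sample.splitlines())
print(len(pkgs))
print(pkgs[0].depends)
print(pkgs[0].short)
print(pkgs[0].description)
```

To run it against the real file, pass an open file object instead, for 
example parse_available(open('/var/lib/dpkg/available')).  As in the 
original, a non-blank line without a colon raises ValueError.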

At this point I have a list of Package instances and several dictionaries
holding lists of these packages.  There is only one instance of each
actual package in memory though; the rest are references managed by
Python.  Most handy.
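
The sharing described here can be checked with the `is` operator.  This 
is a small sketch in Python 3 syntax, using one hand-built Package with 
made-up values rather than the parsed file.

```python
class Package:
    pass


pkg = Package()
pkg.package = 'telnet'
pkg.section = 'net'
pkg.priority = 'standard'

package_list = [pkg]
sections = {}
priorities = {}
sections.setdefault(pkg.section, []).append(pkg)
priorities.setdefault(pkg.priority, []).append(pkg)

# All three containers refer to the same object, not to copies of it.
print(sections['net'][0] is package_list[0])
print(priorities['standard'][0] is package_list[0])

# A change made through one reference is visible through the others.
package_list[0].maintainer = 'Example Maintainer'
print(sections['net'][0].maintainer)
```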

I could now add a GUI to this which would show a tree of maintainers,
sections, tasks, whatever.  Or I could simply walk the package list and
display the synopsis.

Or fun things like finding who maintains the most packages, which
section has the fewest packages, or which maintainer(s) is most
important to Debian (the one with the most packages in the most
critical section).
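
Questions like these reduce to comparing list lengths in the 
dictionaries.  A minimal sketch in Python 3 syntax, assuming 
dictionaries shaped like the script's `maintainers` and `sections`; the 
names and package strings are invented stand-ins for the real Package 
objects.

```python
# Hypothetical stand-ins for the dictionaries built by the script:
# each key maps to the list of packages filed under it.
maintainers = {
    'Alice Example': ['pkg-a', 'pkg-b', 'pkg-c', 'pkg-d'],
    'Bob Example': ['pkg-e', 'pkg-f'],
}
sections = {
    'net': ['pkg-a', 'pkg-b', 'pkg-c'],
    'games': ['pkg-d'],
    'admin': ['pkg-e', 'pkg-f'],
}

# Who maintains the most packages?
busiest = max(maintainers, key=lambda name: len(maintainers[name]))

# Which section has the fewest packages?
smallest = min(sections, key=lambda name: len(sections[name]))

# Full ranking of maintainers, largest package count first.
ranking = sorted(maintainers.items(), key=lambda item: len(item[1]), reverse=True)

print(busiest)
print(smallest)
for name, packages in ranking:
    print('%d %s' % (len(packages), name))
```

When two keys tie, max() and min() return whichever one they encounter 
first, so a real report might want to list all of the tied names.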

What I like about this solution is the empty Package class which gets
filled as we parse.  This makes it easy for the program to grow and
change as the file format changes (if that is needed).

All told this is about 30 minutes of work.


_______________________________________________
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor







--
Erik Price

email: [EMAIL PROTECTED]
jabber: [EMAIL PROTECTED]

_______________________________________________
gnhlug-discuss mailing list
[EMAIL PROTECTED]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
