I've pretty much finished up my small contract with Bernie's geokem.com, which involved two scorpions in a jar: a Python scorpion versus a Java scorpion (I was the Python scorpion). Bernie had us fight it out, in a kind of double-blind experiment, with himself as referee (only he could see who was winning).
I dwelt on the dictionary (as a built-in data structure) and the csv reader (from the Standard Library csv module) as especially relevant to his work. Numeric indexing of large tables introduces extraneous X,Y coordinates where the original spreadsheet had more mnemonically meaningful axes: sample IDs (rows) vs chemical names (columns). A dictionary of dictionaries will take you to any cell on the spreadsheet -- e.g. samples['HAWAII0626']['FeO2'] -- and this is *especially* useful when the community has no agreed-upon order for the columns (the rows are by definition unsorted as well).

If you're going to analyze something as quicksandy as a csv file with ever-shifting column headers, better to use names, not positions, to grab values. A numeric index approach is too risky -- you might actually get a working GIGO program, and not know the chemicals you wanted were now ordered differently. My solution guards against that sorry outcome. I wrote it all up in a 10-page PDF, plus provided working source code, a lot of it built around Python geokem had already internalized (Bernie used to write everything in Pascal).

What I've found interesting about teaching Python to technology professionals is that a lot depends on imparting our somewhat unfamiliar jargon. For example, the first row of the csv file is different from all the others, in that it contains headers (chemical names). Then these particular files have footers as well, separated from the data block by blank lines. So rather than using a for loop, I used a while True, with a .next() method.

It's that .next() that's confusing. What does it mean? Well, csv.reader returns an iterator. So I use .next() the first time to parse headers, then call it repeatedly inside the while loop until a blank line is encountered, building a dictionary as I go. Even regular open file objects have a next method (not to be used in combination with readline).
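Here is a minimal sketch of the approach described above, not the code delivered to geokem. It is written in Python 3 syntax, so the call is spelled next(reader) rather than reader.next(). The sample IDs, column names, values, footer text, and the read_samples helper are all made up for illustration.

```python
import csv
import io

# Hypothetical file contents: header row, data rows, blank line, footer.
RAW = """\
Sample,SiO2,FeO2,MgO
HAWAII0626,49.2,11.3,7.1
ICELAND0012,48.5,12.0,8.4

Footer text that is not part of the data block
"""

# Same data for one sample, but with the chemical columns in another order.
SHUFFLED = """\
Sample,MgO,FeO2,SiO2
HAWAII0626,7.1,11.3,49.2
"""


def read_samples(fileobj):
    """Return {sample_id: {chemical_name: value}} for the data block."""
    reader = csv.reader(fileobj)
    headers = next(reader)            # first row holds the column names
    samples = {}
    while True:
        try:
            row = next(reader)        # pull one row at a time
        except StopIteration:         # file ended without a footer
            break
        if not any(cell.strip() for cell in row):
            break                     # blank line: data block is over
        # Key each value by its column header instead of its position.
        samples[row[0]] = dict(zip(headers[1:], row[1:]))
    return samples


samples = read_samples(io.StringIO(RAW))
shuffled = read_samples(io.StringIO(SHUFFLED))

print(samples['HAWAII0626']['FeO2'])    # prints 11.3
print(shuffled['HAWAII0626']['FeO2'])   # prints 11.3 despite the new column order
```

Values come back as strings here; converting them to float is left out to keep the sketch short. The point is that the lookup by name gives the same answer whichever way the columns happen to be ordered.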
I ended up explaining this by means of StringIO (which simulates file objects using strings).

As to whether Python or Java won this particular bout, I think New Zealand is a little behind the times (still teaching C++ as a first language in CS). However, my hash table approach, getting away from integer indexing, may at least inform the Java-based solution, as it'd be no trouble to use the same approach in that language.

Anyway, I think Bernie is sold on the value of Python. It's mostly just rumors (about Python being "undocumented", for example) that slow its acceptance in a knowledge domain (geochemistry) that could really use some computing savvy. Per Bernie, most geochemists use their PCs for email and word processing and that's about it. The ability to program is a lost art across the board, in many sciences as well as the humanities.

Kirby

_______________________________________________
Edu-sig mailing list
http://mail.python.org/mailman/listinfo/edu-sig
