The sixteen lines of data you sent work in a little histogram generator for me, ignoring the masking (as a nearly-newbie, I can say that ignoring the stuff I don't yet care about usually works):

from matplotlib.mlab import csv2rec, csv
import pylab as p
import numpy as np

names = ('date', 'time', 'program', 'level', 'error_id', 'thread',
         'na', 'machine', 'request', 'detail')
r = csv2rec("/Users/clew/Documents/pycode/test.csv", names=names)

# Inspect the shape, one record, and every column.
print r.shape
print r[3]
for name in names:
    print 'Values of ', name, ':'
    print r[name]

# Pick out the rows belonging to one thread.
for row in r:
    if row['thread'] == 537:
        print row

print type(r['thread'])
n, bins, patches = p.hist(r['thread'])
print n, bins, patches
p.savefig('csvhistogram')
p.show()

Does this work for you? On the whole file?

&C

On Aug 21, 2009, at 9:27 AM, Phil Robare wrote:

> Hi folks,
>
> I have a (newbie) problem using csv2rec. I am a regular python user
> but this is my first time using matplotlib and numpy after being
> inspired by attending a talk by Dr. John Hunter.
>
> I am trying to read a csv file that has >6000 lines that look like
> this:
>
> <code>
> 8/17/2009,4:49:52
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:3893255:311247:166422::,Completed..
> 8/17/2009,4:49:52
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:3888955:311247:166422::,From
> Disk..
> 8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,Exception
> in CVProcess.GetNewfile: The process cannot access the file because it
> is being used by another process..,
> 8/17/2009,4:49:51 PM,CVAgent,Information,2,447,N/A,THP-PR-APVL,SDAY ->
> R:
> 20090210
> :::3893955:311247:166422::20090210:::3893955:388247:166422::50:,.
> 8/17/2009,4:29:55
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090728::7881558:4888461:22088980:964878::,Completed..
> 8/17/2009,4:29:55
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090728::7881558:4888461:22030980:964878::,From
> Disk..
> 8/17/2009,4:29:54 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,JJULIO
> -> R:
> 20090728
> :::4888461:22030980:964878::20090728:::4888461:22030980:964878::50:,.
> 8/17/2009,4:24:02
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090226::7881558:2882501:325032:316888::,Completed..
> 8/17/2009,4:24:02
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090226::7881558:8822501:325882:318816::,From
> Disk..
> 8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,tdietz
> -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.
> 8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,tdietz
> -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.
> 8/17/2009,4:19:44
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:2882613:278887:4020000::,Completed..
> 8/17/2009,4:19:43
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:2882613:278777:4020000::,From
> Disk..
> 8/17/2009,4:19:42 PM,CVAgent,Information,2,793,N/A,THP-PR-APVL,MUTSCH
> -> R:
> 20090210
> :::2882613:278887:4020000::20090210:::2882613:278887:4020000::50:,.
> 8/17/2009,4:11:02
> PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,F:
> 20090817::7881558:1776517:1211:58800::,Completed..
> 8/17/2009,4:49:52
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:3893255:311247:166422::,Completed..
> </code>
>
> I have given the columns names since there is not a header line:
> <code>
> In [150]: print names
> ('date', 'time', 'program', 'level', 'error_id', 'thread', 'na',
> 'machine', 'request', 'detail')
> </code>
>
> and I have provided convert functions to be sure the data is read
> correctly:
> <code>
> In [152]: print converterd
> {'thread': <type 'int'>, 'level': <type 'str'>, 'na': <type 'str'>,
> 'request': <type 'str'>, 'detail': <type 'str'>, 'machine': <type
> 'str'>, 'program': <type 'str'>, 'time': <function str2time at
> 0x03795530>, 'date': <function str2date at
> 0x037950B0>}
> </code>
>
> (I'm not sure if this is needed. IPython seems to recognize csv2rec
> just fine but the sample program does an import like this.)
> <code>
> In [141]: import matplotlib.mlab as mlab
> </code>
>
> So now I call csv2rec on my file. It takes a second or so to gulp it
> all in and then returns without error.
> <code>
> In [142]: r=mlab.csv2rec(filename,converterd=converterd,names=names)
> </code>
>
> So now I look to see what I have. And it's nothing like I thought it
> would be. I expected thousands of records and I have 10. I expected
> times and dates, ints and strings. And all I have are masked values.
> <code>
> In [143]: r
> Out[143]:
> masked_records(
> date : [-- -- -- -- -- -- -- -- -- --]
> time : [-- -- -- -- -- -- -- -- -- --]
> program : [-- -- -- -- -- -- -- -- -- --]
> level : [-- -- -- -- -- -- -- -- -- --]
> error_id : [-- -- -- -- -- -- -- -- -- --]
> thread : [-- -- -- -- -- -- -- -- -- --]
> na : [-- -- -- -- -- -- -- -- -- --]
> machine : [-- -- -- -- -- -- -- -- -- --]
> request : [-- -- -- -- -- -- -- -- -- --]
> detail : [-- -- -- -- -- -- -- -- -- --]
> fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
> )
> </code>
>
> So I look at the mask. I see no clues here.
> <code>
> In [144]: r.mask
> Out[144]:
> array([(True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True)],
> dtype=[('date', '|b1'), ('time', '|b1'), ('program', '|b1'),
> ('level', '|b1'), ('error_id', '|b1'), ('thread', '|b1'), ('na',
> '|b1'), ('machine', '|b1'),
> ('request', '|b1'), ('detail', '|b1')])
> </code>
>
> Well, maybe if I change the mask I can see what is being hidden.
> <code>
> In [145]: r.mask[0]
> Out[145]: (True, True, True, True, True, True, True, True, True, True)
>
> In [146]: r.mask[0]=(False,)*10
>
> In [147]: r
> Out[147]:
> masked_records(
> date : [2009-08-17 -- -- -- -- -- -- -- -- --]
> time : [2009-08-17 -- -- -- -- -- -- -- -- --]
> program : [2009-08-17 -- -- -- -- -- -- -- -- --]
> level : [2009-08-17 -- -- -- -- -- -- -- -- --]
> error_id : [2009-08-17 -- -- -- -- -- -- -- -- --]
> thread : [2009-08-17 -- -- -- -- -- -- -- -- --]
> na : [2009-08-17 -- -- -- -- -- -- -- -- --]
> machine : [2009-08-17 -- -- -- -- -- -- -- -- --]
> request : [2009-08-17 -- -- -- -- -- -- -- -- --]
> detail : [2009-08-17 -- -- -- -- -- -- -- -- --]
> fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
> )
> </code>
>
> So I think I see what is going on. Rather than taking each line of
> the input file as a record it is taking each column as a record.
> Since I said there are ten values per record it stopped after ten rows
> since that is all the columns it had to fill in.
>
> Now you know my problem.
>
> How do I get csv2rec to read my file so I can start getting nice
> histograms of counts per day?
>
> A further question is why am I getting masked records at all and how
> do I control this? I don't see anything in the numpy or matplotlib
> user guides that answer this. I did find a helpful document on the
> web (http://www.bom.gov.au/bmrc/climdyn/staff/lih/pubs/docs/masks.pdf)
> that explained what masks are
> and why and how they can be used. I don't need them and would like to
> make sure that nothing is masked.
>
> Thanks in advance for helping a newbie over the hump.
>
> Phil Robare
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008
> 30-Day
> trial. Simplify your report design, integration and deployment - and
> focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now. http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Matplotlib-users mailing list
> Matplotlib-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Chloe Lewis
Graduate student, Amundson Lab
Division of Ecosystem Sciences, ESPM
University of California, Berkeley
137 Mulford Hall - #3114
Berkeley, CA 94720-3114
chle...@nature.berkeley.edu
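
The per-day counts Phil asks about can also be cross-checked without csv2rec, which has since been removed from matplotlib.mlab. The following is a minimal sketch using only the standard library, not the approach used in the thread. It assumes the header-less, ten-column layout and the m/d/Y dates shown in the post. `read_log`, `counts_per_day` and `SAMPLE` are hypothetical names. The sample rows are three lines from the post, rejoined after the mail client's wrapping. Counting rows this way also gives a quick check on whether all of the >6000 lines are being read, rather than only ten.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Column names from the original post; the file has no header line.
NAMES = ('date', 'time', 'program', 'level', 'error_id', 'thread',
         'na', 'machine', 'request', 'detail')

# Three rows from the post, rejoined after the mail client wrapped them.
SAMPLE = (
    "8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,"
    "Exception in CVProcess.GetNewfile: The process cannot access the file "
    "because it is being used by another process..,\n"
    "8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,"
    "tdietz -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.\n"
    "8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,"
    "tdietz -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.\n"
)


def read_log(stream):
    """Yield one dict per non-blank CSV row, keyed by NAMES (values stay strings)."""
    for fields in csv.reader(stream):
        if not fields:
            continue  # skip blank lines
        # Pad short rows so every record has all ten keys; zip drops any extras.
        fields += [''] * (len(NAMES) - len(fields))
        yield dict(zip(NAMES, fields))


def counts_per_day(records):
    """Count records per calendar date, parsing the m/d/Y 'date' column."""
    counts = Counter()
    for rec in records:
        counts[datetime.strptime(rec['date'], '%m/%d/%Y').date()] += 1
    return counts


if __name__ == '__main__':
    records = list(read_log(io.StringIO(SAMPLE)))
    print(len(records))                               # 3
    print(counts_per_day(records))                    # Counter({datetime.date(2009, 8, 17): 3})
    print(Counter(rec['thread'] for rec in records))  # Counter({'556': 2, '3045': 1})
```

To use this on the real file, open it with `open(path, newline='')`, where `path` points at your own CSV, and pass the file object to `read_log`. The resulting Counter's keys and values can then be plotted, for example with pylab's `bar`. Because every value is kept as a plain string, nothing is masked. Converting `thread` to `int` or `time` to a `datetime` is left as a separate, explicit step.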