The sixteen lines of data you sent work in a little histogram generator
for me, ignoring the masking (as a near-newbie, I can say that ignoring
the stuff I don't yet care about usually works):

from matplotlib.mlab import csv2rec
import pylab as p

# The file has no header line, so supply the column names.
names = ('date', 'time', 'program', 'level', 'error_id', 'thread',
         'na', 'machine', 'request', 'detail')
r = csv2rec("/Users/clew/Documents/pycode/test.csv", names=names)
print r.shape
print r[3]
for name in names:
    print 'Values of', name, ':'
    print r[name]

# Pick out the rows for one thread id.
for row in r:
    if row['thread'] == 537:
        print row

print type(r['thread'])

# Histogram of the thread ids.
counts, bins, patches = p.hist(r['thread'])
print counts, bins, patches
p.savefig('csvhistogram')
p.show()
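
For the counts per day you asked about, here is a minimal sketch that uses only the standard library's csv module rather than csv2rec. It assumes the date column is always month/day/year as in your sample. The names counts_per_day, NAMES, and sample are made up for the illustration, and the rows are copied from your message with the mail wrapping undone.

```python
import csv
from collections import Counter
from datetime import datetime

# Column names from the original message (the file has no header line).
NAMES = ('date', 'time', 'program', 'level', 'error_id', 'thread',
         'na', 'machine', 'request', 'detail')


def counts_per_day(lines):
    """Count records per calendar day from an iterable of CSV lines."""
    counts = Counter()
    for row in csv.DictReader(lines, fieldnames=NAMES):
        # Assumption: dates look like 8/17/2009 (month/day/year).
        day = datetime.strptime(row['date'], '%m/%d/%Y').date()
        counts[day] += 1
    return counts


# Three rows taken from the sample data in the thread.
sample = [
    '8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,'
    'Exception in CVProcess.GetNewfile: The process cannot access the '
    'file because it is being used by another process..,',
    '8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,'
    'tdietz -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.',
    '8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,'
    'tdietz -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.',
]

for day, count in sorted(counts_per_day(sample).items()):
    print('%s %d' % (day.isoformat(), count))
```

An open file object can be passed in place of the list, and the sorted days and counts could then be handed to something like p.bar for plotting.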


Does this work for you? On the whole file?
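
If it does not work on the whole file, one thing worth checking is whether every line really splits into ten fields, since a stray comma or a broken line would shift the columns. A rough sketch of that check with the standard csv module follows. find_bad_rows is a made-up helper name, and the last sample row is a hypothetical truncated line added only to show a failure.

```python
import csv

EXPECTED_FIELDS = 10  # one per column name in the original message


def find_bad_rows(lines, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for rows with the wrong field count."""
    bad = []
    for lineno, row in enumerate(csv.reader(lines), 1):
        if len(row) != expected:
            bad.append((lineno, len(row)))
    return bad


sample = [
    # Two rows taken from the sample data in the thread.
    '8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,'
    'Exception in CVProcess.GetNewfile: The process cannot access the '
    'file because it is being used by another process..,',
    '8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,'
    'tdietz -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.',
    # Hypothetical truncated row, not from the original data.
    '8/17/2009,4:23:56 PM,CVAgent',
]

print(find_bad_rows(sample))
```

Passing an open file object instead of the list should report the line numbers to look at in the full log.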

&C

On Aug 21, 2009, at 9:27 AM, Phil Robare wrote:

> Hi folks,
>
> I have a (newbie) problem using csv2rec.  I am a regular python user
> but this is my first time using matplotlib and numpy after being
> inspired by attending a talk by Dr. John Hunter.
>
> I am trying to read a csv file that has >6000 lines that look like this:
>
> <code>
> 8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
> 8/17/2009,4:49:52 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:3888955:311247:166422::,From Disk..
> 8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,Exception in CVProcess.GetNewfile: The process cannot access the file because it is being used by another process..,
> 8/17/2009,4:49:51 PM,CVAgent,Information,2,447,N/A,THP-PR-APVL,SDAY -> R:20090210:::3893955:311247:166422::20090210:::3893955:388247:166422::50:,.
> 8/17/2009,4:29:55 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22088980:964878::,Completed..
> 8/17/2009,4:29:55 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22030980:964878::,From Disk..
> 8/17/2009,4:29:54 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,JJULIO -> R:20090728:::4888461:22030980:964878::20090728:::4888461:22030980:964878::50:,.
> 8/17/2009,4:24:02 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090226::7881558:2882501:325032:316888::,Completed..
> 8/17/2009,4:24:02 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090226::7881558:8822501:325882:318816::,From Disk..
> 8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.
> 8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.
> 8/17/2009,4:19:44 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278887:4020000::,Completed..
> 8/17/2009,4:19:43 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278777:4020000::,From Disk..
> 8/17/2009,4:19:42 PM,CVAgent,Information,2,793,N/A,THP-PR-APVL,MUTSCH -> R:20090210:::2882613:278887:4020000::20090210:::2882613:278887:4020000::50:,.
> 8/17/2009,4:11:02 PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,F:20090817::7881558:1776517:1211:58800::,Completed..
> 8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
> </code>
>
> I have given the columns names since there is not a header line:
> <code>
> In [150]: print names
> ('date', 'time', 'program', 'level', 'error_id', 'thread', 'na',
> 'machine', 'request', 'detail')
> </code>
>
> and I have provided converter functions to be sure the data is read
> correctly:
> <code>
> In [152]: print converterd
> {'thread': <type 'int'>, 'level': <type 'str'>, 'na': <type 'str'>,
> 'request': <type 'str'>, 'detail': <type 'str'>, 'machine': <type
> 'str'>, 'program': <type 'str'>, 'time': <function str2time at
> 0x03795530>, 'date': <function str2date at
> 0x037950B0>}
> </code>
>
> (I'm not sure if this is needed.  IPython seems to recognize csv2rec
> just fine but the sample program does an import like this.)
> <code>
> In [141]: import matplotlib.mlab as mlab
> </code>
>
> So now I call csv2rec on my file.  It takes a second or so to gulp it
> all in and then returns without error.
> <code>
> In [142]: r=mlab.csv2rec(filename,converterd=converterd,names=names)
> </code>
>
> So now I look to see what I have.  And it's nothing like I thought it
> would be. I expected thousands of records and I have 10.  I expected
> times and dates, ints and strings.  And all I have are masked values.
> <code>
> In [143]: r
> Out[143]:
> masked_records(
>        date : [-- -- -- -- -- -- -- -- -- --]
>        time : [-- -- -- -- -- -- -- -- -- --]
>     program : [-- -- -- -- -- -- -- -- -- --]
>       level : [-- -- -- -- -- -- -- -- -- --]
>    error_id : [-- -- -- -- -- -- -- -- -- --]
>      thread : [-- -- -- -- -- -- -- -- -- --]
>          na : [-- -- -- -- -- -- -- -- -- --]
>     machine : [-- -- -- -- -- -- -- -- -- --]
>     request : [-- -- -- -- -- -- -- -- -- --]
>      detail : [-- -- -- -- -- -- -- -- -- --]
>    fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
>              )
> </code>
>
> So I look at the mask.  I see no clues here.
> <code>
> In [144]: r.mask
> Out[144]:
> array([(True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True),
>       (True, True, True, True, True, True, True, True, True, True)],
>      dtype=[('date', '|b1'), ('time', '|b1'), ('program', '|b1'),
> ('level', '|b1'), ('error_id', '|b1'), ('thread', '|b1'), ('na',
> '|b1'), ('machine', '|b1'),
> ('request', '|b1'), ('detail', '|b1')])
> </code>
>
> Well, maybe if I change the mask I can see what is being hidden.
> <code>
> In [145]: r.mask[0]
> Out[145]: (True, True, True, True, True, True, True, True, True, True)
>
> In [146]: r.mask[0]=(False,)*10
>
> In [147]: r
> Out[147]:
> masked_records(
>        date : [2009-08-17 -- -- -- -- -- -- -- -- --]
>        time : [2009-08-17 -- -- -- -- -- -- -- -- --]
>     program : [2009-08-17 -- -- -- -- -- -- -- -- --]
>       level : [2009-08-17 -- -- -- -- -- -- -- -- --]
>    error_id : [2009-08-17 -- -- -- -- -- -- -- -- --]
>      thread : [2009-08-17 -- -- -- -- -- -- -- -- --]
>          na : [2009-08-17 -- -- -- -- -- -- -- -- --]
>     machine : [2009-08-17 -- -- -- -- -- -- -- -- --]
>     request : [2009-08-17 -- -- -- -- -- -- -- -- --]
>      detail : [2009-08-17 -- -- -- -- -- -- -- -- --]
>    fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
>              )
> </code>
>
> So I think I see what is going on.  Rather than taking each line of
> the input file as a record it is taking each column as a record.
> Since I said there are ten values per record it stopped after ten rows
> since that is all the columns it had to fill in.
>
> Now you know my problem.
>
> How do I get csv2rec to read my file so I can start getting nice
> histograms of counts per day?
>
> A further question is why am I getting masked records at all and how
> do I control this?  I don't see anything in the numpy or matplotlib
> user guides that answers this.  I did find a helpful document on the
> web (http://www.bom.gov.au/bmrc/climdyn/staff/lih/pubs/docs/masks.pdf)
> that explained what masks are and why and how they can be used.  I
> don't need them and would like to make sure that nothing is masked.
>
> Thanks in advance for helping a newbie over the hump.
>
> Phil Robare
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008  
> 30-Day
> trial. Simplify your report design, integration and deployment - and  
> focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Matplotlib-users mailing list
> Matplotlib-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Chloe Lewis
Graduate student, Amundson Lab
Division of Ecosystem Sciences, ESPM
University of California, Berkeley
137 Mulford Hall - #3114
Berkeley, CA  94720-3114
chle...@nature.berkeley.edu

