Hi folks,

I have a (newbie) problem using csv2rec.  I am a regular Python user,
but this is my first time using matplotlib and numpy, after being
inspired by a talk given by Dr. John Hunter.

I am trying to read a csv file that has >6000 lines that look like this:

<code>
8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
8/17/2009,4:49:52 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:3888955:311247:166422::,From Disk..
8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,Exception in CVProcess.GetNewfile: The process cannot access the file because it is being used by another process..,
8/17/2009,4:49:51 PM,CVAgent,Information,2,447,N/A,THP-PR-APVL,SDAY -> R:20090210:::3893955:311247:166422::20090210:::3893955:388247:166422::50:,.
8/17/2009,4:29:55 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22088980:964878::,Completed..
8/17/2009,4:29:55 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22030980:964878::,From Disk..
8/17/2009,4:29:54 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,JJULIO -> R:20090728:::4888461:22030980:964878::20090728:::4888461:22030980:964878::50:,.
8/17/2009,4:24:02 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090226::7881558:2882501:325032:316888::,Completed..
8/17/2009,4:24:02 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090226::7881558:8822501:325882:318816::,From Disk..
8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.
8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.
8/17/2009,4:19:44 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278887:4020000::,Completed..
8/17/2009,4:19:43 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278777:4020000::,From Disk..
8/17/2009,4:19:42 PM,CVAgent,Information,2,793,N/A,THP-PR-APVL,MUTSCH -> R:20090210:::2882613:278887:4020000::20090210:::2882613:278887:4020000::50:,.
8/17/2009,4:11:02 PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,F:20090817::7881558:1776517:1211:58800::,Completed..
8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
</code>

I have given the columns names, since the file has no header line:
<code>
In [150]: print names
('date', 'time', 'program', 'level', 'error_id', 'thread', 'na',
'machine', 'request', 'detail')
</code>

and I have provided converter functions to be sure the data is read correctly:
<code>
In [152]: print converterd
{'thread': <type 'int'>, 'level': <type 'str'>, 'na': <type 'str'>,
'request': <type 'str'>, 'detail': <type 'str'>, 'machine': <type
'str'>, 'program': <type 'str'>, 'time': <function str2time at
0x03795530>, 'date': <function str2date at
0x037950B0>}
</code>
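
For reference, the two converters could look roughly like the sketch below. Only the names str2date and str2time appear in the output above; the bodies here are an assumed reconstruction based on the date and time formats in the sample lines, and the AM/PM parsing assumes an English locale.

```python
from datetime import datetime


def str2date(s):
    """Parse a date such as '8/17/2009' into a datetime.date (assumed format)."""
    return datetime.strptime(s.strip(), "%m/%d/%Y").date()


def str2time(s):
    """Parse a 12-hour time such as '4:49:52 PM' into a datetime.time (assumed format)."""
    return datetime.strptime(s.strip(), "%I:%M:%S %p").time()


# Values taken from the first sample line of the log file.
print(str2date("8/17/2009"))
print(str2time("4:49:52 PM"))
```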

(I'm not sure if this is needed.  IPython seems to recognize csv2rec
just fine but the sample program does an import like this.)
<code>
In [141]: import matplotlib.mlab as mlab
</code>

So now I call csv2rec on my file.  It takes a second or so to gulp it
all in and then returns without error.
<code>
In [142]: r=mlab.csv2rec(filename,converterd=converterd,names=names)
</code>

So now I look to see what I have.  And it's nothing like I thought it
would be. I expected thousands of records and I have 10.  I expected
times and dates, ints and strings.  And all I have are masked values.
<code>
In [143]: r
Out[143]:
masked_records(
       date : [-- -- -- -- -- -- -- -- -- --]
       time : [-- -- -- -- -- -- -- -- -- --]
    program : [-- -- -- -- -- -- -- -- -- --]
      level : [-- -- -- -- -- -- -- -- -- --]
   error_id : [-- -- -- -- -- -- -- -- -- --]
     thread : [-- -- -- -- -- -- -- -- -- --]
         na : [-- -- -- -- -- -- -- -- -- --]
    machine : [-- -- -- -- -- -- -- -- -- --]
    request : [-- -- -- -- -- -- -- -- -- --]
     detail : [-- -- -- -- -- -- -- -- -- --]
   fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
             )
</code>

So I look at the mask.  I see no clues here.
<code>
In [144]: r.mask
Out[144]:
array([(True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True),
      (True, True, True, True, True, True, True, True, True, True)],
     dtype=[('date', '|b1'), ('time', '|b1'), ('program', '|b1'),
('level', '|b1'), ('error_id', '|b1'), ('thread', '|b1'), ('na',
'|b1'), ('machine', '|b1'),
('request', '|b1'), ('detail', '|b1')])
</code>

Well, maybe if I change the mask I can see what is being hidden.
<code>
In [145]: r.mask[0]
Out[145]: (True, True, True, True, True, True, True, True, True, True)

In [146]: r.mask[0]=(False,)*10

In [147]: r
Out[147]:
masked_records(
       date : [2009-08-17 -- -- -- -- -- -- -- -- --]
       time : [2009-08-17 -- -- -- -- -- -- -- -- --]
    program : [2009-08-17 -- -- -- -- -- -- -- -- --]
      level : [2009-08-17 -- -- -- -- -- -- -- -- --]
   error_id : [2009-08-17 -- -- -- -- -- -- -- -- --]
     thread : [2009-08-17 -- -- -- -- -- -- -- -- --]
         na : [2009-08-17 -- -- -- -- -- -- -- -- --]
    machine : [2009-08-17 -- -- -- -- -- -- -- -- --]
    request : [2009-08-17 -- -- -- -- -- -- -- -- --]
     detail : [2009-08-17 -- -- -- -- -- -- -- -- --]
   fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
             )
</code>

So I think I see what is going on.  Rather than taking each line of
the input file as a record, it is taking each column as a record.
Since I said there are ten values per record, it stopped after ten rows
because that is all the columns it had to fill in.

Now you know my problem.

How do I get csv2rec to read my file so I can start getting nice
histograms of counts per day?
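
To make the goal concrete, here is a minimal sketch of the per-day counts I am after, done with the standard library only. It sidesteps csv2rec rather than explaining its behaviour; count_per_day is a hypothetical helper name, and the code assumes the date is always the first comma-separated field in M/D/YYYY form, as in the sample above.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_per_day(lines):
    """Count records per calendar day; the date is assumed to be field 0."""
    counts = Counter()
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        day = datetime.strptime(row[0].strip(), "%m/%d/%Y").date()
        counts[day] += 1
    return counts


# Two records copied from the sample log; a real run would pass an open file.
sample = io.StringIO(
    "8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,"
    "F:20090210::7881558:3893255:311247:166422::,Completed..\n"
    "8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,"
    "Exception in CVProcess.GetNewfile: The process cannot access the file "
    "because it is being used by another process..,\n"
)
counts = count_per_day(sample)
for day, n in sorted(counts.items()):
    print(day, n)
```

With the real file this would be called as count_per_day(open(filename, newline="")), and the resulting days and counts could then be handed to a matplotlib bar plot.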

A further question: why am I getting masked records at all, and how
do I control this?  I don't see anything in the numpy or matplotlib
user guides that answers this.  I did find a helpful document on the
web (http://www.bom.gov.au/bmrc/climdyn/staff/lih/pubs/docs/masks.pdf)
that explains what masks are and why and how they can be used.  I
don't need them and would like to make sure that nothing is masked.

Thanks in advance for helping a newbie over the hump.

Phil Robare
