Hi folks, I have a (newbie) problem using csv2rec. I am a regular python user but this is my first time using matplotlib and numpy after being inspired by attending a talk by Dr. John Hunter.
I am trying to read a csv file that has >6000 lines that look like this:
<code>
8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
8/17/2009,4:49:52 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:3888955:311247:166422::,From Disk..
8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,Exception in CVProcess.GetNewfile: The process cannot access the file because it is being used by another process..,
8/17/2009,4:49:51 PM,CVAgent,Information,2,447,N/A,THP-PR-APVL,SDAY -> R:20090210:::3893955:311247:166422::20090210:::3893955:388247:166422::50:,.
8/17/2009,4:29:55 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22088980:964878::,Completed..
8/17/2009,4:29:55 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090728::7881558:4888461:22030980:964878::,From Disk..
8/17/2009,4:29:54 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,JJULIO -> R:20090728:::4888461:22030980:964878::20090728:::4888461:22030980:964878::50:,.
8/17/2009,4:24:02 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090226::7881558:2882501:325032:316888::,Completed..
8/17/2009,4:24:02 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090226::7881558:8822501:325882:318816::,From Disk..
8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.
8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,tdietz -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.
8/17/2009,4:19:44 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278887:4020000::,Completed..
8/17/2009,4:19:43 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:20090210::7881558:2882613:278777:4020000::,From Disk..
8/17/2009,4:19:42 PM,CVAgent,Information,2,793,N/A,THP-PR-APVL,MUTSCH -> R:20090210:::2882613:278887:4020000::20090210:::2882613:278887:4020000::50:,.
8/17/2009,4:11:02 PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,F:20090817::7881558:1776517:1211:58800::,Completed..
8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:20090210::7881558:3893255:311247:166422::,Completed..
</code>
I have given the columns names since there is not a header line:
<code>
In [150]: print names
('date', 'time', 'program', 'level', 'error_id', 'thread', 'na', 'machine', 'request', 'detail')
</code>
and I have provided convert functions to be sure the data is read correctly:
<code>
In [152]: print converterd
{'thread': <type 'int'>, 'level': <type 'str'>, 'na': <type 'str'>, 'request': <type 'str'>, 'detail': <type 'str'>, 'machine': <type 'str'>, 'program': <type 'str'>, 'time': <function str2time at 0x03795530>, 'date': <function str2date at 0x037950B0>}
</code>
(I'm not sure if this is needed. IPython seems to recognize csv2rec just fine but the sample program does an import like this.)
<code>
In [141]: import matplotlib.mlab as mlab
</code>
So now I call csv2rec on my file. It takes a second or so to gulp it all in and then returns without error.
<code>
In [142]: r=mlab.csv2rec(filename,converterd=converterd,names=names)
</code>
So now I look to see what I have. And it's nothing like I thought it would be. I expected thousands of records and I have 10. I expected times and dates, ints and strings. And all I have are masked values.
<code>
In [143]: r
Out[143]:
masked_records(
    date : [-- -- -- -- -- -- -- -- -- --]
    time : [-- -- -- -- -- -- -- -- -- --]
    program : [-- -- -- -- -- -- -- -- -- --]
    level : [-- -- -- -- -- -- -- -- -- --]
    error_id : [-- -- -- -- -- -- -- -- -- --]
    thread : [-- -- -- -- -- -- -- -- -- --]
    na : [-- -- -- -- -- -- -- -- -- --]
    machine : [-- -- -- -- -- -- -- -- -- --]
    request : [-- -- -- -- -- -- -- -- -- --]
    detail : [-- -- -- -- -- -- -- -- -- --]
    fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
)
</code>
So I look at the mask. I see no clues here.
<code>
In [144]: r.mask
Out[144]:
array([(True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True),
       (True, True, True, True, True, True, True, True, True, True)],
      dtype=[('date', '|b1'), ('time', '|b1'), ('program', '|b1'), ('level', '|b1'), ('error_id', '|b1'), ('thread', '|b1'), ('na', '|b1'), ('machine', '|b1'), ('request', '|b1'), ('detail', '|b1')])
</code>
Well, maybe if I change the mask I can see what is being hidden.
<code>
In [145]: r.mask[0]
Out[145]: (True, True, True, True, True, True, True, True, True, True)

In [146]: r.mask[0]=(False,)*10

In [147]: r
Out[147]:
masked_records(
    date : [2009-08-17 -- -- -- -- -- -- -- -- --]
    time : [2009-08-17 -- -- -- -- -- -- -- -- --]
    program : [2009-08-17 -- -- -- -- -- -- -- -- --]
    level : [2009-08-17 -- -- -- -- -- -- -- -- --]
    error_id : [2009-08-17 -- -- -- -- -- -- -- -- --]
    thread : [2009-08-17 -- -- -- -- -- -- -- -- --]
    na : [2009-08-17 -- -- -- -- -- -- -- -- --]
    machine : [2009-08-17 -- -- -- -- -- -- -- -- --]
    request : [2009-08-17 -- -- -- -- -- -- -- -- --]
    detail : [2009-08-17 -- -- -- -- -- -- -- -- --]
    fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
)
</code>
So I think I see what is going on. Rather than taking each line of the input file as a record it is taking each column as a record. Since I said there are ten values per record it stopped after ten rows since that is all the columns it had to fill in. Now you know my problem.
How do I get csv2rec to read my file so I can start getting nice histograms of counts per day? A further question is why am I getting masked records at all and how do I control this? I don't see anything in the numpy or matplotlib user guides that answer this. I did find a helpful document on the web (http://www.bom.gov.au/bmrc/climdyn/staff/lih/pubs/docs/masks.pdf) that explained what masks are and why and how they can be used. I don't need them and would like to make sure that nothing is masked. Thanks in advance for helping a newbie over the hump.
Phil Robare
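For reference, the counts-per-day goal can be sketched without csv2rec at all. This is only a minimal sketch, assuming Python 3 and the standard library (csv, collections.Counter, datetime); it does not explain or fix the csv2rec behavior described above. The names NAMES, count_per_day and SAMPLE are made up for illustration, and SAMPLE holds three rows copied from the sample data in place of the real file.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Column names from the post; the log file has no header line.
NAMES = ('date', 'time', 'program', 'level', 'error_id',
         'thread', 'na', 'machine', 'request', 'detail')


def count_per_day(lines):
    """Count log records per calendar day from an iterable of CSV lines."""
    counts = Counter()
    for row in csv.DictReader(lines, fieldnames=NAMES):
        # Dates in the sample look like 8/17/2009 (month/day/year).
        day = datetime.strptime(row['date'], '%m/%d/%Y').date()
        counts[day] += 1
    return counts


# Three rows copied from the sample data, standing in for the real file.
SAMPLE = io.StringIO(
    "8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,"
    "F:20090210::7881558:3893255:311247:166422::,Completed..\n"
    "8/17/2009,4:49:52 PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,"
    "F:20090210::7881558:3888955:311247:166422::,From Disk..\n"
    "8/17/2009,4:11:02 PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,"
    "F:20090817::7881558:1776517:1211:58800::,Completed..\n"
)

counts = count_per_day(SAMPLE)
for day, n in sorted(counts.items()):
    print(day, n)
```

With a real file, an open file object (for example `open(filename, newline='')`) could be passed to count_per_day in place of SAMPLE, and the resulting days and counts could then be handed to a bar plot.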