Hi Derek,

Thanks very much for your quick reply. Here is a short summary of what I've tried. The ['S10'] + [ float for n in range(48) ] dtype *only* works when you explicitly specify the columns to be read, and genfromtxt does not determine the types automatically if you don't specify the dtype.
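For reference, a minimal sketch of one way to get a "first column string, the rest float" dtype without counting the columns by hand: read the header line yourself and build the dtype from the names found there. The sample text, the 'U16' string width and the variable names are assumptions made for illustration; the real file is not reproduced here, and the sketch does not claim to explain the ValueError.

```python
import io
import numpy as np

# Made-up stand-in for the real file: ';'-delimited with one header line.
sample = io.StringIO(
    "TIMESTAMP;CO2_flux;Net_radiation\n"
    "01/01/2003 00:00;0.5;-1.028\n"
    "01/01/2003 00:30;0.7;-0.409\n"
)

# Consume the header line and split it into column names.
names = sample.readline().strip().split(";")

# First column is text (width 16 is an assumption), all others are float.
formats = ["U16"] + [float] * (len(names) - 1)
dtype = np.dtype({"names": names, "formats": formats})

# The header has already been consumed, so only data rows remain.
table = np.genfromtxt(sample, delimiter=";", dtype=dtype, encoding="utf-8")

print(table.dtype.names)
print(table["CO2_flux"])
```

Because the number of float entries is derived from the header, the dtype always has as many fields as the header has names.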
I also have a problem with the missing value, which I describe at the end of this mail. Sorry for the very long example.

Thanks again,

In [164]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=tuple(range(49)),dtype=['S10'] + [ float for n in range(48)])

In [165]: b
Out[165]:
array([ ('01/01/2003', -999.0, -1.028, -999.0, -999.0, -999.0, -999.0, -999.0,
 -25.368400000000001, 0.75920799999999999, -25.425699999999999,
 0.77632199999999996, -25.220500000000001, 0.77561899999999995,
 0.20000000000000001, 280.089, 0.57458299999999995, 0.417018,
 -0.042441800000000002, 0.0428254, -0.18517600000000001,
 -0.056775800000000001, 93.721299999999999, -8.1318099999999998, -9.5244,
 -9.9323200000000007, -10.2728, -20.945499999999999, -8.4939999999999998,
 -9.5678199999999993, -9.9175500000000003, -9.7835400000000003, -10.4445,
 -999.0, -999.0, -999.0, -999.0, -999.0, -2.80863, -6.7711100000000002,
 -999.0, -999.0, -999.0, 0.109, 0.075999999999999998, 0.10000000000000001,
 0.074999999999999997, 0.0, -999.0),
       ('01/01/2003', -999.0, -0.40899999999999997, -999.0, -999.0, -999.0,
 -999.0, -999.0, -25.3233, 0.75929800000000003, -25.368600000000001,
 0.77451599999999998, -25.118400000000001, 0.77264200000000005,
 0.20499999999999999, 267.80599999999998, 0.59291700000000003,
 0.42051699999999997, -0.037141399999999998, 0.040433200000000002,
 -0.16375999999999999, -0.029456400000000001, 93.749099999999999,
 -8.1292799999999996, -9.5213800000000006, -9.9336199999999995,
 -10.274900000000001, -21.1402, -8.4918899999999997, -9.5663699999999992,
 -9.9207000000000001, -9.7896099999999997, -10.4514, -999.0, -999.0, -999.0,
 -999.0, -999.0, -2.8468, -6.7986899999999997, -999.0, -999.0, -999.0, 0.109,
 0.075999999999999998, 0.10000000000000001, 0.074999999999999997, 0.0,
 -999.0),
       ....
      dtype=[('TIMESTAMP', '|S10'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8'),
 ('Sensible_heat_flux', '<f8'), ('Latent_heat_flux', '<f8'), ('u', '<f8'),
 ('Water_vapor_density_by_LiCor_7500', '<f8'), ('CO2_concentration', '<f8'),
 ('Air_temperature_High', '<f8'), ('HMP45C', '<f8'),
 ('Relative_humidity_High', '<f8'), ('HMP45C_1', '<f8'),
 ('Air_temperature_Middle', '<f8'), ('HMP45C_2', '<f8'),
 ('Relative_humidity_Middle', '<f8'), ('HMP45C_3', '<f8'),
 ('Air_temperature_Low', '<f8'), ('HMP45C_4', '<f8'),
 ('Relative_humidity_Low', '<f8'), ('HMP45C_5', '<f8'),
 ('Wind_speed_High', '<f8'), ('Wind_direction_High', '<f8'),
 ('Wind_speed_Low', '<f8'), ('PAR_High', '<f8'), ('PAR_Low', '<f8'),
 ('Incoming_shortwave_radiation_LI200X', '<f8'),
 ('Incoming_shortwave_radiation_Eppley', '<f8'),
 ('Outgoing_shortwave_radiation_Eppley', '<f8'), ('Pressure', '<f8'),
 ('Soil_temp_1_20_cm', '<f8'), ('Soil_temp_1_10_cm', '<f8'),
 ('Soil_temp_1_5_cm', '<f8'), ('Soil_temp_1_25_cm', '<f8'),
 ('Soil_temp_1_0_cm', '<f8'), ('Soil_temp_2_20_cm', '<f8'),
 ('Soil_temp_2_10_cm', '<f8'), ('Soil_temp_2_5_cm', '<f8'),
 ('Soil_temp_2_25_cm', '<f8'), ('Soil_temp_2_0_cm', '<f8'),
 ('Soil_temp_3_20cm', '<f8'), ('Soil_temp_3_10_cm', '<f8'),
 ('Soil_temp_3_5_cm', '<f8'), ('Soil_temp_3_25_cm', '<f8'),
 ('Soil_temp_3_0_cm', '<f8'), ('Soil_heat_flux_1', '<f8'),
 ('Soil_heat_flux_2', '<f8'), ('Soil_heat_flux_3', '<f8'),
 ('soil_water_T1', '<f8'), ('soil_water_T2', '<f8')])

But if I use the following (the same call without usecols), it gives an error:

In [171]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(48)])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()

C:\Python26\lib\site-packages\numpy\lib\npyio.pyc in genfromtxt(fname, dtype,
comments, delimiter, skiprows, skip_header, skip_footer, converters, missing,
missing_values,
filling_values, usecols, names, excludelist, deletechars, replace_space,
autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise)
   1449             # Raise an exception ?
   1450             if invalid_raise:
-> 1451                 raise ValueError(errmsg)
   1452             # Issue a warning ?
   1453             else:

ValueError

If I don't specify the dtype, the first column is not recognized as a string (it is displayed as nan):

In [172]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2))

In [173]: b
Out[173]:
array([(nan, -999.0, -1.028), (nan, -999.0, -0.40899999999999997),
       (nan, -999.0, 0.16700000000000001), ..., (nan, -999.0, -999.0),
       (nan, -999.0, -999.0), (nan, -999.0, -999.0)],
      dtype=[('TIMESTAMP', '<f8'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])

Then the final question: the '-999.0' entries in the data are actually missing values, but I cannot get them displayed as 'nan' by specifying missing_values as '-999.0'. Whether I set missing_values to -999.0 or use a dictionary, neither works:
In [178]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2),dtype="|S18,float,float",missing_values=-999.0)

In [179]: b
Out[179]:
array([('01/01/2003 00:00', -999.0, -1.028),
       ('01/01/2003 00:30', -999.0, -0.40899999999999997),
       ('01/01/2003 01:00', -999.0, 0.16700000000000001), ...,
       ('31/12/2003 22:30', -999.0, -999.0),
       ('31/12/2003 23:00', -999.0, -999.0),
       ('31/12/2003 23:30', -999.0, -999.0)],
      dtype=[('TIMESTAMP', '|S18'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])

In [180]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2),dtype="|S18,float,float",missing_values={1:'-999.0'})

In [181]:

In [182]: b
Out[182]:
array([('01/01/2003 00:00', -999.0, -1.028),
       ('01/01/2003 00:30', -999.0, -0.40899999999999997),
       ('01/01/2003 01:00', -999.0, 0.16700000000000001), ...,
       ('31/12/2003 22:30', -999.0, -999.0),
       ('31/12/2003 23:00', -999.0, -999.0),
       ('31/12/2003 23:30', -999.0, -999.0)],
      dtype=[('TIMESTAMP', '|S18'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])

The stored value is actually -999.0:

In [183]: b['CO2_flux'][1]==-999.0
Out[183]: True

Even this doesn't work (suppose 2 is our missing value):

In [184]: data = "1, 2, 3\n4, 5, 6"

In [185]: np.genfromtxt(StringIO(data), delimiter=",",dtype="int,int,int",missing_values=2)
Out[185]:
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

In [186]: np.genfromtxt(StringIO(data), delimiter=",",dtype="int,int,int",names="a,b,c",missing_values={'b':2},filling_values=nan)
Out[186]:
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4')])

Can you give me some suggestions? Thanks in advance.

Chao

2011/6/26 Derek Homeier <de...@astro.physik.uni-goettingen.de>

> On 26.06.2011, at 8:48PM, Chao YUE wrote:
>
> > I want to read a csv file with many (49) columns; the first column is a
> > string and the remaining can be float.
> > How can I avoid typing something like
> >
> > data=numpy.genfromtxt('data.csv',delimiter=';',names=True, dtype=('S10',
> > float, float, ......))
> >
> > Can I just specify that the type of the first column is string and the
> > remaining are float? How can I do that?
>
> Simply use 'dtype=None' to let genfromtxt automatically determine the type
> (it is perhaps a bit confusing that this is not the default - maybe it
> should be repeated in the docstring for clarity that the default for
> dtype is 'float'...).
> Also, a shorter way of typing the dtype above (e.g. in case some columns
> would be auto-detected as int) would be
> ['S10'] + [ float for n in range(48) ]
>
> HTH,
> Derek
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 77 30; Fax:01.69.08.77.16
************************************************************************************
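A possible workaround for the -999.0 question, sketched under assumptions: load the columns as plain floats and replace the sentinel with NaN afterwards, so the result does not depend on how genfromtxt interprets missing_values. The sample rows, the 'U16' string width and the SENTINEL name are made up for illustration and are not taken from the real file.

```python
import io
import numpy as np

# Made-up rows laid out like the first three columns discussed above.
sample = io.StringIO(
    "TIMESTAMP;CO2_flux;Net_radiation\n"
    "01/01/2003 00:00;-999.0;-1.028\n"
    "01/01/2003 00:30;-999.0;-0.409\n"
    "01/01/2003 01:00;0.25;-999.0\n"
)

dtype = [("TIMESTAMP", "U16"), ("CO2_flux", float), ("Net_radiation", float)]
b = np.genfromtxt(sample, delimiter=";", skip_header=1, dtype=dtype,
                  encoding="utf-8")

SENTINEL = -999.0  # value the data logger writes for "missing"

# Indexing a structured array by field name returns a view, so assigning
# through the boolean mask changes b itself. Skip the text column.
for name in b.dtype.names[1:]:
    column = b[name]
    column[column == SENTINEL] = np.nan

print(b["CO2_flux"])
print(b["Net_radiation"])
```

This only makes sense for float columns, since integer columns cannot hold NaN; for those, a masked array such as the one returned by np.ma.masked_equal may be a better fit.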