Thanks for clearing that up.

On Mon, Feb 20, 2012 at 1:58 PM, Skipper Seabold <jsseab...@gmail.com> wrote:
> On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <brett.ol...@gmail.com> wrote:
> > On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes <hugad...@gwmail.gwu.edu> wrote:
> >> Hey everyone,
> >>
> >> I have time-series data in which each column label is simply the name of
> >> the file from which the original data was taken. Here's some sample data:
> >>
> >> name1.txt name2.txt name3.txt
> >> 32 34 953
> >> 32 03 402
> >>
> >> I've noticed that the standard genfromtxt() function works great;
> >> however, the names aren't kept intact. That is, if I use the command:
> >>
> >> print data['name1.txt']
> >>
> >> Nothing happens.
> >>
> >> However, when I remove the file extension, e.g.:
> >>
> >> name1 name2 name3
> >> 32 34 953
> >> 32 03 402
> >>
> >> then print data['name1'] returns (32, 32) as expected. It seems that the
> >> period in the name isn't compatible with genfromtxt()'s names argument.
> >> Is there a workaround, or do I need to restructure my program to remove
> >> the extension? I'd rather not do this if possible, for reasons that
> >> aren't important for the discussion at hand.
> >
> > It looks like the period is just getting stripped out of the names:
> >
> > In [1]: import numpy as N
> >
> > In [2]: N.genfromtxt('sample.txt', names=True)
> > Out[2]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >       dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> >
> > Interestingly, this still happens if you supply the names manually:
> >
> > In [17]: def reader(filename):
> >    ....:     infile = open(filename, 'r')
> >    ....:     names = infile.readline().split()
> >    ....:     data = N.genfromtxt(infile, names=names)
> >    ....:     infile.close()
> >    ....:     return data
> >    ....:
> >
> > In [20]: data = reader('sample.txt')
> >
> > In [21]: data
> > Out[21]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >       dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> >
> > What you can do, though, is reset the names after genfromtxt is through
> > with it:
> >
> > In [34]: def reader(filename):
> >    ....:     infile = open(filename, 'r')
> >    ....:     names = infile.readline().split()
> >    ....:     infile.close()
> >    ....:     data = N.genfromtxt(filename, names=True)
> >    ....:     data.dtype.names = names
> >    ....:     return data
> >    ....:
> >
> > In [35]: data = reader('sample.txt')
> >
> > In [36]: data
> > Out[36]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >       dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
> >
> > Be warned, I don't know why the period is getting stripped; there may
> > be a good reason, and adding it in might cause problems.
>
> I think the period is stripped because recarrays also offer attribute
> access to fields by name. So you wouldn't be able to do
>
>     your_array.sample.txt
>
> All the names get passed through a name validator.
> IIRC it's something like
>
>     from numpy.lib import _iotools
>
>     validator = _iotools.NameValidator()
>
>     validator.validate('sample1.txt')
>     validator.validate('a name with spaces')
>
> NameValidator has a good docstring and the gist of this should be in
> the genfromtxt docs, if it's not already.
>
> Skipper
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
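For reference, here is a minimal sketch of the behaviours discussed in this thread, assuming a recent NumPy (it has not been checked against the 2012 release). It shows the default NameValidator stripping the period, genfromtxt keeping the period when given an empty deletechars string, and Brett's approach of reassigning dtype.names after parsing. numpy.lib._iotools is a private module, so its location and behaviour may change between releases; the in-memory SAMPLE text stands in for sample.txt, and read_with_names is a hypothetical helper, not a NumPy function.

```python
import io

import numpy as np
from numpy.lib import _iotools  # private module: may move or change between NumPy releases

# Stand-in for sample.txt (assumption: whitespace-delimited, header on line 1).
SAMPLE = """name1.txt name2.txt name3.txt
32 34 953
32 03 402
"""

# 1. The validator that genfromtxt runs field names through strips '.' by default.
validator = _iotools.NameValidator()
print(validator.validate("sample1.txt"))         # ('sample1txt',)
print(validator.validate("a name with spaces"))  # ('a_name_with_spaces',)

# 2. Default genfromtxt: periods are removed from the field names.
default = np.genfromtxt(io.StringIO(SAMPLE), names=True)
print(default.dtype.names)  # ('name1txt', 'name2txt', 'name3txt')

# 3. An empty deletechars string tells the validator not to delete '.'.
kept = np.genfromtxt(io.StringIO(SAMPLE), names=True, deletechars="")
print(kept.dtype.names)     # ('name1.txt', 'name2.txt', 'name3.txt')
print(kept["name1.txt"])    # [32. 32.]


def read_with_names(text):
    """Hypothetical helper (not part of NumPy) implementing Brett's workaround.

    Parse with the default name validation, then overwrite the field
    names with the raw header tokens.
    """
    header = text.splitlines()[0].split()
    data = np.genfromtxt(io.StringIO(text), names=True)
    data.dtype.names = tuple(header)  # all names must be replaced at once
    return data


# 4. Reassigning dtype.names after parsing also restores the periods.
renamed = read_with_names(SAMPLE)
print(renamed.dtype.names)  # ('name1.txt', 'name2.txt', 'name3.txt')
```

Either fix gives back the dotted names, but those fields can then only be reached by indexing, as in data['name1.txt'], not by recarray attribute access, which is the trade-off Skipper describes.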