I don't know what the best solution is, but this certainly isn't madness.
First of all, '.' isn't "international notation"; it is used only in some
countries. In most of Europe (and Latin America) the comma is used. Anyone in a
country that uses the comma as the decimal separator will very often stumble
upon text files with comma decimal separators. Usually a simple search and
replace is sufficient, but if the data has string fields, one might mess up the
data.
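To make the search-and-replace problem concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical semicolon-delimited file in which a quoted text field contains a comma. A blind replace corrupts that text field. Converting each field only after the file has been split into fields leaves the text intact. The names SAMPLE, naive_fix, parse_field and read_rows are made up for illustration and are not part of NumPy or R.

```python
import csv
import io

# Hypothetical sample: ';' as field delimiter, ',' as decimal separator,
# and a quoted text field that itself contains a comma.
SAMPLE = 'name;value\n"Silva, Paulo";3,14\n"Souza, Ana";2,5\n'


def naive_fix(text):
    # Blind search-and-replace: every comma becomes a point,
    # including the commas inside the text fields.
    return text.replace(",", ".")


def parse_field(field):
    """Try to read a field as a comma-decimal number; otherwise keep it as text."""
    try:
        return float(field.replace(",", "."))
    except ValueError:
        return field


def read_rows(text):
    # Split into fields first (csv handles the quoting), then convert each field.
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    rows = [[parse_field(f) for f in row] for row in reader]
    return header, rows


print(naive_fix(SAMPLE).splitlines()[1])  # "Silva. Paulo";3.14
header, rows = read_rows(SAMPLE)
print(rows)  # [['Silva, Paulo', 3.14], ['Souza, Ana', 2.5]]
```

The per-field conversion is only a heuristic: a text field that happens to look like "1,5" would still be turned into a number, which is one reason a reader-level option can be more reliable than post-processing.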
Is this the most important feature? Of course not, but it helps a lot. As a
matter of fact, one of the reasons I started to use R years ago was the
flexibility of the function read.table: I don't have to worry about tabular
data in text files; I know I can read them (most of the time...). Now I use
rpy to call read.table.
As for speed, right now read.table is faster than loadtxt. Of course numpy
shouldn't simply reproduce every feature found in R (or Matlab, Scilab, etc.),
but reading data from external sources is a very important (and often
difficult) step in any data analysis. So while this feature is not a top
priority, it is important for anyone who has to deal with external data written
by other programs that use the "correct" locale, and it is certainly not the
path to madness.
I have been thinking for a while about writing/porting a read.table equivalent,
but unfortunately I haven't had much time in the past few months, and because
of that I have more or less paused my transition from R to Python.
Paulo
________________________________
From: Alan G Isaac <alan.is...@gmail.com>
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Sent: Monday, February 27, 2012 12:53
Subject: Re: [Numpy-discussion] Possible roadmap addendum: building better text
file readers
On 2/27/2012 10:10 AM, Paulo Jabardo wrote:
> I have a few features that I believe would make reading text files easier for
> many people. In some countries (most?) the decimal separator in real numbers
> is not a point but a comma.
> I think it would be very useful if the decimal separator could be specified
> with a keyword argument (decimal = '.' for example) on the text reading
> function.
Down that path lies madness.
For a fast reader, just document that the input format must use
"international notation" (i.e., the decimal point)
and give the user the responsibility to ensure the
data are in the right format.
The format translation utilities should be separate,
and calling them should be optional.
fwiw,
Alan Isaac
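The design Alan describes (a fast reader that only understands the decimal point, plus a separate, optional translation step) can be sketched roughly as below. This is an illustration under stated assumptions, not NumPy code: translate_decimal and fast_read are hypothetical names, the delimiter is assumed to be ';', and fields are assumed to be unquoted (a plain split, no CSV quoting rules).

```python
import re

# Hypothetical pattern: a field that is a number written with a comma
# as decimal separator, e.g. "3,14" or "-0,5".
_COMMA_DECIMAL = re.compile(r"^[+-]?\d+,\d+$")


def translate_decimal(lines, delimiter=";"):
    """Optional pre-pass: rewrite comma-decimal numeric fields to use a point.

    Assumes unquoted fields; anything that does not look like a
    comma-decimal number is passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        yield delimiter.join(
            f.replace(",", ".") if _COMMA_DECIMAL.match(f) else f for f in fields
        )


def fast_read(lines, delimiter=";"):
    """Hypothetical fast reader: understands only the decimal point."""
    return [[float(f) for f in line.split(delimiter)] for line in lines]


raw = ["1,5;2,25", "-0,5;10"]
print(fast_read(translate_decimal(raw)))  # [[1.5, 2.25], [-0.5, 10.0]]
```

Keeping the translation in a generator means it can be chained in front of the reader only when the data needs it, which is the separation Alan argues for. The trade-off Paulo raises is that every user with comma-decimal files pays for an extra pass and has to write or find the translation step.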
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion