Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

Paulo Jabardo Mon, 27 Feb 2012 10:00:36 -0800

I don't know what the best solution is, but this certainly isn't madness. 

First of all, '.' isn't international notation; it is used in some countries. In 
most of Europe (and Latin America) the comma is used. Anyone in countries that 
use a comma as the decimal separator will stumble upon text files with comma 
decimal separators very often. Usually a simple search and replace is sufficient, 
but if the data has string fields, one might mess up the data.
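
To make the string-field problem concrete, here is a minimal sketch of a per-column conversion, as opposed to a blind search and replace over the whole file. It assumes semicolon-delimited input and uses only the Python standard library; the function name parse_decimal_comma, its parameters, and the sample data are hypothetical and are not part of numpy or R.

```python
import csv
import io


def parse_decimal_comma(text, numeric_columns, delimiter=";"):
    """Parse delimited text, converting decimal commas only in the given columns.

    Hypothetical helper for illustration: string columns are left untouched,
    so commas inside names or labels survive.
    """
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        converted = list(record)
        for i in numeric_columns:
            # Replace the decimal comma only inside fields known to be numeric.
            converted[i] = float(record[i].replace(",", "."))
        rows.append(converted)
    return rows


# Made-up sample: the first column is a string field that contains a comma.
sample = "Silva, Ana;1,5;2,25\nSouza, Rui;3,0;4,75\n"

rows = parse_decimal_comma(sample, numeric_columns=(1, 2))

# A blind search and replace would also rewrite the string field.
blind = sample.replace(",", ".")
```

Here rows keeps "Silva, Ana" intact while the numeric fields become floats, whereas blind turns the name into "Silva. Ana". A decimal keyword on the reader itself would amount to doing this conversion inside the parser.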

Is this the most important feature? Of course not but it helps a lot. As a 
matter of fact, one of the reasons I started to use R years ago was the 
flexibility of the function read.table: I don't have to worry about tabular 
data in text files, I know I can read them (most of the time...). Now, I 
use rpy to call read.table.

As for speed, right now read.table is faster than loadtxt. Of course numpy 
shouldn't simply reproduce every feature found in R (or matlab, scilab, etc.), but 
reading data from external sources is a very important step in any data 
analysis (and often a difficult step). So while this feature is not a top 
priority, it is important for anyone who has to deal with external data written 
by other programs that use the "correct" locale, and it is certainly not on the 
path to madness.

I have been thinking for a while about writing/porting a read.table equivalent 
but unfortunately I haven't had much time in the past few months and because of 
that I have kind of stopped my transition from R to python for a while.

Paulo

________________________________
From: Alan G Isaac <alan.is...@gmail.com>
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Sent: Monday, 27 February 2012 12:53
Subject: Re: [Numpy-discussion] Possible roadmap addendum: building better text 
file readers

On 2/27/2012 10:10 AM, Paulo Jabardo wrote:
> I have a few features that I believe would make text file reading easier for many 
> people. In some countries (most?) the decimal separator in real numbers is 
> not a point but a comma.
> I think it would be very useful that the decimal separator be specified with 
> a keyword argument (decimal = '.' for example) on the text reading function.

Down that path lies madness.

For a fast reader, just document input format to use
"international notation" (i.e., the decimal point)
and give the user the responsibility to ensure the
data are in the right format.

The format translation utilities should be separate,
and calling them should be optional.

fwiw,
Alan Isaac
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

