I was wondering whether there is a function that automatically tries to
detect the format of a data file. E.g. if it is an ASCII data file, it could
detect appropriate defaults for the read.table() parameters. One could, for
example, read the first 10 lines of the file, compare the format of the
first line with that of the others, count the number of dots, colons
and semicolons, etc. More generally, one could use the file extension or, if
available, the Unix 'file' command to determine the file type if it is
non-ASCII.
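
To make the idea concrete, here is a minimal sketch in R of the "look at the first 10 lines" heuristic. guess_format() is a hypothetical name, not an existing function, and the rules are assumptions chosen for illustration: a separator is considered plausible if it occurs equally often on every sampled line, a comma between digits is taken as a decimal comma when the comma is not the separator, and the first line is treated as a header if it has no numeric fields while the second line does. Quoted fields that contain the separator are not handled.

```r
# guess_format() is a hypothetical helper, not an existing R function.
# It guesses sep, dec and header for read.table() from the first n lines.
guess_format <- function(path, n = 10, seps = c("\t", ";", ",", "|")) {
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) stop("no non-empty lines in ", path)

  # How often does the character `ch` occur on each sampled line?
  count_char <- function(ch) {
    lengths(regmatches(lines, gregexpr(ch, lines, fixed = TRUE)))
  }

  # A candidate is plausible if it occurs on every line, equally often.
  plausible <- vapply(seps, function(s) {
    k <- count_char(s)
    k[1] > 0 && all(k == k[1])
  }, logical(1))

  # Take the first plausible candidate in priority order; "" is
  # read.table()'s own default (fields separated by whitespace).
  sep <- if (any(plausible)) seps[plausible][1] else ""

  # A comma between digits that is not the separator suggests a decimal comma.
  dec <- if (sep != "," && any(grepl("[0-9],[0-9]", lines))) "," else "."

  split_fields <- function(line) {
    if (sep == "") {
      strsplit(trimws(line), "[[:space:]]+")[[1]]
    } else {
      strsplit(line, sep, fixed = TRUE)[[1]]
    }
  }

  is_num <- function(x) {
    x <- gsub('"', "", x, fixed = TRUE)
    if (dec == ",") x <- sub(",", ".", x, fixed = TRUE)
    !is.na(suppressWarnings(as.numeric(x)))
  }

  # Compare the first line with the second: names first, numbers after.
  header <- length(lines) > 1 &&
    !any(is_num(split_fields(lines[1]))) &&
    any(is_num(split_fields(lines[2])))

  list(sep = sep, dec = dec, header = header)
}

# Usage on a small semicolon-separated file with decimal commas.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id;weight;group", "1;72,5;a", "2;80,1;b"), tmp)
fmt <- guess_format(tmp)
dat <- read.table(tmp, sep = fmt$sep, dec = fmt$dec, header = fmt$header)
```

Tab and semicolon are tried before the comma so that decimal commas in a semicolon-separated file are not mistaken for the separator; that ordering is a design choice of this sketch, not an established rule.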

I think it should not be very complicated to detect formats with very high
accuracy. For most data files, a human statistician can easily see the
format of the file by looking at a fragment, so it should be
possible to capture these rules in some code. It would be nice to have
something like a read.magic() function that reads a data file using the
appropriate command, regardless of whether the user supplied a csv1, csv2,
tab-delimited, Excel, SPSS, Stata, etc. file.
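
As a rough illustration of what such a wrapper might look like, the sketch below dispatches on the file extension only. read.magic() is the hypothetical function named above, not an existing one, and the mapping from extensions to readers is an assumption. It relies on the 'foreign' package (a recommended package shipped with R) for SPSS and Stata files and on the third-party 'readxl' package for Excel files; note that foreign::read.dta() only reads older Stata formats.

```r
# Hypothetical read.magic(): pick a reader based on the file extension.
# Extra arguments in ... are passed on to the chosen reader.
read.magic <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  if (!nzchar(ext)) stop("cannot guess the format: no file extension")

  # Count occurrences of `ch` in the first line of the file.
  count_first <- function(ch) {
    first <- readLines(path, n = 1, warn = FALSE)
    sum(lengths(regmatches(first, gregexpr(ch, first, fixed = TRUE))))
  }

  switch(ext,
    csv = {
      # More semicolons than commas in the first line suggests the
      # semicolon/decimal-comma variant handled by read.csv2().
      if (count_first(";") > count_first(",")) {
        read.csv2(path, ...)
      } else {
        read.csv(path, ...)
      }
    },
    tsv = ,
    tab = read.delim(path, ...),
    sav = foreign::read.spss(path, to.data.frame = TRUE, ...),
    dta = foreign::read.dta(path, ...),
    xls = ,
    xlsx = readxl::read_excel(path, ...),
    stop("don't know how to read files with extension '", ext, "'")
  )
}

# Usage: both csv variants are read with the same call.
f1 <- tempfile(fileext = ".csv")
writeLines(c("a;b", "1,5;x"), f1)
d1 <- read.magic(f1)

f2 <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1.5,x"), f2)
d2 <- read.magic(f2)
```

A fuller version would presumably fall back to inspecting the file contents when the extension is missing or misleading.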

I actually started to code something like this, but then I figured that
maybe someone else has had the exact same idea.


--
View this message in context: 
http://r.789695.n4.nabble.com/Read-function-that-detects-format-automatically-tp3479958p3479958.html
Sent from the R help mailing list archive at Nabble.com.
