Stavros Macrakis <macrakis at alum.mit.edu> writes:
read.table gives idiosyncratic results when the input is formatted
strangely, for example:
read.table(textConnection("a'b\nc'd\n"), header=FALSE,
           fill=TRUE, sep="", quote="'")
= c'd a'b c'd
read.table(textConnection("a'b\nc'd\nf'\n'\n"),
           header=FALSE, fill=TRUE,
           sep="", quote="'")
= f' \na b c'd f' \n
Though read.table doesn't specify the syntax of its input precisely, these
results don't seem particularly useful or consistent.
Is there a stricter version of read.table (perhaps in a package) that gives
errors or warnings if it finds quotation marks in the middle of fields or
encounters other such peculiar situations?
I dissected this behavior a bit more here:
https://stat.ethz.ch/pipermail/r-devel/2010-November/059016.html
(IIRC, it is due to an inconsistency between the way that scan() and
readLines() handle lines with unterminated quotes),
and Martin Maechler said:
https://stat.ethz.ch/pipermail/r-devel/2010-November/059107.html
I think it can be defended to file as a bug, but it is tricky to pinpoint
exactly what the issue is.
I don't know of a stricter version of read.table(), but if you had
the time and inclination to pick through the code and (i) provide a
careful definition of desired behavior and (ii) supply patches, you could
do your little bit to make R better. (If I posted a bug report would you
annotate it with a discussion of desired behavior?)
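
In the meantime, a rough pre-check along these lines might be a starting
point. This is only a sketch under my own assumptions: find_quote_problems()
and read_table_strict() are hypothetical names (not in base R or any package
I know of), fields are taken to be whitespace-delimited, and "suspicious"
means either a quote character strictly inside a field or an odd number of
quote characters on a line. A real definition of desired behavior would need
more care than this.

```r
# Hypothetical helper (not part of base R): report lines whose quoting looks
# suspicious under the assumptions stated above.
find_quote_problems <- function(lines, quote = "'") {
  # Split on whitespace, deliberately ignoring quoting, so that a field such
  # as 'x y' shows up as "'x" and "y'" (quotes at the edges only).
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  mid_field <- vapply(fields, function(f) {
    any(vapply(f, function(x) {
      pos <- which(strsplit(x, "", fixed = TRUE)[[1]] == quote)
      # A quote that is neither the first nor the last character of the field
      any(pos > 1 & pos < nchar(x))
    }, logical(1)))
  }, logical(1))
  # Count quote characters per line; an odd count means an unterminated quote
  n_quotes <- vapply(strsplit(lines, "", fixed = TRUE),
                     function(ch) sum(ch == quote), integer(1))
  unbalanced <- n_quotes %% 2L == 1L
  out <- data.frame(line = seq_along(lines),
                    mid_field = mid_field,
                    unbalanced = unbalanced)
  out[mid_field | unbalanced, , drop = FALSE]
}

# Hypothetical wrapper: refuse to parse if any line looks suspicious,
# otherwise hand the lines to read.table() unchanged.
read_table_strict <- function(lines, quote = "'", ...) {
  problems <- find_quote_problems(lines, quote)
  if (nrow(problems) > 0) {
    stop("suspicious quoting on line(s): ",
         paste(problems$line, collapse = ", "))
  }
  read.table(text = lines, quote = quote, ...)
}
```

Usage would be something like read_table_strict(readLines("myfile.txt"),
header=FALSE, fill=TRUE), where "myfile.txt" is just a placeholder. Stopping
with an error rather than a warning is an arbitrary choice here; either could
be defended.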
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help