Dear all,

Some very wise data entry person gave me about an hour of headache while I tried to find out why a 2000x500 data frame would not read into R. After much trial and error, I traced the problem to a double quote accidentally inserted into a string variable (comments from an open-ended question). This can be replicated by:

aa <- data.frame(id=1:2, var1=c("some \" quote", "without quote"))
> aa
  id          var1
1  1  some " quote
2  2 without quote

Saving this with R:

write.table(aa, "aa.dat", sep="\t", row.names=F)

creates the following ASCII file (between #s):

### R export
"id"    "var1"
1       "some \" quote"
2       "without quote"
###

which fails with a warning when I try to load it back:

> bb <- read.table("aa.dat", sep="\t", header=T)
Warning message:
In read.table("aa.dat", sep = "\t", header = T) :
  incomplete final line found by readTableHeader on 'aa.dat'

The data frame was originally an SPSS file, which SPSS saved as tab-delimited in this format:

### SPSS export
"id"    "var1"
1       "some " quote"
2       "without quote"
###

which of course threw the same error.

StatTransfer was the only software that exported the SPSS file to a tab-delimited file that R could finally import. The saved file looks like this:

### StatTransfer export
"id"    "var1"
1       "some "" quote"
2       "without quote"
###

Given these examples, I have two questions:

1. What is the correct syntax to import the R-exported file?
2. What can I do to prevent these situations from happening? (Besides whipping the data entry person :) I am referring to R procedures to detect and correct such things.)

Thank you,
Adrian

--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101
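
For reference, here is a minimal sketch of one round trip that should work with sep="\t". It assumes the file can be re-exported from R. It uses the qmethod="double" argument of write.table, which doubles embedded quotes in the same way as the StatTransfer export above, and count.fields as a rough check for lines with an unexpected number of fields. The temporary file name `path` is only an illustration, not part of the original example.

```r
# Rebuild the example data frame from the message above.
aa <- data.frame(id = 1:2,
                 var1 = c("some \" quote", "without quote"),
                 stringsAsFactors = FALSE)

# Hypothetical scratch file, used only for this sketch.
path <- tempfile(fileext = ".dat")

# qmethod = "double" writes an embedded quote as "" rather than \".
write.table(aa, path, sep = "\t", row.names = FALSE, qmethod = "double")

# With a non-whitespace separator, a doubled quote inside a quoted
# field is read back as a single literal quote.
bb <- read.table(path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Rough sanity check: number of fields found on each line of the file.
# Lines that do not have the expected count (or come back as NA) are
# candidates for a stray quote and worth inspecting by hand.
n <- count.fields(path, sep = "\t")
suspect <- which(is.na(n) | n != ncol(aa))
```

If `suspect` is empty, every line of the file has as many fields as the data frame has columns. This only flags candidate lines. It does not repair them, so a file with an unescaped quote (like the SPSS export) would still need to be fixed or re-exported.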