Yes, exactly. This is on the bug list as #2660, "Improve fread na.strings handling":

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2660&group_id=240&atid=975

which points to:

http://stackoverflow.com/questions/15784138/bad-interpretation-of-n-a-using-fread

Matthew

On 30/09/13 15:06, Julien Barnier wrote:
Hi,

dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"), colClasses=c(a="integer"))
I think that running fread with verbose=TRUE answers your question:

R> dt3 <- fread("a\n2\n4\n?\n5", na.strings=c("?"), colClasses=c(a="integer"),
                verbose=TRUE)
... <snip> ...
Column 1 ('a') has been detected as type 'character'. Ignoring request from
colClasses to read as 'integer' (a lower type) since NAs would result.
    0.000s (  0%) Memory map (rerun may be quicker)
    0.000s (  0%) sep and header detection
    0.000s (  0%) Count rows (wc -l)
    0.000s (  0%) Column type detection (first, middle and last 5 rows)
    0.000s (  0%) Allocation of 4x1 result (xMB) in RAM
    0.000s (  0%) Reading data
    0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
    0.000s (  0%) Coercing data already read in type bumps (if any)
    0.000s (  0%) Changing na.strings to NA
    0.000s        Total

As your «a» column contains the character string "?", fread determines
that the column is of type character. colClasses is then ignored, because
reading the column as integer would produce possibly unwanted NA values.
All of this, as I understand it, happens because the replacement of
na.strings by NA is the last step of fread, after the column type has
already been set.

So it seems that the only workarounds are either to change your data and
replace your missing-value code with a numeric value (like -9999 or anything
else), or to convert your column back to numeric after using fread.
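
The second workaround could look roughly like the sketch below. It is only
an illustration, not a fix in fread itself: it assumes the column is read
as character on purpose (so the result does not depend on type detection)
and that "?" is the only missing-value code in the data.

```r
library(data.table)

# Read column a as character explicitly; "?" is kept as an ordinary string.
dt3 <- fread("a\n2\n4\n?\n5", colClasses = c(a = "character"))

# Replace the missing-value code by NA, by reference.
dt3[a == "?", a := NA_character_]

# Convert the whole column to integer, by reference.
dt3[, a := as.integer(a)]
```

Replacing "?" before the conversion avoids the coercion warning that
as.integer() would otherwise raise for a non-numeric string.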

Regards,

Julien


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
