Hi Martin,
I'd recommend first trying the current development version to see whether this
has already been fixed. Matt has already fixed some recurring fread bugs.
You can get it from here: https://github.com/Rdatatable/data.table
Please scroll down that page to see the installation instructions.
If you still get the error, could you please file a bug report at
https://github.com/Rdatatable/data.table/issues with a *reproducible example*?
If necessary, you can also link to a *minimal* file that reproduces the issue;
that would be very helpful.
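For instance, a minimal file could be generated in a few lines of R rather than attached. What follows is only a sketch under stated assumptions: the two data lines are copied from the original message, but the header names, the row counts, and the position of the "?" row are made up for illustration, and it is not confirmed that this triggers the warning. read.table() is included as the baseline that was reported to work, and the fread() call only runs if data.table is installed.

```r
# Header names are placeholders (assumed); the real file's names are not shown.
header <- "Date;Time;V3;V4;V5;V6;V7;V8;V9"
good   <- "21/12/2006;11:22:00;0.244;0.000;242.290;1.000;0.000;0.000;0.000"
bad    <- "21/12/2006;11:23:00;?;?;?;?;?;?;"

# Put the "?" row at data row 21 of 201, an arbitrary position chosen to sit
# away from the first, middle and last rows of the file.
lines <- c(header, rep(good, 20), bad, rep(good, 180))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

column.class <- c(rep("character", 2), rep("numeric", 7))

# Baseline: read.table() with the same na.strings and colClasses
base <- read.table(tmp, sep = ";", header = TRUE,
                   na.strings = c("?", ""), colClasses = column.class)
str(base)

# Call under test; skipped if data.table is not installed
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(tmp, sep = ";", header = TRUE,
                          na.strings = c("?", ""),
                          colClasses = column.class, verbose = TRUE)
  str(dt)
}
```

If the warning does appear with a file like this, the verbose output together with sessionInfo() would round out the report.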
Thanks,
Arun
From: Martin Watts <[email protected]>
Reply: Martin Watts <[email protected]>
Date: September 4, 2014 at 3:09:13 PM
To: [email protected] <[email protected]>
Subject: [datatable-help] Unexpected Result Reading in Data File using fread
All,
I am trying to read in a data file using fread().
I am getting several warnings indicating that a non-numeric entry was found in
a numeric field and that, as a result, the column is being converted to a
character vector. However, the non-numeric entry is one of the declared
na.strings, and the specific entry is indeed returned as NA.
I expected the "?" entry to be recognised as NA and the column to be read as a
numeric vector. I have tried the same action with read.table() and it works as
I was expecting.
I am using:
R version 3.1.1 (pre-compiled)
RStudio Version 0.98.983
data.table package v1.9.2
locale is: en_GB.UTF-8
on:
OS-X Version 10.9.4
The code I am using is:
    library("data.table")
    column.class <- c(rep("character", 2), rep("numeric", 7))
    data2 <- fread("./data/household_power_consumption.txt",
                   sep = ";",
                   na.strings = c("?", ""),
                   colClasses = column.class,
                   header = TRUE,
                   nrows = 7000,
                   verbose = TRUE)
The first line in the data file that causes the problem, preceded by the line before it:
21/12/2006;11:22:00;0.244;0.000;242.290;1.000;0.000;0.000;0.000
21/12/2006;11:23:00;?;?;?;?;?;?;
The 1st warning is:
1: In fread("./data/household_power_consumption.txt", na.strings = "?") :
Bumped column 3 to type character on data row 6840, field contains '?'.
Coercing previously read values in this column from integer or numeric back to
character which may not be lossless; e.g., if '00' and '000' occurred before
they will now be just '0', and there may be inconsistencies with treatment of
',,' and ',NA,' too (if they occurred in this column before the bump). If this
matters please rerun and set 'colClasses' to 'character' for this column.
Please note that column type detection uses the first 5 rows, the middle 5 rows
and the last 5 rows, so hopefully this message should be very rare. If
reporting to datatable-help, please rerun and include the output from
verbose=TRUE.
Martin
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help