Of course, I'll be happy to help! By the way, the verbose output was actually from computer 1 (with 1.8.9), so it seems the -nan% problem may still be there?
Cheers
Timothée

On Thu, Mar 28, 2013 at 3:19 PM, Matthew Dowle <[email protected]> wrote:

> Hi,
>
> Thanks. That was from v1.8.8 on computer 2 I hope. Computer 1 with
> v1.8.9 should have the -nan% problem fixed.
>
> I'm a bit stumped for the moment. I've filed a bug report. Probably, if
> I still can't reproduce my end, I'll add some more detailed tracing to
> verbose output and ask you to try again next week if that's ok.
>
> Thanks for reporting!
>
> Matthew
>
> On 28.03.2013 14:58, Timothée Carayol wrote:
>
> Input contains a \n (or is ""), taking this to be text input (not a filename)
> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
> Found 2 columns
> First row with 2 fields occurs on line 1 (either column names or first row of data)
> All the fields on line 1 are character fields. Treating as the column names.
> Count of eol after first data row: 1023
> Subtracted 1 for last eol and any trailing empty lines, leaving 1022 data rows
> Type codes: 33 (first 5 rows)
> Type codes: 33 (+middle 5 rows)
> Type codes: 33 (+last 5 rows)
> 0.000s (-nan%) Memory map (rerun may be quicker)
> 0.000s (-nan%) sep and header detection
> 0.000s (-nan%) Count rows (wc -l)
> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
> 0.000s (-nan%) Allocation of 1022x2 result (xMB) in RAM
> 0.000s (-nan%) Reading data
> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
> 0.000s (-nan%) Coercing data already read in type bumps (if any)
> 0.000s (-nan%) Changing na.strings to NA
> 0.000s Total
> 4092 1022
>
> Input contains a \n (or is ""), taking this to be text input (not a filename)
> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
> Found 2 columns
> First row with 2 fields occurs on line 1 (either column names or first row of data)
> All the fields on line 1 are character fields. Treating as the column names.
> Count of eol after first data row: 1023
> Subtracted 0 for last eol and any trailing empty lines, leaving 1023 data rows
> Type codes: 33 (first 5 rows)
> Type codes: 33 (+middle 5 rows)
> Type codes: 33 (+last 5 rows)
> 0.000s (-nan%) Memory map (rerun may be quicker)
> 0.000s (-nan%) sep and header detection
> 0.000s (-nan%) Count rows (wc -l)
> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
> 0.000s (-nan%) Allocation of 1023x2 result (xMB) in RAM
> 0.000s (-nan%) Reading data
> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
> 0.000s (-nan%) Coercing data already read in type bumps (if any)
> 0.000s (-nan%) Changing na.strings to NA
> 0.000s Total
> 4096 1023
>
> Input contains a \n (or is ""), taking this to be text input (not a filename)
> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
> Found 2 columns
> First row with 2 fields occurs on line 1 (either column names or first row of data)
> All the fields on line 1 are character fields. Treating as the column names.
> Count of eol after first data row: 1023
> Subtracted 0 for last eol and any trailing empty lines, leaving 1023 data rows
> Type codes: 33 (first 5 rows)
> Type codes: 33 (+middle 5 rows)
> Type codes: 33 (+last 5 rows)
> 0.000s (-nan%) Memory map (rerun may be quicker)
> 0.000s (-nan%) sep and header detection
> 0.000s (-nan%) Count rows (wc -l)
> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
> 0.000s (-nan%) Allocation of 1023x2 result (xMB) in RAM
> 0.000s (-nan%) Reading data
> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
> 0.000s (-nan%) Coercing data already read in type bumps (if any)
> 0.000s (-nan%) Changing na.strings to NA
> 0.000s Total
> 4100 1023
>
> Input contains a \n (or is ""), taking this to be text input (not a filename)
> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
> Found 2 columns
> First row with 2 fields occurs on line 1 (either column names or first row of data)
> All the fields on line 1 are character fields. Treating as the column names.
> Count of eol after first data row: 1023
> Subtracted 0 for last eol and any trailing empty lines, leaving 1023 data rows
> Type codes: 33 (first 5 rows)
> Type codes: 33 (+middle 5 rows)
> Type codes: 33 (+last 5 rows)
> 0.000s (-nan%) Memory map (rerun may be quicker)
> 0.000s (-nan%) sep and header detection
> 0.000s (-nan%) Count rows (wc -l)
> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
> 0.000s (-nan%) Allocation of 1023x2 result (xMB) in RAM
> 0.000s (-nan%) Reading data
> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
> 0.000s (-nan%) Coercing data already read in type bumps (if any)
> 0.000s (-nan%) Changing na.strings to NA
> 0.000s Total
> 40000 1023
>
> On Thu, Mar 28, 2013 at 2:55 PM, Matthew Dowle <[email protected]> wrote:
>
>> Hm this is odd.
>>
>> Could you run the following and paste back the (verbose) results please.
>>
>> for (n in c(1023:1025, 10000)) {
>>   input = paste( rep('a\tb\n', n), collapse='')
>>   A = fread(input,verbose=TRUE)
>>   cat(nchar(input), nrow(A), "\n")
>> }
>>
>> On 28.03.2013 14:38, Timothée Carayol wrote:
>>
>> Curiouser and curiouser..
>>
>> I can reproduce on two computers with different versions of R and of
>> data.table.
>>
>> Computer 1 (it says unknown-linux but is actually ubuntu):
>>
>> R version 2.15.3 (2013-03-01)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] bit64_0.9-2 bit_1.1-10 data.table_1.8.9 colorout_1.0-0
>>
>> Computer 2:
>>
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] data.table_1.8.8
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.15.2
>>
>> On Thu, Mar 28, 2013 at 2:31 PM, Matthew Dowle <[email protected]> wrote:
>>
>>> Interesting, what's your sessionInfo() please?
>>>
>>> For me it seems to work ok :
>>>
>>> [1] 1022
>>> [1] 1023
>>> [1] 1024
>>> [1] 9999
>>>
>>> > sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>> On 27.03.2013 22:49, Timothée Carayol wrote:
>>>
>>> Agree with Muhammad, longer character strings are definitely permitted
>>> in R.
>>> A minimal example that shows something strange happening with fread:
>>>
>>> for (n in c(1023:1025, 10000)) {
>>>   A <- fread(
>>>     paste(
>>>       rep('a\tb\n', n),
>>>       collapse=''
>>>     ),
>>>     sep='\t'
>>>   )
>>>   print(nrow(A))
>>> }
>>>
>>> On my computer, I obtain:
>>>
>>> [1] 1022
>>> [1] 1023
>>> [1] 1023
>>> [1] 1023
>>>
>>> Hope this helps
>>> Timothée
>>>
>>> On Wed, Mar 27, 2013 at 9:23 PM, Matthew Dowle <[email protected]> wrote:
>>>
>>>> Hi,
>>>> Nice to hear from you. Nope, not known to me. Obviously 4096 is 4k; is
>>>> that the R limit for a character string length? What happens at 4097?
>>>> Matthew
>>>>
>>>> > Hi,
>>>> >
>>>> > I have an example of a string of 4097 characters which can't be
>>>> > parsed by fread; however, if I remove any character, it can be
>>>> > parsed just fine. Is that a known limitation?
>>>> >
>>>> > (If I write the string to a file and then fread the file name, it
>>>> > works too.)
>>>> >
>>>> > Let me know if you need the string and/or a bug report.
>>>> >
>>>> > Thanks
>>>> > Timothée
>>>> > _______________________________________________
>>>> > datatable-help mailing list
>>>> > [email protected]
>>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
