Hi – I thought I'd follow up on this.
Matthew, are you still unable to reproduce it? It is still happening to me after upgrading to R 3.0.0. Garrett's case above seems even more severe, with what looks like truncation at 256 characters, so it's not just me, and it does seem to depend on some aspect of the system configuration.

On Thu, Mar 28, 2013 at 3:26 PM, Timothée Carayol wrote:
> Of course, I'll be happy to help!
>
> By the way, the verbose output was actually from computer 1 (with 1.8.9), so
> it seems like the -nan% problem may still be there?
>
> Cheers
> Timothée
>
>
> On Thu, Mar 28, 2013 at 3:19 PM, Matthew Dowle wrote:
>
>> Hi,
>>
>> Thanks. That was from v1.8.8 on computer 2, I hope. Computer 1 with
>> v1.8.9 should have the -nan% problem fixed.
>>
>> I'm a bit stumped for the moment. I've filed a bug report. If I still
>> can't reproduce it at my end, I'll probably add some more detailed tracing
>> to the verbose output and ask you to try again next week, if that's ok.
>>
>> Thanks for reporting!
>>
>> Matthew
>>
>>
>> On 28.03.2013 14:58, Timothée Carayol wrote:
>>
>> Input contains a \n (or is ""), taking this to be text input (not a filename)
>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
>> Found 2 columns
>> First row with 2 fields occurs on line 1 (either column names or first row of data)
>> All the fields on line 1 are character fields. Treating as the column names.
>> Count of eol after first data row: 1023
>> Subtracted 1 for last eol and any trailing empty lines, leaving 1022 data rows
>> Type codes: 33 (first 5 rows)
>> Type codes: 33 (+middle 5 rows)
>> Type codes: 33 (+last 5 rows)
>> 0.000s (-nan%) Memory map (rerun may be quicker)
>> 0.000s (-nan%) sep and header detection
>> 0.000s (-nan%) Count rows (wc -l)
>> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
>> 0.000s (-nan%) Allocation of 1022x2 result (xMB) in RAM
>> 0.000s (-nan%) Reading data
>> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
>> 0.000s (-nan%) Coercing data already read in type bumps (if any)
>> 0.000s (-nan%) Changing na.strings to NA
>> 0.000s Total
>> 4092 1022
>>
>> Input contains a \n (or is ""), taking this to be text input (not a filename)
>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
>> Found 2 columns
>> First row with 2 fields occurs on line 1 (either column names or first row of data)
>> All the fields on line 1 are character fields. Treating as the column names.
>> Count of eol after first data row: 1023
>> Subtracted 0 for last eol and any trailing empty lines, leaving 1023 data rows
>> Type codes: 33 (first 5 rows)
>> Type codes: 33 (+middle 5 rows)
>> Type codes: 33 (+last 5 rows)
>> 0.000s (-nan%) Memory map (rerun may be quicker)
>> 0.000s (-nan%) sep and header detection
>> 0.000s (-nan%) Count rows (wc -l)
>> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
>> 0.000s (-nan%) Allocation of 1023x2 result (xMB) in RAM
>> 0.000s (-nan%) Reading data
>> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
>> 0.000s (-nan%) Coercing data already read in type bumps (if any)
>> 0.000s (-nan%) Changing na.strings to NA
>> 0.000s Total
>> 4096 1023
>>
>> Input contains a \n (or is ""), taking this to be text input (not a filename)
>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
>> Found 2 columns
>> First row with 2 fields occurs on line 1 (either column names or first row of data)
>> All the fields on line 1 are character fields. Treating as the column names.
>> Count of eol after first data row: 1023
>> Subtracted 0 for last eol and any trailing empty lines, leaving 1023 data rows
>> Type codes: 33 (first 5 rows)
>> Type codes: 33 (+middle 5 rows)
>> Type codes: 33 (+last 5 rows)
>> 0.000s (-nan%) Memory map (rerun may be quicker)
>> 0.000s (-nan%) sep and header detection
>> 0.000s (-nan%) Count rows (wc -l)
>> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
>> 0.000s (-nan%) Allocation of 1023x2 result (xMB) in RAM
>> 0.000s (-nan%) Reading data
>> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
>> 0.000s (-nan%) Coercing data already read in type bumps (if any)
>> 0.000s (-nan%) Changing na.strings to NA
>> 0.000s Total
>> 4100 1023
>>
>> Input contains a \n (or is ""), taking this to be text input (not a filename)
>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>> Using line 30 to detect sep (the last non blank line in the first 30) ... '\t'
>> Found 2 columns
>> First row with 2 fields occurs on line 1 (either column names or first row of data)
>> All the fields on line 1 are character fields. Treating as the column names.
>> Count of eol after first data row: 1023
>> Subtracted 0 for last eol and any trailing empty lines, leaving 1023 data rows
>> Type codes: 33 (first 5 rows)
>> Type codes: 33 (+middle 5 rows)
>> Type codes: 33 (+last 5 rows)
>> 0.000s (-nan%) Memory map (rerun may be quicker)
>> 0.000s (-nan%) sep and header detection
>> 0.000s (-nan%) Count rows (wc -l)
>> 0.000s (-nan%) Column type detection (first, middle and last 5 rows)
>> 0.000s (-nan%) Allocation of 1023x2 result (xMB) in RAM
>> 0.000s (-nan%) Reading data
>> 0.000s (-nan%) Allocation for type bumps (if any), including gc time if triggered
>> 0.000s (-nan%) Coercing data already read in type bumps (if any)
>> 0.000s (-nan%) Changing na.strings to NA
>> 0.000s Total
>> 40000 1023
>>
>>
>> On Thu, Mar 28, 2013 at 2:55 PM, Matthew Dowle wrote:
>>
>>> Hm, this is odd.
>>>
>>> Could you run the following and paste back the (verbose) results, please?
>>>
>>> library(data.table)
>>> for (n in c(1023:1025, 10000)) {
>>>     input = paste(rep('a\tb\n', n), collapse='')
>>>     A = fread(input, verbose=TRUE)
>>>     cat(nchar(input), nrow(A), "\n")
>>> }
>>>
>>>
>>> On 28.03.2013 14:38, Timothée Carayol wrote:
>>>
>>> Curiouser and curiouser...
>>>
>>> I can reproduce on two computers with different versions of R and of
>>> data.table.
>>>
>>> Computer 1 (it says unknown-linux but is actually Ubuntu):
>>>
>>> R version 2.15.3 (2013-03-01)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] bit64_0.9-2 bit_1.1-10 data.table_1.8.9 colorout_1.0-0
>>>
>>> Computer 2:
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>>>  [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>>>  [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
>>>  [7] LC_PAPER=C LC_NAME=C
>>>  [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] data.table_1.8.8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.15.2
>>>
>>>
>>> On Thu, Mar 28, 2013 at 2:31 PM, Matthew Dowle wrote:
>>>
>>>> Interesting, what's your sessionInfo() please?
>>>>
>>>> For me it seems to work ok:
>>>>
>>>> [1] 1022
>>>> [1] 1023
>>>> [1] 1024
>>>> [1] 9999
>>>>
>>>> > sessionInfo()
>>>> R version 2.15.2 (2012-10-26)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>>
>>>> On 27.03.2013 22:49, Timothée Carayol wrote:
>>>>
>>>> Agree with Muhammad, longer character strings are definitely
>>>> permitted in R.
>>>> A minimal example that shows something strange happening with fread:
>>>>
>>>> library(data.table)
>>>> for (n in c(1023:1025, 10000)) {
>>>>     A <- fread(
>>>>         paste(
>>>>             rep('a\tb\n', n),
>>>>             collapse=''
>>>>         ),
>>>>         sep='\t'
>>>>     )
>>>>     print(nrow(A))
>>>> }
>>>>
>>>> On my computer, I obtain:
>>>> [1] 1022
>>>> [1] 1023
>>>> [1] 1023
>>>> [1] 1023
>>>>
>>>> Hope this helps
>>>> Timothée
>>>>
>>>>
>>>> On Wed, Mar 27, 2013 at 9:23 PM, Matthew Dowle wrote:
>>>>
>>>>> Hi,
>>>>> Nice to hear from you. Nope, not known to me. Obviously 4096 is 4k; is
>>>>> that the R limit for a character string length? What happens at 4097?
>>>>> Matthew
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I have an example of a string of 4097 characters which can't be parsed by
>>>>> > fread; however, if I remove any character, it can be parsed just fine. Is
>>>>> > that a known limitation?
>>>>> >
>>>>> > (If I write the string to a file and then fread the file name, it works
>>>>> > too.)
>>>>> >
>>>>> > Let me know if you need the string and/or a bug report.
>>>>> >
>>>>> > Thanks
>>>>> > Timothée
>>>>> > _______________________________________________
>>>>> > datatable-help mailing list
>>>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
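The row counts in this thread follow from simple arithmetic: each 'a\tb\n' line is 4 characters, so n lines give nchar(input) = 4n, and because the first line is read as column names, n - 1 data rows are expected. The results are correct up to 4096 characters (1024 lines) and stay at 1023 rows beyond that, which is consistent with the string input being cut off at 4096 characters. The original report says that writing the string to a file and passing the file name to fread works. A minimal sketch of that workaround follows, assuming data.table is installed; the use of tempfile() and the explicit header = TRUE are assumptions added here, not code from the thread.

```r
library(data.table)

# Workaround sketch: write the string to a temporary file and pass the file
# name to fread() instead of the string itself. tempfile() and header = TRUE
# are assumptions added for this example.
for (n in c(1023:1025, 10000)) {
  input <- paste(rep('a\tb\n', n), collapse = '')  # n lines, 4 characters each
  tmp <- tempfile(fileext = ".tsv")
  cat(input, file = tmp)                  # write the string exactly as is
  A <- fread(tmp, sep = '\t', header = TRUE)
  unlink(tmp)                             # remove the temporary file
  # the first line becomes the column names, so n - 1 data rows are expected
  cat(nchar(input), nrow(A), n - 1, "\n")
}
```

If the file route is unaffected by the problem, the second and third numbers on each printed line match.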
