Re: [datatable-help] fread: skip

Matthew Dowle Sun, 12 May 2013 03:29:41 -0700

On 12.05.2013 00:47, Gabor Grothendieck wrote:

Not with the csv I tried. The header is messed up (most of the header
fields are missing) and it misconstrues it as data.


That was fixed a while ago in v1.8.9, from NEWS:

" [fread] If some column names are blank they are now given default names
   rather than causing the header row to be read as a data row "

The automation is great but some way to force its behavior when you
know what it should do seems essential since heuristics can't be
expected to work in all cases.

I suspect the heuristics in v1.8.9 work on all your examples so far, but
ok, point taken.

fread allows control of 'autostart' already. This is a line number
(default 30) within the regular data block used to detect the separator
and search upwards from to find the first data row and/or column names.
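
For readers who want to see the upward search spelled out, here is a minimal Python sketch of the idea. It is not fread's actual code: the function name find_first_row is hypothetical, the separator is passed in rather than detected, and clamping autostart to the file length for short files is an assumption.

```python
def find_first_row(lines, autostart=30, sep=","):
    """Return the 0-based index of the first row of the regular block.

    Sketch only: assumes the separator is already known and that
    autostart (1-based) is clamped to the last line of a short file.
    """
    start = min(autostart, len(lines)) - 1
    # Number of columns on the autostart line.
    ncol = lines[start].count(sep) + 1
    row = start
    # Walk upwards while the previous line has the same number of columns.
    while row > 0 and lines[row - 1].count(sep) + 1 == ncol:
        row -= 1
    return row


lines = [
    "Report generated 2013-05-11",
    "Source: example",
    "a,b,c",
    "1,2,3",
    "4,5,6",
]
first = find_first_row(lines)
print(lines[first])
```

The two banner lines have a different column count from the data block, so the search stops just below them.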

Will add 'skip' then. It'll be like setting autostart=skip+1 but
turning off the search upwards part. Line skip+1 will be used to detect
the separator when sep="auto" and used as column names according to
header="auto"|TRUE|FALSE as usual. It'll be an error to specify both
autostart and skip in the same call. If that sounds ok?
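
The proposed argument handling could be sketched roughly as follows, in Python. The function resolve_start and its return shape are hypothetical illustrations of the rule described above, not fread internals, and header handling is left out.

```python
def resolve_start(autostart=None, skip=None):
    """Return (1-based start line, whether to search upwards).

    Sketch of the proposal: skip behaves like autostart = skip + 1
    with the upward search switched off; giving both is an error.
    """
    if autostart is not None and skip is not None:
        raise ValueError("specify either autostart or skip, not both")
    if skip is not None:
        return skip + 1, False
    # Default autostart of 30 is taken from the description above.
    return (30 if autostart is None else autostart), True


print(resolve_start(skip=2))
print(resolve_start())
```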


Matthew

On Sat, May 11, 2013 at 6:35 PM, Matthew Dowle
<[email protected]> wrote:
Hi,
Does the auto skip feature of fread cover both of those? From ?fread:
" Once the separator is found on line autostart, the number of columns
is determined. Then the file is searched backwards from autostart until
a row is found that doesn't have that number of columns, or the start of
file is reached. Thus, the first data row is found and any human readable
banners are automatically skipped. This feature can be particularly
useful for loading a set of files which may not all have consistently
sized banners. "
There were also some issues with header=FALSE in the first release (1.8.8)
which have since been fixed in 1.8.9.

Matthew



On 11.05.2013 23:16, Gabor Grothendieck wrote:
I would find it useful if fread had a skip= argument as in read.table
since I have files from time to time that have garbage at the top.
Another situation I find from time to time is that the header is
messed up but one can still read the file if one can skip over the
header and specify header = FALSE.
An extra feature that would be nice but less important would be if one
could specify skip = "string" and have it skip all lines until it
found one with "string" in it and then start reading from the matched
row onward. Normally the string would be chosen to be a string found
in the header and not likely found prior to the header. read.xls in
gdata has a similar feature and I find it quite handy at times.
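
A rough Python sketch of the skip = "string" behaviour requested above, for illustration only. The helper name skip_to_string is hypothetical, and raising an error when the string is never found is an assumption.

```python
def skip_to_string(lines, needle):
    """Drop every line before the first one containing needle.

    Sketch only: the matched line is kept, so it can serve as the header.
    """
    for i, line in enumerate(lines):
        if needle in line:
            return lines[i:]
    raise ValueError("no line contains %r" % needle)


lines = [
    "garbage at the top",
    "more garbage",
    "id,price,qty",
    "1,9.5,3",
]
kept = skip_to_string(lines, "price")
print(kept)
```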

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
_______________________________________________
datatable-help mailing list
[email protected]



https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help