For me, in a similar context, this would be particularly useful with SQL Server output, where if you need headers it's not possible to lose the second line of underlining:

header1 header2 header3
------- ------- -------
tom     dick    harry

and possibly for other flavours of SQL too. For the huge files (20GB) that I use fread for, I use a perl script; for smaller ones:

    # read the data rows, skipping the header line and the underline
    df <- read.csv(con, header=FALSE, skip=2, na.strings="NULL")
    # then take the column names from the first line of the file
    names(df) <- strsplit(readLines(con, 1), ",")[[1]]

Such a pain. So as this is an SQL Server 'feature' it would be really useful if fread could discard unwanted lines of header. Perhaps a regexp parameter?

Regards
Paul

On 3 July 2013 11:00, <[email protected]> wrote:

> Today's Topics:
>
>    1. Re: fread -- multiple header lines and multiple whitespace
>       characters (Eduard Antonyan)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 2 Jul 2013 10:29:57 -0500
> From: Eduard Antonyan <[email protected]>
> To: Harish <[email protected]>
> Cc: "[email protected]"
>         <[email protected]>
> Subject: Re: [datatable-help] fread -- multiple header lines and
>         multiple whitespace characters
> Message-ID:
>         <CAHZcBOpkh+05wNLYD17YQxXx+JbOL3SmkwoP+Y=
> [email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I don't know how to do this with fread, but it sounds like a good feature
> request.
>
> If you want to do this in R (without fread), you could use readLines to
> read until you get to the header, count the number of lines it took and use
> 'skip' param in read.table to read the file in.
> I think I remember seeing smth like that done on SO at some point, but
> you can always post there to get more advice as there is generally more
> people who'll be able to help you there.
>
>
> On Sun, Jun 30, 2013 at 3:21 AM, Harish <[email protected]> wrote:
>
> > Hi,
> >
> > I am wondering whether it is possible to read a file using fread() with:
> > 1) Multiple header lines, and
> > 2) Multiple whitespace characters separating fields
> >
> > The sample of the input file is as follows:
> > -------------
> > Garbage header information
> > that I need to skip when reading...
> > Number of lines here are variable.
> >
> > Serial_Number PHIv Lu/W
> > (-) (lm) (lm/W)
> > ABCDEFG 27.0264 103.58
> > HIJKLMNO 33.9143 91.03
> >
> > Some footer information
> > that spans multiple lines
> > -------------
> >
> > To handle the multiple lines of headers, I would have to read the file
> > using fread() first, reprocess the file using a similar algorithm to
> > identify the actual header -- i.e. one line above what fread() would
> > identify as the header, then throw away the names of the columns fread()
> > created and rename it to the actual ones I find. However, this seems to be
> > highly inefficient since I would replicate what fread() did within R -- not
> > to mention I do not quite know how to do that.
> >
> > As far as handling the multiple (and variable) spaces for separator, I do
> > not see fread() being able to handle this either. read.table() however
> > does with the default sep="" value. Of course, that does not handle the
> > garbage headers and footers that fread() so beautifully avoids with its
> > autostart algorithm.
> >
> > Any suggestions as to how I would do this easily? I have lots of these
> > files to read, and doing manual editing is not desirable. If there is a
> > hack I can do with fread(), that would be ideal.
> >
> > Thanks a lot for your help.
> >
> >
> > Regards,
> > Harish
> >
> >
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
> ------------------------------
>
> End of datatable-help Digest, Vol 41, Issue 3
> *********************************************
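
For what it's worth, the readLines approach Eduard describes above could be sketched in base R roughly as follows. This is only a sketch under assumptions: read_after_header and its arguments are made-up names (nothing like it exists in data.table or base R), the real header line is assumed to be recognisable by a regexp, a fixed number of unwanted lines (a units row, or the SQL Server underline) is assumed to follow it, and the first blank line after the data is assumed to start the footer. The sample file is the one from Harish's message.

```r
# Hypothetical helper, for illustration only.
# header_pattern: regexp identifying the real header line.
# n_extra: number of lines directly after the header to discard
#          (e.g. a units row or a dashed underline).
read_after_header <- function(path, header_pattern, n_extra = 1, sep = "", ...) {
  lines <- readLines(path)
  hits <- grep(header_pattern, lines)
  if (length(hits) == 0) stop("no line matches header_pattern")
  header_at <- hits[1]

  # Column names come from the header line itself.
  col_names <- scan(text = lines[header_at], what = "", sep = sep, quiet = TRUE)

  # Drop everything up to and including the discarded lines, then keep
  # rows up to the first blank line (assumed to mark the footer).
  body <- lines[-seq_len(header_at + n_extra)]
  blank <- which(!nzchar(trimws(body)))
  if (length(blank) > 0) body <- body[seq_len(blank[1] - 1)]

  read.table(text = body, sep = sep, col.names = col_names,
             stringsAsFactors = FALSE, check.names = FALSE, ...)
}

# Try it on the sample from Harish's message.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Garbage header information",
  "that I need to skip when reading...",
  "Number of lines here are variable.",
  "",
  "Serial_Number PHIv Lu/W",
  "(-) (lm) (lm/W)",
  "ABCDEFG 27.0264 103.58",
  "HIJKLMNO 33.9143 91.03",
  "",
  "Some footer information",
  "that spans multiple lines"
), tmp)

d <- read_after_header(tmp, "^Serial_Number", n_extra = 1)
print(d)
```

For comma-separated SQL Server output the same sketch would presumably be called with sep = "," and a pattern matching the first column name, plus na.strings = "NULL" passed through to read.table. It reads the whole file with readLines, so it is only an option for the smaller files, not the 20GB ones.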
