[datatable-help] fread -- multiple header lines and multiple whitespace characters

Harish Sun, 30 Jun 2013 01:25:57 -0700

Hi,

I am wondering whether it is possible to read a file using fread() with:
1) Multiple header lines, and
2) Multiple whitespace characters separating fields

The sample of the input file is as follows:
-------------
Garbage header information
that I need to skip when reading...
Number of lines here are variable.


             Serial_Number   PHIv     Lu/W     
                    (-)      (lm)     (lm/W)
           ABCDEFG  27.0264 103.58
           HIJKLMNO  33.9143  91.03

Some footer information
that spans multiple lines

-------------

To handle the multiple header lines, I would have to read the file with 
fread() first, then reprocess it with a similar algorithm to find the actual 
header, i.e. the line one above what fread() identifies as the header, and 
finally discard the column names fread() created and replace them with the 
actual ones.  This seems highly inefficient, since I would be replicating in R 
what fread() already does, not to mention that I do not quite know how to do 
that.
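
For illustration, the workaround described above might look roughly like the 
following base R sketch.  It does not use fread() at all.  It assumes that the 
real header line can be recognised by a known column name (here 
"Serial_Number"), that exactly one units line follows it, and that the table 
ends at the first blank line.  The helper name read_block() and its arguments 
are hypothetical, made up for this example.

```r
# Hypothetical helper (not part of data.table): base R only.
# Assumptions: the header contains a known column name, one units line
# follows it, and the data block ends at the first blank line.
read_block <- function(lines, header_pattern = "Serial_Number") {
  hdr <- grep(header_pattern, lines)[1]            # index of the real header
  stopifnot(!is.na(hdr))
  # Split the header on any run of whitespace to get the column names
  col_names <- scan(text = lines[hdr], what = "", quiet = TRUE)
  first <- hdr + 2                                 # skip the units line
  blank <- which(grepl("^[[:space:]]*$", lines))   # blank or whitespace-only
  blank <- blank[blank >= first]
  last  <- if (length(blank)) blank[1] - 1 else length(lines)
  read.table(text = lines[first:last], sep = "",
             col.names = col_names, check.names = FALSE,
             stringsAsFactors = FALSE)
}

# The sample from this message, as a character vector
sample_lines <- c(
  "Garbage header information",
  "that I need to skip when reading...",
  "Number of lines here are variable.",
  "",
  "",
  "             Serial_Number   PHIv     Lu/W     ",
  "                    (-)      (lm)     (lm/W)",
  "           ABCDEFG  27.0264 103.58",
  "           HIJKLMNO  33.9143  91.03",
  "",
  "Some footer information",
  "that spans multiple lines",
  ""
)

dat <- read_block(sample_lines)
print(dat)
```

For a real file, the lines would come from readLines() on the file path.  
check.names = FALSE keeps the column name "Lu/W" as written instead of 
converting it to a syntactic R name.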


As for the multiple (and variable) spaces used as the separator, I do not 
see a way for fread() to handle this either.  read.table() does, with its 
default sep="" value, but it does not handle the garbage headers and footers 
that fread() so beautifully skips with its autostart algorithm.
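
As a small check of the read.table() behaviour mentioned above, the two data 
rows from the sample can be parsed on their own.  With sep="" any run of 
whitespace counts as one separator and leading whitespace is ignored.  The 
variable names rows and tab are only for this example.

```r
# The two data rows from the sample, with their irregular spacing
rows <- c("           ABCDEFG  27.0264 103.58",
          "           HIJKLMNO  33.9143  91.03")

# sep = "" (the default) splits on one or more whitespace characters
tab <- read.table(text = rows, sep = "", stringsAsFactors = FALSE)
str(tab)
```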

Any suggestions as to how I could do this easily?  I have lots of these files 
to read, so editing them manually is not practical.  If there is a hack I can 
do with fread(), that would be ideal.


Thanks a lot for your help.


Regards,
Harish

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
