The comments appear to be a banner at the start of the file, and fread already handles that case. The banner in the example is 34 rows, though, so the default of autostart=30 isn't enough. Try:

    fread("03217500.exsa.rsb", autostart=40)

That should do it in one shot, including detecting the column names. I've just increased autostart a little so that it falls within the data block. See ?fread for a detailed description of autostart and the procedure.

Btw, if there is more than one table in a single file, setting autostart to a line within each table is how to read each one in. And provided there is no footer, you can also set autostart to be very large (the downside being the time taken to seek back from the end to find the column names).
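
If you'd rather not guess how long the banner is, one option is to count the leading '#' lines first. This is only a rough sketch: the sample file below is made up to mimic the layout, and it assumes a data.table version whose fread has a skip argument (skip is not mentioned elsewhere in this thread).

```r
library(data.table)

# Hypothetical sample mimicking the layout discussed: '#' banner, header, data.
sample_file <- tempfile(fileext = ".rdb")
writeLines(c(
  "# banner line 1",
  "# banner line 2",
  "# banner line 3",
  "agency_cd\tsite_no\tdatetime",
  "USGS\t02169570\t2012-08-04",
  "USGS\t02169570\t2012-08-05"
), sample_file)

# Count only the unbroken run of '#' lines at the top of the file.
first_lines <- readLines(sample_file, n = 100)
banner_rows <- sum(cumprod(grepl("^#", first_lines)))

# Skip exactly the banner, so the next line is read as the column names.
dv <- fread(sample_file, skip = banner_rows)
```

The same count tells you what a safe autostart would be: anything a few rows larger than banner_rows lands inside the data block.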

Matthew

On 05/08/13 20:52, jim holtman wrote:
Here is what I would do. Read in the file, delete the comments, write it back out and then process it.


> myFile <- tempfile()  # temp file
> input <- readLines('/temp/dv.txt') # this is a copy of the data you posted
> # remove comments
> input <- input[!grepl("^#", input)]
> require(data.table)
Loading required package: data.table
data.table 1.8.8  For help type: help("data.table")
> writeLines(input, myFile)
> dv <- fread(myFile)

>
> str(dv)
Classes 'data.table' and 'data.frame':  367 obs. of  21 variables:
 $ agency_cd        : chr  "5s" "USGS" "USGS" "USGS" ...
 $ site_no          : chr  "15s" "02169570" "02169570" "02169570" ...
 $ datetime         : chr  "20d" "2012-08-04" "2012-08-05" "2012-08-06" ...
 $ 04_00095_00001   : chr  "14n" "" "" "" ...
 $ 04_00095_00001_cd: chr  "10s" "" "" "" ...
 $ 04_00095_00002   : chr  "14n" "" "" "" ...
 $ 04_00095_00002_cd: chr  "10s" "" "" "" ...
 $ 04_00095_00003   : chr  "14n" "" "" "" ...
 $ 04_00095_00003_cd: chr  "10s" "" "" "" ...
 $ 05_00065_00001   : chr  "14n" "2.10" "1.71" "1.77" ...
 $ 05_00065_00001_cd: chr  "10s" "A" "A" "A" ...
 $ 05_00065_00002   : chr  "14n" "1.71" "1.56" "1.57" ...
 $ 05_00065_00002_cd: chr  "10s" "A" "A" "A" ...
 $ 05_00065_00003   : chr  "14n" "1.89" "1.62" "1.63" ...
 $ 05_00065_00003_cd: chr  "10s" "A" "A" "A" ...
 $ 15_00060_00001   : chr  "14n" "52" "33" "36" ...
 $ 15_00060_00001_cd: chr  "10s" "A" "A" "A" ...
 $ 15_00060_00002   : chr  "14n" "33" "27" "27" ...
 $ 15_00060_00002_cd: chr  "10s" "A" "A" "A" ...
 $ 15_00060_00003   : chr  "14n" "42" "29" "30" ...
 $ 15_00060_00003_cd: chr  "10s" "A" "A" "A" ...
 - attr(*, ".internal.selfref")=<externalptr>
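
Note that every column above came back as chr, because the row holding the field widths ("5s", "15s", "14n", ...) was read as the first data row. A possible variation on the same approach drops that row as well before calling fread. This is a sketch on a made-up four-column sample, not the real file, and it assumes the width row always comes directly after the column names.

```r
library(data.table)

# Hypothetical miniature of the posted data: comment, header, width row, data.
raw <- c(
  "# comment line",
  "agency_cd\tsite_no\tdatetime\t05_00065_00001",
  "5s\t15s\t20d\t14n",
  "USGS\t02169570\t2012-08-04\t2.10",
  "USGS\t02169570\t2012-08-05\t1.71"
)

# Remove the comments, as before.
input <- raw[!grepl("^#", raw)]

# After that the header is line 1 and the width row is line 2; drop line 2.
input <- input[-2]

# Write back out and read with fread so the columns can be typed properly.
myFile <- tempfile()
writeLines(input, myFile)
dv <- fread(myFile)
```

With the width row gone, the measurement column should come back as numeric rather than chr.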



On Mon, Aug 5, 2013 at 3:38 PM, iembry wrote:

    Hi Matthew, this link is in a similar format to the files that I'm
    processing now:

http://waterdata.usgs.gov/nwis/dv?cb_00095=on&cb_00065=on&cb_00060=on&format=rdb&period=&begin_date=2012-08-04&end_date=2013-08-04&site_no=02169570&referred_module=sw

    Both file formats begin with the comments followed by the column names
    followed by agency code information and then the actual data.

    The .rdb text files vary in length (some may range from a few
    hundred lines long to over 20,000 lines). I am given the files that
    I am processing.

    Thank you.

    Irucka







    _______________________________________________
    datatable-help mailing list
    [email protected]
    <mailto:[email protected]>
    https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

