Perhaps instead of wrapping the parse in try(), it would be more specific to check, using utils::count.fields, whether the first non-comment line has the same number of fields as the last line. A truncated last line is what seems to kill us (though this check still leaves the scary possibility of the connection terminating right after a \n, in which case the last line would look complete).
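A rough sketch of the field-count check, to make the idea concrete. The helper name looks_truncated is hypothetical and not part of rtracklayer; it assumes the downloaded text is already split into tab-separated data lines (track lines removed) and that '#' marks comments. It cannot catch a response that is cut exactly at a line boundary.

```r
# Hypothetical helper (not an rtracklayer function): returns TRUE when the
# last data line has a different number of fields than the first data line.
looks_truncated <- function(lines, sep = "\t", comment.char = "#") {
  con <- textConnection(lines)
  on.exit(close(con))
  # One field count per line; quoting is disabled so stray quotes are ignored.
  n <- utils::count.fields(con, sep = sep, quote = "",
                           comment.char = comment.char)
  # Defensively drop NA or zero counts so only real data lines are compared.
  n <- n[!is.na(n) & n > 0]
  length(n) > 0 && n[1] != n[length(n)]
}

# Example with made-up bedGraph-like lines; the last one is cut short.
looks_truncated(c("chr1\t10013\t10021\t0.5", "chr1\t100"))
```

Something like this could run before the text is handed to import(), so that the user gets a message about possible UCSC truncation instead of the scan() error.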
Kasper

On Wed, Oct 9, 2013 at 1:27 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:

> I recently added an attempt to detect incompleteness but obviously it is
> not very robust. So I'll give up and add the try().
>
>
> On Wed, Oct 9, 2013 at 10:16 AM, Kasper Daniel Hansen <
> kasperdanielhan...@gmail.com> wrote:
>
>> (I recently had the same problem downloading dbSnp)
>>
>> It would be an improvement if the parsing of the download data was inside
>> a try() statement, with a good error message about UCSC possibly truncating
>> the record. Also, perhaps mention truncation in the vignette (or make it
>> more visible, if it is there).
>>
>> I certainly expected to be able to download (big) tables from UCSC,
>> perhaps I was naive, but that was my expectation.
>>
>> Best,
>> Kasper
>>
>>
>> On Wed, Oct 9, 2013 at 1:09 PM, Michael Lawrence <
>> lawrence.mich...@gene.com> wrote:
>>
>>> It's not feasible to download an entire genome's worth of mappability data
>>> using rtracklayer and the underlying table browser interface. UCSC has
>>> limits in place that truncate the response. rtracklayer has little way of
>>> knowing whether the user is requesting too many records. Just download the
>>> mappability as a bigwig file via FTP and query that with rtracklayer,
>>> instead.
>>>
>>>
>>> On Wed, Oct 9, 2013 at 9:45 AM, laurent jacob <laurent.ja...@gmail.com>
>>> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > I'm trying to use the ucscTableQuery function from the rtracklayer package
>>> > to download a mapability table from the ucsc genome browser.
>>> >
>>> > Everything works fine if I restrict the query to a small range, but I get
>>> > an error message when querying the entire genome (at the moment where I
>>> > convert the UCSCTableQuery using track()):
>>> >
>>> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>> >   scan() expected 'an integer', got 'section'
>>> >
>>> > Here is a short example:
>>> >
>>> > ---------
>>> > library(rtracklayer)
>>> > mySession = browserSession('UCSC')
>>> > genome(mySession) <- 'hg19'
>>> > range <- GRanges('chr1', IRanges(start=10013, end=10021))
>>> > query.range <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>> >                               range=range,
>>> >                               table='wgEncodeCrgMapabilityAlign100mer')
>>> >
>>> > query.full <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>> >                              range='hg19',
>>> >                              table='wgEncodeCrgMapabilityAlign100mer')
>>> >
>>> > ## This works
>>> > track(query.range)
>>> > ## This fails
>>> > track(query.full)
>>> > -----------
>>> >
>>> > Do you have any idea of what may cause this error?
>>> >
>>> > My sessionInfo() and traceback() of the error are given below.
>>> >
>>> > Best,
>>> >
>>> > Laurent
>>> >
>>> > --------------------------------
>>> > > sessionInfo()
>>> > R version 3.0.2 (2013-09-25)
>>> > Platform: x86_64-pc-linux-gnu (64-bit)
>>> >
>>> > locale:
>>> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> >
>>> > attached base packages:
>>> > [1] parallel stats graphics grDevices utils datasets methods
>>> > [8] base
>>> >
>>> > other attached packages:
>>> > [1] rtracklayer_1.21.12 GenomicRanges_1.13.51 XVector_0.1.4
>>> > [4] IRanges_1.19.38 BiocGenerics_0.7.5
>>> >
>>> > loaded via a namespace (and not attached):
>>> > [1] Biostrings_2.29.19 bitops_1.0-6 BSgenome_1.29.1 RCurl_1.95-4.1
>>> > [5] Rsamtools_1.13.48 stats4_3.0.2 tools_3.0.2 XML_3.98-1.1
>>> > [9] zlibbioc_1.7.0
>>> > ---------------------------------
>>> >
>>> > ---------------------------------
>>> > > traceback()
>>> > 34: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
>>> >         nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
>>> >         fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
>>> >         multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
>>> >         flush = flush, encoding = encoding)
>>> > 33: read.table(con, colClasses = bedClasses, as.is = TRUE, na.strings = ".",
>>> >         comment.char = "")
>>> > 32: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
>>> >         na.strings = ".", comment.char = ""))
>>> > 31: .local(con, format, text, ...)
>>> > 30: import(FileForFormat(con, format), ...)
>>> > 29: import(FileForFormat(con, format), ...)
>>> > 28: import(text = lines, format = "bedGraph", genome = genome,
>>> >         asRangedData = asRangedData,
>>> >         which = which, seqinfo = seqinfo)
>>> > 27: import(text = lines, format = "bedGraph", genome = genome,
>>> >         asRangedData = asRangedData,
>>> >         which = which, seqinfo = seqinfo)
>>> > 26: .local(con, format, text, ...)
>>> > 25: import(FileForFormat(con, format), ...)
>>> > 24: import(FileForFormat(con, format), ...)
>>> > 23: import(format = subformat, text = text, asRangedData = asRangedData,
>>> >         genome = genome, ...)
>>> > 22: import(format = subformat, text = text, asRangedData = asRangedData,
>>> >         genome = genome, ...)
>>> > 21: FUN(1L[[1L]], ...)
>>> > 20: lapply(seq_along(trackLines), makeTrackSet)
>>> > 19: lapply(seq_along(trackLines), makeTrackSet)
>>> > 18: .local(con, format, text, ...)
>>> > 17: import(FileForFormat(con, format), ...)
>>> > 16: import(FileForFormat(con, format), ...)
>>> > 15: import(con, "ucsc", ...)
>>> > 14: import(con, "ucsc", ...)
>>> > 13: import.ucsc(resource(con), subformat = subformat, ...)
>>> > 12: import.ucsc(resource(con), subformat = subformat, ...)
>>> > 11: .local(con, ...)
>>> > 10: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>>> >         genome = genome, asRangedData = asRangedData, which = which,
>>> >         seqinfo = seqinfo, ...)
>>> > 9: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>>> >         genome = genome, asRangedData = asRangedData, which = which,
>>> >         seqinfo = seqinfo, ...)
>>> > 8: .local(con, format, text, ...)
>>> > 7: import(FileForFormat(con, format), ...)
>>> > 6: import(FileForFormat(con, format), ...)
>>> > 5: import(text = output, format = format, asRangedData = asRangedData,
>>> >         seqinfo = seqinfo(range(object)))
>>> > 4: import(text = output, format = format, asRangedData = asRangedData,
>>> >         seqinfo = seqinfo(range(object)))
>>> > 3: .local(object, ...)
>>> > 2: track(query.full)
>>> > 1: track(query.full)
>>> > --------------------------------------
>>> >
>>> >
>>> > --
>>> > Laurent Jacob
>>> > Laboratoire de Biométrie et Biologie Évolutive
>>> > CNRS/Université Lyon 1
>>> > http://cbio.ensmp.fr/~ljacob
>>> >
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>