Currently the check looks for the message that UCSC inserts (or used to insert?) when it truncates a response. Comparing field counts with count.fields is another idea.
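
To make the count.fields idea concrete, here is a minimal sketch. It is not what rtracklayer does; looks_truncated() is a hypothetical helper, and it assumes the response is already available as a character vector of tab-separated lines. It compares the field count of the first and last data lines, skipping comments and track/browser header lines.

```r
# Hypothetical helper (not part of rtracklayer): flag a response whose last
# data line has a different number of fields than the first data line.
looks_truncated <- function(lines, sep = "\t") {
  # Drop blank lines, comments, and UCSC "track"/"browser" header lines.
  data <- lines[!grepl("^\\s*($|#|track\\b|browser\\b)", lines, perl = TRUE)]
  if (length(data) < 2L) return(FALSE)  # nothing to compare
  ends <- data[c(1L, length(data))]
  con <- textConnection(ends)
  on.exit(close(con))
  # quote = "" and comment.char = "" so every character counts as data.
  n <- utils::count.fields(con, sep = sep, quote = "", comment.char = "")
  n[1L] != n[2L]
}

complete  <- c("# comment", "chr1\t10013\t10021\t1", "chr1\t10021\t10030\t0.5")
truncated <- c("# comment", "chr1\t10013\t10021\t1", "chr1\t10021")
looks_truncated(complete)   # FALSE
looks_truncated(truncated)  # TRUE
```

As noted in the quoted message, this cannot catch a response that is cut off right after a newline, so it would complement rather than replace the try().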

On Wed, Oct 9, 2013 at 10:35 AM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:

> Perhaps instead of doing the try() it would be more specific to check
> whether the first non-comment line has the same number of fields as the
> last line using utils::count.fields. This is what seems to kill us (and
> leaves the scary possibility of the connection terminating right after a
> \n).
>
> Kasper
>
> On Wed, Oct 9, 2013 at 1:27 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
>> I recently added an attempt to detect incompleteness but obviously it is
>> not very robust. So I'll give up and add the try().
>>
>> On Wed, Oct 9, 2013 at 10:16 AM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
>>
>>> (I recently had the same problem downloading dbSNP)
>>>
>>> It would be an improvement if the parsing of the downloaded data was
>>> inside a try() statement, with a good error message about UCSC possibly
>>> truncating the record. Also, perhaps mention truncation in the vignette
>>> (or make it more visible, if it is there).
>>>
>>> I certainly expected to be able to download (big) tables from UCSC,
>>> perhaps I was naive, but that was my expectation.
>>>
>>> Best,
>>> Kasper
>>>
>>> On Wed, Oct 9, 2013 at 1:09 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>>>
>>>> It's not feasible to download an entire genome's worth of mappability
>>>> data using rtracklayer and the underlying table browser interface. UCSC
>>>> has limits in place that truncate the response. rtracklayer has little
>>>> way of knowing whether the user is requesting too many records. Just
>>>> download the mappability as a bigwig file via FTP and query that with
>>>> rtracklayer, instead.
>>>>
>>>> On Wed, Oct 9, 2013 at 9:45 AM, laurent jacob <laurent.ja...@gmail.com> wrote:
>>>>
>>>> > Hi everyone,
>>>> >
>>>> > I'm trying to use the ucscTableQuery function from the rtracklayer
>>>> > package to download a mapability table from the UCSC Genome Browser.
>>>> >
>>>> > Everything works fine if I restrict the query to a small range, but I
>>>> > get an error message when querying the entire genome (at the moment
>>>> > where I convert the UCSCTableQuery using track()):
>>>> >
>>>> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>> >   scan() expected 'an integer', got 'section'
>>>> >
>>>> > Here is a short example:
>>>> >
>>>> > ---------
>>>> > library(rtracklayer)
>>>> > mySession = browserSession('UCSC')
>>>> > genome(mySession) <- 'hg19'
>>>> > range <- GRanges('chr1', IRanges(start=10013, end=10021))
>>>> > query.range <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>>> >                               range=range,
>>>> >                               table='wgEncodeCrgMapabilityAlign100mer')
>>>> >
>>>> > query.full <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>>> >                              range='hg19',
>>>> >                              table='wgEncodeCrgMapabilityAlign100mer')
>>>> >
>>>> > ## This works
>>>> > track(query.range)
>>>> > ## This fails
>>>> > track(query.full)
>>>> > -----------
>>>> >
>>>> > Do you have any idea of what may cause this error?
>>>> >
>>>> > My sessionInfo() and traceback() of the error are given below.
>>>> >
>>>> > Best,
>>>> >
>>>> > Laurent
>>>> >
>>>> > --------------------------------
>>>> > > sessionInfo()
>>>> > R version 3.0.2 (2013-09-25)
>>>> > Platform: x86_64-pc-linux-gnu (64-bit)
>>>> >
>>>> > locale:
>>>> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> >
>>>> > attached base packages:
>>>> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>> > [8] base
>>>> >
>>>> > other attached packages:
>>>> > [1] rtracklayer_1.21.12   GenomicRanges_1.13.51 XVector_0.1.4
>>>> > [4] IRanges_1.19.38       BiocGenerics_0.7.5
>>>> >
>>>> > loaded via a namespace (and not attached):
>>>> > [1] Biostrings_2.29.19 bitops_1.0-6       BSgenome_1.29.1    RCurl_1.95-4.1
>>>> > [5] Rsamtools_1.13.48  stats4_3.0.2       tools_3.0.2        XML_3.98-1.1
>>>> > [9] zlibbioc_1.7.0
>>>> > ---------------------------------
>>>> >
>>>> > ---------------------------------
>>>> > > traceback()
>>>> > 34: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
>>>> >         nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
>>>> >         fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
>>>> >         multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
>>>> >         flush = flush, encoding = encoding)
>>>> > 33: read.table(con, colClasses = bedClasses, as.is = TRUE, na.strings = ".",
>>>> >         comment.char = "")
>>>> > 32: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
>>>> >         na.strings = ".", comment.char = ""))
>>>> > 31: .local(con, format, text, ...)
>>>> > 30: import(FileForFormat(con, format), ...)
>>>> > 29: import(FileForFormat(con, format), ...)
>>>> > 28: import(text = lines, format = "bedGraph", genome = genome,
>>>> >         asRangedData = asRangedData,
>>>> >         which = which, seqinfo = seqinfo)
>>>> > 27: import(text = lines, format = "bedGraph", genome = genome,
>>>> >         asRangedData = asRangedData,
>>>> >         which = which, seqinfo = seqinfo)
>>>> > 26: .local(con, format, text, ...)
>>>> > 25: import(FileForFormat(con, format), ...)
>>>> > 24: import(FileForFormat(con, format), ...)
>>>> > 23: import(format = subformat, text = text, asRangedData = asRangedData,
>>>> >         genome = genome, ...)
>>>> > 22: import(format = subformat, text = text, asRangedData = asRangedData,
>>>> >         genome = genome, ...)
>>>> > 21: FUN(1L[[1L]], ...)
>>>> > 20: lapply(seq_along(trackLines), makeTrackSet)
>>>> > 19: lapply(seq_along(trackLines), makeTrackSet)
>>>> > 18: .local(con, format, text, ...)
>>>> > 17: import(FileForFormat(con, format), ...)
>>>> > 16: import(FileForFormat(con, format), ...)
>>>> > 15: import(con, "ucsc", ...)
>>>> > 14: import(con, "ucsc", ...)
>>>> > 13: import.ucsc(resource(con), subformat = subformat, ...)
>>>> > 12: import.ucsc(resource(con), subformat = subformat, ...)
>>>> > 11: .local(con, ...)
>>>> > 10: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>>>> >         genome = genome, asRangedData = asRangedData, which = which,
>>>> >         seqinfo = seqinfo, ...)
>>>> > 9: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>>>> >         genome = genome, asRangedData = asRangedData, which = which,
>>>> >         seqinfo = seqinfo, ...)
>>>> > 8: .local(con, format, text, ...)
>>>> > 7: import(FileForFormat(con, format), ...)
>>>> > 6: import(FileForFormat(con, format), ...)
>>>> > 5: import(text = output, format = format, asRangedData = asRangedData,
>>>> >         seqinfo = seqinfo(range(object)))
>>>> > 4: import(text = output, format = format, asRangedData = asRangedData,
>>>> >         seqinfo = seqinfo(range(object)))
>>>> > 3: .local(object, ...)
>>>> > 2: track(query.full)
>>>> > 1: track(query.full)
>>>> > --------------------------------------
>>>> >
>>>> > --
>>>> > Laurent Jacob
>>>> > Laboratoire de Biométrie et Biologie Évolutive
>>>> > CNRS/Université Lyon 1
>>>> > http://cbio.ensmp.fr/~ljacob
>>>>
>>>> _______________________________________________
>>>> Bioc-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel