Re: [Bioc-devel] rtracklayer: problem with formatting the output of ucscTableQuery

Michael Lawrence Wed, 09 Oct 2013 10:10:49 -0700

It's not feasible to download an entire genome's worth of mappability data
using rtracklayer and the underlying Table Browser interface. UCSC has
limits in place that truncate the response, and rtracklayer has little way
of knowing whether the user is requesting too many records. Instead, just
download the mappability track as a BigWig file via FTP and query that
with rtracklayer.
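
A minimal sketch of that workaround follows. The URL is an assumption (an
HTTP path on the UCSC download server rather than FTP) and should be checked
before use; fetch_bigwig() and query_bigwig() are hypothetical helper names,
not part of rtracklayer. Only import() and BigWigFile() come from the package.

```r
# Assumed download location of the 100-mer mappability track; verify it
# against the UCSC download server before relying on it.
bw_url <- paste0(
  "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/",
  "wgEncodeMapability/wgEncodeCrgMapabilityAlign100mer.bigWig"
)

# Hypothetical helper: download the file once and reuse the local copy.
fetch_bigwig <- function(url, dest = basename(url)) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary file intact on every platform
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Hypothetical helper: read only the records that overlap `region`.
query_bigwig <- function(path, region) {
  rtracklayer::import(rtracklayer::BigWigFile(path), which = region)
}

# Usage (left commented out because the download is large):
# bw_file <- fetch_bigwig(bw_url)
# region  <- GenomicRanges::GRanges("chr1",
#                                   IRanges::IRanges(start = 10013, end = 10021))
# query_bigwig(bw_file, region)
```

Because the BigWig file is indexed, the `which` argument lets import() read
only the requested region rather than the whole genome.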



On Wed, Oct 9, 2013 at 9:45 AM, laurent jacob <laurent.ja...@gmail.com> wrote:

> Hi everyone,
>
> I'm trying to use the ucscTableQuery function from the rtracklayer package
> to download a mappability table from the UCSC Genome Browser.
>
> Everything works fine if I restrict the query to a small range, but I get
> an error message when querying the entire genome (at the point where I
> convert the UCSCTableQuery using track()):
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   scan() expected 'an integer', got 'section'
>
> Here is a short example:
>
> ---------
> library(rtracklayer)
> mySession <- browserSession('UCSC')
> genome(mySession) <- 'hg19'
> range <- GRanges('chr1', IRanges(start=10013, end=10021))
> query.range <- ucscTableQuery(mySession, track='wgEncodeMapability',
>                               range=range,
>                               table='wgEncodeCrgMapabilityAlign100mer')
>
> query.full <- ucscTableQuery(mySession, track='wgEncodeMapability',
>                              range='hg19',
>                              table='wgEncodeCrgMapabilityAlign100mer')
>
> ## This works
> track(query.range)
> ## This fails
> track(query.full)
> -----------
>
> Do you have any idea of what may cause this error?
>
> My sessionInfo() and traceback() of the error are given below.
>
> Best,
>
> Laurent
>
> --------------------------------
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] rtracklayer_1.21.12   GenomicRanges_1.13.51 XVector_0.1.4
> [4] IRanges_1.19.38       BiocGenerics_0.7.5
>
> loaded via a namespace (and not attached):
> [1] Biostrings_2.29.19 bitops_1.0-6       BSgenome_1.29.1    RCurl_1.95-4.1
> [5] Rsamtools_1.13.48  stats4_3.0.2       tools_3.0.2        XML_3.98-1.1
> [9] zlibbioc_1.7.0
> ---------------------------------
>
> ---------------------------------
> > traceback()
> 34: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
>         nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
>         fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
>         multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
>         flush = flush, encoding = encoding)
> 33: read.table(con, colClasses = bedClasses, as.is = TRUE, na.strings = ".",
>         comment.char = "")
> 32: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
>         na.strings = ".", comment.char = ""))
> 31: .local(con, format, text, ...)
> 30: import(FileForFormat(con, format), ...)
> 29: import(FileForFormat(con, format), ...)
> 28: import(text = lines, format = "bedGraph", genome = genome, asRangedData = asRangedData,
>         which = which, seqinfo = seqinfo)
> 27: import(text = lines, format = "bedGraph", genome = genome, asRangedData = asRangedData,
>         which = which, seqinfo = seqinfo)
> 26: .local(con, format, text, ...)
> 25: import(FileForFormat(con, format), ...)
> 24: import(FileForFormat(con, format), ...)
> 23: import(format = subformat, text = text, asRangedData = asRangedData,
>         genome = genome, ...)
> 22: import(format = subformat, text = text, asRangedData = asRangedData,
>         genome = genome, ...)
> 21: FUN(1L[[1L]], ...)
> 20: lapply(seq_along(trackLines), makeTrackSet)
> 19: lapply(seq_along(trackLines), makeTrackSet)
> 18: .local(con, format, text, ...)
> 17: import(FileForFormat(con, format), ...)
> 16: import(FileForFormat(con, format), ...)
> 15: import(con, "ucsc", ...)
> 14: import(con, "ucsc", ...)
> 13: import.ucsc(resource(con), subformat = subformat, ...)
> 12: import.ucsc(resource(con), subformat = subformat, ...)
> 11: .local(con, ...)
> 10: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>         genome = genome, asRangedData = asRangedData, which = which,
>         seqinfo = seqinfo, ...)
> 9: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>        genome = genome, asRangedData = asRangedData, which = which,
>        seqinfo = seqinfo, ...)
> 8: .local(con, format, text, ...)
> 7: import(FileForFormat(con, format), ...)
> 6: import(FileForFormat(con, format), ...)
> 5: import(text = output, format = format, asRangedData = asRangedData,
>        seqinfo = seqinfo(range(object)))
> 4: import(text = output, format = format, asRangedData = asRangedData,
>        seqinfo = seqinfo(range(object)))
> 3: .local(object, ...)
> 2: track(query.full)
> 1: track(query.full)
> --------------------------------------
>
>
> --
> Laurent Jacob
> Laboratoire de Biométrie et Biologie Évolutive
> CNRS/Université Lyon 1
> http://cbio.ensmp.fr/~ljacob
>
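
On the error message itself: frame 33 of the quoted traceback shows
read.table() being called with fixed column classes (bedClasses), and scan()
stops as soon as a field that should hold an integer contains text. The toy
example below reproduces that kind of failure in base R. The column classes
and the input lines are made up for illustration; they are not what UCSC
actually returned.

```r
# Assumed bedGraph-like column classes: chrom, start, end, score.
bed_classes <- c("character", "integer", "integer", "numeric")

# A well-formed record parses cleanly into typed columns.
good <- "chr1\t10013\t10021\t0.5"
parsed <- read.table(text = good, colClasses = bed_classes)

# A made-up line with text where the integer 'start' column should be.
bad <- "chr1\tsection\t10021\t0.5"
msg <- tryCatch(
  {
    read.table(text = bad, colClasses = bed_classes)
    NA_character_  # only reached if parsing unexpectedly succeeds
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Any stray text in a response, such as a notice appended when output is cut
short, would trip the parser in the same way.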


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
