Re: [Bioc-devel] rtracklayer: problem with formatting the output of ucscTableQuery

Michael Lawrence Wed, 09 Oct 2013 11:46:49 -0700

Currently rtracklayer looks for the message that UCSC inserts (or used to
insert?) when truncating the response. Using count.fields is another idea.
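
A rough sketch of the count.fields idea, combined with the try()-style
fallback discussed further down the thread. This is not rtracklayer's actual
code: looksTruncated() and parseTable() are hypothetical helpers, and the
assumption that header lines start with '#', 'track' or 'browser' is made
only for illustration.

```r
# Hypothetical helper (not part of rtracklayer): flag a response whose last
# line has a different number of fields than the first non-comment line.
looksTruncated <- function(lines, sep = "\t") {
  # Assumption: header lines start with '#', 'track' or 'browser'.
  rows <- lines[nzchar(lines) & !grepl("^(#|track|browser)", lines)]
  if (length(rows) < 2L) return(FALSE)
  con <- textConnection(c(rows[1L], rows[length(rows)]))
  on.exit(close(con))
  n <- utils::count.fields(con, sep = sep, quote = "", comment.char = "")
  n[1L] != n[2L]
}

# Hypothetical wrapper: fail early with a readable message, and re-throw any
# parse error with a hint about truncation.
parseTable <- function(lines, ...) {
  if (looksTruncated(lines))
    stop("response looks truncated (UCSC may limit large queries); ",
         "try a smaller range", call. = FALSE)
  tryCatch(
    utils::read.table(text = lines, sep = "\t", ...),
    error = function(e)
      stop("could not parse the response, possibly because UCSC truncated it: ",
           conditionMessage(e), call. = FALSE)
  )
}
```

A check like this cannot catch the case Kasper mentions, where the connection
terminates right after a newline, because every remaining line is then
complete.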



On Wed, Oct 9, 2013 at 10:35 AM, Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:

> Perhaps instead of doing the try() it would be more specific to check
> whether the first non-comment line has the same number of fields as the
> last line using utils::count.fields.  This is what seems to kill us (and
> leaves the scary possibility of the connection terminating right after a
> \n).
>
> Kasper
>
>
>
>
>
> On Wed, Oct 9, 2013 at 1:27 PM, Michael Lawrence <
> lawrence.mich...@gene.com> wrote:
>
>> I recently added an attempt to detect incompleteness but obviously it is
>> not very robust. So I'll give up and add the try().
>>
>>
>> On Wed, Oct 9, 2013 at 10:16 AM, Kasper Daniel Hansen <
>> kasperdanielhan...@gmail.com> wrote:
>>
>>> (I recently had the same problem downloading dbSnp)
>>>
>>> It would be an improvement if the parsing of the downloaded data was
>>> inside a try() statement, with a good error message about UCSC possibly
>>> truncating the record.  Also, perhaps mention truncation in the vignette
>>> (or make it more visible, if it is there).
>>>
>>> I certainly expected to be able to download (big) tables from UCSC;
>>> perhaps I was naive, but that was my expectation.
>>>
>>> Best,
>>> Kasper
>>>
>>>
>>> On Wed, Oct 9, 2013 at 1:09 PM, Michael Lawrence <
>>> lawrence.mich...@gene.com> wrote:
>>>
>>>> It's not feasible to download an entire genome's worth of mappability
>>>> data using rtracklayer and the underlying table browser interface. UCSC
>>>> has limits in place that truncate the response. rtracklayer has little
>>>> way of knowing whether the user is requesting too many records. Just
>>>> download the mappability as a bigwig file via FTP and query that with
>>>> rtracklayer, instead.
>>>>
>>>>
>>>> On Wed, Oct 9, 2013 at 9:45 AM, laurent jacob <laurent.ja...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi everyone,
>>>> >
>>>> > I'm trying to use the ucscTableQuery function from the rtracklayer
>>>> > package to download a mapability table from the UCSC Genome Browser.
>>>> >
>>>> > Everything works fine if I restrict the query to a small range, but I
>>>> > get an error message when querying the entire genome (at the point
>>>> > where I convert the UCSCTableQuery using track()):
>>>> >
>>>> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>> >   scan() expected 'an integer', got 'section'
>>>> >
>>>> > Here is a short example:
>>>> >
>>>> > ---------
>>>> > library(rtracklayer)
>>>> > mySession <- browserSession('UCSC')
>>>> > genome(mySession) <- 'hg19'
>>>> > range <- GRanges('chr1', IRanges(start=10013, end=10021))
>>>> > query.range <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>>> >                               range=range,
>>>> >                               table='wgEncodeCrgMapabilityAlign100mer')
>>>> >
>>>> > query.full <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>>> >                              range='hg19',
>>>> >                              table='wgEncodeCrgMapabilityAlign100mer')
>>>> >
>>>> > ## This works
>>>> > track(query.range)
>>>> > ## This fails
>>>> > track(query.full)
>>>> > -----------
>>>> >
>>>> > Do you have any idea of what may cause this error?
>>>> >
>>>> > My sessionInfo() and traceback() of the error are given below.
>>>> >
>>>> > Best,
>>>> >
>>>> > Laurent
>>>> >
>>>> > --------------------------------
>>>> > > sessionInfo()
>>>> > R version 3.0.2 (2013-09-25)
>>>> > Platform: x86_64-pc-linux-gnu (64-bit)
>>>> >
>>>> > locale:
>>>> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> >
>>>> > attached base packages:
>>>> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>> > [8] base
>>>> >
>>>> > other attached packages:
>>>> > [1] rtracklayer_1.21.12   GenomicRanges_1.13.51 XVector_0.1.4
>>>> > [4] IRanges_1.19.38       BiocGenerics_0.7.5
>>>> >
>>>> > loaded via a namespace (and not attached):
>>>> > [1] Biostrings_2.29.19 bitops_1.0-6       BSgenome_1.29.1    RCurl_1.95-4.1
>>>> > [5] Rsamtools_1.13.48  stats4_3.0.2       tools_3.0.2        XML_3.98-1.1
>>>> > [9] zlibbioc_1.7.0
>>>> > ---------------------------------
>>>> >
>>>> > ---------------------------------
>>>> > > traceback()
>>>> > 34: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
>>>> >         nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
>>>> >         fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
>>>> >         multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
>>>> >         flush = flush, encoding = encoding)
>>>> > 33: read.table(con, colClasses = bedClasses, as.is = TRUE, na.strings = ".",
>>>> >         comment.char = "")
>>>> > 32: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
>>>> >         na.strings = ".", comment.char = ""))
>>>> > 31: .local(con, format, text, ...)
>>>> > 30: import(FileForFormat(con, format), ...)
>>>> > 29: import(FileForFormat(con, format), ...)
>>>> > 28: import(text = lines, format = "bedGraph", genome = genome, asRangedData = asRangedData,
>>>> >         which = which, seqinfo = seqinfo)
>>>> > 27: import(text = lines, format = "bedGraph", genome = genome, asRangedData = asRangedData,
>>>> >         which = which, seqinfo = seqinfo)
>>>> > 26: .local(con, format, text, ...)
>>>> > 25: import(FileForFormat(con, format), ...)
>>>> > 24: import(FileForFormat(con, format), ...)
>>>> > 23: import(format = subformat, text = text, asRangedData = asRangedData,
>>>> >         genome = genome, ...)
>>>> > 22: import(format = subformat, text = text, asRangedData = asRangedData,
>>>> >         genome = genome, ...)
>>>> > 21: FUN(1L[[1L]], ...)
>>>> > 20: lapply(seq_along(trackLines), makeTrackSet)
>>>> > 19: lapply(seq_along(trackLines), makeTrackSet)
>>>> > 18: .local(con, format, text, ...)
>>>> > 17: import(FileForFormat(con, format), ...)
>>>> > 16: import(FileForFormat(con, format), ...)
>>>> > 15: import(con, "ucsc", ...)
>>>> > 14: import(con, "ucsc", ...)
>>>> > 13: import.ucsc(resource(con), subformat = subformat, ...)
>>>> > 12: import.ucsc(resource(con), subformat = subformat, ...)
>>>> > 11: .local(con, ...)
>>>> > 10: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>>>> >         genome = genome, asRangedData = asRangedData, which = which,
>>>> >         seqinfo = seqinfo, ...)
>>>> > 9: import.ucsc(initialize(file, resource = con), drop = TRUE, trackLine = FALSE,
>>>> >        genome = genome, asRangedData = asRangedData, which = which,
>>>> >        seqinfo = seqinfo, ...)
>>>> > 8: .local(con, format, text, ...)
>>>> > 7: import(FileForFormat(con, format), ...)
>>>> > 6: import(FileForFormat(con, format), ...)
>>>> > 5: import(text = output, format = format, asRangedData = asRangedData,
>>>> >        seqinfo = seqinfo(range(object)))
>>>> > 4: import(text = output, format = format, asRangedData = asRangedData,
>>>> >        seqinfo = seqinfo(range(object)))
>>>> > 3: .local(object, ...)
>>>> > 2: track(query.full)
>>>> > 1: track(query.full)
>>>> > --------------------------------------
>>>> >
>>>> >
>>>> > --
>>>> > Laurent Jacob
>>>> > Laboratoire de Biométrie et Biologie Évolutive
>>>> > CNRS/Université Lyon 1
>>>> > http://cbio.ensmp.fr/~ljacob
>>>> >
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioc-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>
>>
>
