Re: [Bioc-devel] rtracklayer: problem with formatting the output of ucscTableQuery

Kasper Daniel Hansen Wed, 09 Oct 2013 10:36:35 -0700

Perhaps instead of doing the try() it would be more specific to check
whether the first non-comment line has the same number of fields as the
last line, using utils::count.fields.  A mismatch there is what seems to
kill us.  (That still leaves the scary possibility of the connection
terminating right after a \n, which this check would not catch.)
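
A minimal sketch of that check, assuming the downloaded text is already
held in a character vector with one element per line. The helper name
looksComplete and the header-line pattern are hypothetical and not part
of rtracklayer; only utils::count.fields is an existing function.

```r
# Hypothetical helper (not part of rtracklayer): TRUE when the first and
# last data lines of a download have the same number of fields.
looksComplete <- function(lines) {
  # Drop empty lines, comments, and UCSC "track"/"browser" header lines
  # (assumed header pattern).
  isHeader <- grepl("^(#|track[[:space:]]|browser[[:space:]])", lines)
  data <- lines[nzchar(lines) & !isHeader]
  if (length(data) == 0L) return(FALSE)

  # Count whitespace-separated fields on the first and last data lines only.
  con <- textConnection(data[c(1L, length(data))])
  on.exit(close(con))
  n <- utils::count.fields(con, sep = "", quote = "", comment.char = "",
                           blank.lines.skip = FALSE)
  n[1L] == n[2L]
}

# Made-up bedGraph-like input: one complete response, one cut mid-record.
complete  <- c("track type=bedGraph",
               "chr1\t10013\t10021\t0.5",
               "chr1\t10021\t10030\t1")
truncated <- c("track type=bedGraph",
               "chr1\t10013\t10021\t0.5",
               "chr1\t100")

looksComplete(complete)
looksComplete(truncated)
```

This only catches a response cut off in the middle of a record; a response
that ends exactly at a line break still looks complete to it.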


Kasper





On Wed, Oct 9, 2013 at 1:27 PM, Michael Lawrence
<lawrence.mich...@gene.com> wrote:

> I recently added an attempt to detect incompleteness but obviously it is
> not very robust. So I'll give up and add the try().
>
>
> On Wed, Oct 9, 2013 at 10:16 AM, Kasper Daniel Hansen <
> kasperdanielhan...@gmail.com> wrote:
>
>> (I recently had the same problem downloading dbSnp)
>>
>> It would be an improvement if the parsing of the downloaded data was inside
>> a try() statement, with a good error message about UCSC possibly truncating
>> the record.  Also, perhaps mention truncation in the vignette (or make it
>> more visible, if it is there).
>>
>> I certainly expected to be able to download (big) tables from UCSC,
>> perhaps I was naive, but that was my expectation.
>>
>> Best,
>> Kasper
>>
>>
>> On Wed, Oct 9, 2013 at 1:09 PM, Michael Lawrence <
>> lawrence.mich...@gene.com> wrote:
>>
>>> It's not feasible to download an entire genome's worth of mappability
>>> data using rtracklayer and the underlying table browser interface. UCSC
>>> has limits in place that truncate the response. rtracklayer has little
>>> way of knowing whether the user is requesting too many records. Just
>>> download the mappability as a bigwig file via FTP and query that with
>>> rtracklayer, instead.
>>>
>>>
>>> On Wed, Oct 9, 2013 at 9:45 AM, laurent jacob <laurent.ja...@gmail.com> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > I'm trying to use the ucscTableQuery function from the rtracklayer
>>> > package to download a mapability table from the UCSC Genome Browser.
>>> >
>>> > Everything works fine if I restrict the query to a small range, but I
>>> > get an error message when querying the entire genome (at the point
>>> > where I convert the UCSCTableQuery using track()):
>>> >
>>> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>> >   scan() expected 'an integer', got 'section'
>>> >
>>> > Here is a short example:
>>> >
>>> > ---------
>>> > library(rtracklayer)
>>> > mySession = browserSession('UCSC')
>>> > genome(mySession) <- 'hg19'
>>> > range <- GRanges('chr1', IRanges(start=10013, end=10021))
>>> > query.range <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>> >                               range=range,
>>> >                               table='wgEncodeCrgMapabilityAlign100mer')
>>> >
>>> > query.full <- ucscTableQuery(mySession, track='wgEncodeMapability',
>>> >                              range='hg19',
>>> >                              table='wgEncodeCrgMapabilityAlign100mer')
>>> >
>>> > ## This works
>>> > track(query.range)
>>> > ## This fails
>>> > track(query.full)
>>> > -----------
>>> >
>>> > Do you have any idea of what may cause this error?
>>> >
>>> > My sessionInfo() and traceback() of the error are given below.
>>> >
>>> > Best,
>>> >
>>> > Laurent
>>> >
>>> > --------------------------------
>>> > > sessionInfo()
>>> > R version 3.0.2 (2013-09-25)
>>> > Platform: x86_64-pc-linux-gnu (64-bit)
>>> >
>>> > locale:
>>> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> >
>>> > attached base packages:
>>> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> > [8] base
>>> >
>>> > other attached packages:
>>> > [1] rtracklayer_1.21.12   GenomicRanges_1.13.51 XVector_0.1.4
>>> > [4] IRanges_1.19.38       BiocGenerics_0.7.5
>>> >
>>> > loaded via a namespace (and not attached):
>>> > [1] Biostrings_2.29.19 bitops_1.0-6       BSgenome_1.29.1    RCurl_1.95-4.1
>>> > [5] Rsamtools_1.13.48  stats4_3.0.2       tools_3.0.2        XML_3.98-1.1
>>> > [9] zlibbioc_1.7.0
>>> > ---------------------------------
>>> >
>>> > ---------------------------------
>>> > > traceback()
>>> > 34: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
>>> >         nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
>>> >         fill = fill, strip.white = strip.white,
>>> >         blank.lines.skip = blank.lines.skip, multi.line = FALSE,
>>> >         comment.char = comment.char, allowEscapes = allowEscapes,
>>> >         flush = flush, encoding = encoding)
>>> > 33: read.table(con, colClasses = bedClasses, as.is = TRUE,
>>> >         na.strings = ".", comment.char = "")
>>> > 32: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
>>> >         na.strings = ".", comment.char = ""))
>>> > 31: .local(con, format, text, ...)
>>> > 30: import(FileForFormat(con, format), ...)
>>> > 29: import(FileForFormat(con, format), ...)
>>> > 28: import(text = lines, format = "bedGraph", genome = genome,
>>> >         asRangedData = asRangedData, which = which, seqinfo = seqinfo)
>>> > 27: import(text = lines, format = "bedGraph", genome = genome,
>>> >         asRangedData = asRangedData, which = which, seqinfo = seqinfo)
>>> > 26: .local(con, format, text, ...)
>>> > 25: import(FileForFormat(con, format), ...)
>>> > 24: import(FileForFormat(con, format), ...)
>>> > 23: import(format = subformat, text = text, asRangedData = asRangedData,
>>> >         genome = genome, ...)
>>> > 22: import(format = subformat, text = text, asRangedData = asRangedData,
>>> >         genome = genome, ...)
>>> > 21: FUN(1L[[1L]], ...)
>>> > 20: lapply(seq_along(trackLines), makeTrackSet)
>>> > 19: lapply(seq_along(trackLines), makeTrackSet)
>>> > 18: .local(con, format, text, ...)
>>> > 17: import(FileForFormat(con, format), ...)
>>> > 16: import(FileForFormat(con, format), ...)
>>> > 15: import(con, "ucsc", ...)
>>> > 14: import(con, "ucsc", ...)
>>> > 13: import.ucsc(resource(con), subformat = subformat, ...)
>>> > 12: import.ucsc(resource(con), subformat = subformat, ...)
>>> > 11: .local(con, ...)
>>> > 10: import.ucsc(initialize(file, resource = con), drop = TRUE,
>>> >         trackLine = FALSE, genome = genome, asRangedData = asRangedData,
>>> >         which = which, seqinfo = seqinfo, ...)
>>> > 9: import.ucsc(initialize(file, resource = con), drop = TRUE,
>>> >        trackLine = FALSE, genome = genome, asRangedData = asRangedData,
>>> >        which = which, seqinfo = seqinfo, ...)
>>> > 8: .local(con, format, text, ...)
>>> > 7: import(FileForFormat(con, format), ...)
>>> > 6: import(FileForFormat(con, format), ...)
>>> > 5: import(text = output, format = format, asRangedData = asRangedData,
>>> >        seqinfo = seqinfo(range(object)))
>>> > 4: import(text = output, format = format, asRangedData = asRangedData,
>>> >        seqinfo = seqinfo(range(object)))
>>> > 3: .local(object, ...)
>>> > 2: track(query.full)
>>> > 1: track(query.full)
>>> > --------------------------------------
>>> >
>>> >
>>> > --
>>> > Laurent Jacob
>>> > Laboratoire de Biométrie et Biologie Évolutive
>>> > CNRS/Université Lyon 1
>>> > http://cbio.ensmp.fr/~ljacob
>>> >
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
>
