Hi Kasper --
For the proxy, I think the idea is httr::set_config(httr::use_proxy("https://my.proxy"))
Martin
From: Kasper Daniel Hansen [mailto:[email protected]]
Sent: Tuesday, September 15, 2015 10:35 AM
To: Morgan, Martin
Cc: [email protected]
Subject: Re: [Bioc-devel] AnnotationHub: cleanup
On Tue, Sep 15, 2015 at 12:25 AM, Morgan, Martin
<[email protected]> wrote:
Hi Kasper -- we'll try to act on these, but some comments / looking for
clarification...
> -----Original Message-----
> From: Bioc-devel [mailto:[email protected]] On Behalf Of
> Kasper Daniel Hansen
> Sent: Monday, September 14, 2015 10:45 PM
> To: [email protected]
> Subject: [Bioc-devel] AnnotationHub: cleanup
>
> I currently have the `pleasure` of dealing with students who have problems
> with installing AnnotationHub and/or downloading resources. Here are some
> comments including some possible bug reports.
I hope this is on the whole a positive experience, and we'll do what we can to
make it better.
Well, I love the package and I love it even more having prepared material on
it. And the people who complain are of course enriched for people who have
problems - no way to know if it just works for most people.
And of course right now it is more troublesome since I prepared the class using
R-3.2.1 and then 3.2.2 was released just before we started and had the http ->
https change which is an obvious suspect when people have download problems :)
> 1) I think it is extremely dangerous that `cache(ahub)` starts by asking to
> download all resources! May I suggest this only happens with a specific
> setting like `cache(ahub, download=TRUE)` or something similar.
>
> 2) `cache(ahub) = NULL` deletes all cached information, except the sqlite database.
> Could we get a way to remove everything?
>
> 3) While I can understand the difference between cache and hubCache, I
> would suggest that hubCache(ahub) = NULL removes all cached material
> including the sqlite database.
For each of the above the envisioned use case was that 'hub' is a subset, eg.,
subhub = query(hub, c("homo", "ensembl", "81"))
and the user wanted to manipulate all records in the sub-hub. cache(subhub)
asks about the 'really download' if the size of the (sub)hub is greater than
hubOption("MAX_DOWNLOADS"), which by default is 10; it seems like asking is the
same as requiring an argument? fileName(subhub) may be closer to what you're
looking for...? It returns the path to the file, or NA if it is not in the cache.
For cache(subhub) = NULL it wouldn't make sense to delete 5 resources AND the
sqlite file for the entire hub.
The sqlite file can be discovered with dbfile(hub) / dbfile(subhub), and
removed with file.remove(dbfile(subhub)). In some ways it wasn't envisioned
that this manual manipulation would be a common use case (!).
Ok. Let me perhaps rephrase my wish list
1) some easy way to reset the entire cache issue, with emphasis on easy. This
is most likely to be used by beginners. How it's done, I don't care too much
about. And I suggest a heading in ?AnnotationHub called something like
"Flushing the cache" or something.
2) It seems natural that there is a way (for problem reporting) to report which
resources are cached, which is (again) easy and does not involve download. I
don't care if it is cache() or some other name.
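
A rough sketch of how item 2 might be covered with what is described above: fileName() is said to give the local path or NA, so dropping the NA entries lists what is cached without triggering a download. The helper name cached_only() and the example ids and path are made up for illustration, and the sketch assumes fileName() returns a character vector.

```r
# Hypothetical helper (not part of AnnotationHub): keep only the entries
# that have a local path, i.e. drop the NA entries.
cached_only <- function(paths) {
  paths[!is.na(paths)]
}

# Made-up stand-in for the value of fileName(subhub); the ids and the
# path are invented for illustration only.
paths <- c(AH0001 = NA, AH0002 = "/path/to/cache/0002", AH0003 = NA)
cached_only(paths)

# With a real hub this might look as follows (untested here; needs the
# AnnotationHub package and the hub sqlite database):
#   library(AnnotationHub)
#   hub <- AnnotationHub()
#   subhub <- query(hub, c("homo", "ensembl", "81"))
#   cached_only(fileName(subhub))
```

The output of something like this could be pasted straight into a problem report.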
> 4) It seems that AnnotationHub in the release version of Bioconductor
> defaults to using https://. Wasn't full support for https:// introduced in R
> 3.2.2; if so, it seems to be a critical bug that it is using https://
AnnotationHub uses httr::GET and ultimately curl::curl_fetch_disk rather than
native R support, so what R does is not directly relevant. From ?curl
Drop-in replacement for base 'url' that supports https, ftps,
gzip, deflate, etc. Default behavior is identical to 'url', but
request can be fully configured by passing a custom 'handle'.
So I wonder what the actual problem is?
Interesting. Well, at least one user is behind a proxy and uses the tips in
?download.file to set a proxy server. Perhaps that doesn't work with httr? I
don't know. But there is more than one person with problems.
> 5) Perhaps it should be considered that the default hubCache path is
> versioned, perhaps with Bioc version, perhaps with something else. This
> might cause problems for people running multiple versions of R.
The data base is supposed to handle versioning, so if you've populated the
cache with Bioc 3.2 and are now accessing the cache with Bioc 3.1, only the 3.1
resources are visible. The hope was to avoid multiple copies of these possibly
large resources.
That sounds pretty nifty. I was thinking re-design of the database issues.
> 6) I strongly suggest that the output printed when retrieving an
> AnnotationHub resource includes the download url.
Ok something that's easy to do! Sometimes this will be cryptic (when the
resource is cached in the AnnotationHub server, rather than being retrieved
from the original source)
Perhaps it should just say "loading from cache"
> 7) If you run AnnotationHub without having GenomicRanges / rtracklayer
> installed, it downloads the resource and then pangs out with an error. To me
> it seems more natural to pang out with an error immediately, especially since
> when it works, it appears from message printing that loading the library
> happens prior to download.
I guess by 'run AnnotationHub' you mean retrieve a specific resource?
The import recipes generally start by require()ing the necessary libraries. I
spotted a couple of recipes that didn't follow this convention (for 2bit and
chain file resources from rtracklayer; none that involved GenomicRanges). Are
there specific examples?
As a test case I got a Windows virtual machine up and running, totally clean,
and just did biocLite("AnnotationHub"). Then I picked two random resources and
tried to download them; one was a UCSC chain file and I don't know the other
one. In both cases I totally got a decent error message, which I can fully
understand. But looking at it with beginner eyes, I just thought it was weird
that the error on missing a library happened after download. It's not a big
deal, but if you don't know what you're doing you might get confused.
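
To illustrate the ordering I have in mind, here is a minimal base-R sketch that checks for the needed packages first and only then downloads. Both function names (require_all, fetch_resource) are hypothetical and are not how AnnotationHub or its recipes are actually written; the download step is passed in as a plain function.

```r
# Hypothetical helper: stop with one clear message if any needed package
# is not installed; return TRUE invisibly otherwise.
require_all <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("missing package(s): ",
         paste(pkgs[!installed], collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical wrapper: `download` is any zero-argument function that
# performs the transfer; it only runs once the package check has passed.
fetch_resource <- function(pkgs, download) {
  require_all(pkgs)
  download()
}
```

With this ordering a missing rtracklayer would be reported before any file is transferred.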
Best,
Kasper
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel