On Wed, Oct 29, 2014 at 1:07 PM, Vincent Carey <[email protected]> wrote:
> On Wed, Oct 29, 2014 at 2:15 PM, Hervé Pagès <[email protected]> wrote:
>
>> Hi,
>>
>> On 10/28/2014 08:51 PM, Vincent Carey wrote:
>>
>>> On Tue, Oct 28, 2014 at 5:48 PM, Hervé Pagès <[email protected]> wrote:
>>>
>>> On 10/28/2014 12:42 PM, Vincent Carey wrote:
>>>
>>> On Tue, Oct 28, 2014 at 2:29 PM, Hervé Pagès <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> On 10/28/2014 08:48 AM, Vincent Carey wrote:
>>>
>>> On Tue, Oct 28, 2014 at 11:23 AM, Kasper Daniel Hansen
>>> <[email protected]> wrote:
>>>
>>> Well, first I want to make sure that there is not something special
>>> regarding S4 methods and classes. I have a feeling that they are a
>>> special case.
>>>
>>> Second, while I agree with Jim's general opinion, it is a little bit
>>> different when I have return objects which are defined in other
>>> packages. If I don't depend on this other package, the user is hosed
>>> wrt. the return object, unless I manually export all classes from
>>> this other
>>>
>>> In what sense? If you return an instance of GRanges, certain things
>>> can be done even if GenomicRanges is not attached.
>>>
>>> Yes certain things maybe, but it's hard to predict which ones.
>>>
>>> You can get values of slots, for example.
>>>
>>> With the following little package
>>>
>>> %vjcair> cat foo/NAMESPACE
>>> importFrom(IRanges, IRanges)
>>> importClassesFrom(GenomicRanges, GRanges)
>>> importFrom(GenomicRanges, GRanges)
>>> export(myfun)
>>>
>>> %vjcair> cat foo/DESCRIPTION
>>> Package: foo
>>> Title: foo
>>> Version: 0.0.0
>>> Author: VJ Carey <[email protected]>
>>> Description:
>>> Suggests:
>>> Depends:
>>> Imports: GenomicRanges
>>> Maintainer: VJ Carey <[email protected]>
>>> License: Private
>>> LazyLoad: yes
>>>
>>> %vjcair> cat foo/R/*
>>> myfun = function(seqnames="1", ranges=IRanges(1,2), ...)
>>>    GRanges(seqnames=seqnames, ranges=ranges, ...)
>>>
>>> The following works:
>>>
>>> > library(foo)
>>> > x = myfun()
>>> > x
>>> GRanges object with 1 range and 0 metadata columns:
>>>       seqnames    ranges strand
>>>          <Rle> <IRanges>  <Rle>
>>>   [1]        1    [1, 2]      *
>>>   -------
>>>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>
>>> So the show method works, even though I have not touched it. (I did
>>> not expect it to work, in fact.)
>>>
>>> Exactly. Let's call it luck ;-)
>>>
>>> Additionally, I can get access to slots.
>>>
>>> The end user should never try to access slots directly but use
>>> getters and setters instead. And most getters and setters for GRanges
>>> objects are defined and documented in the GenomicRanges package.
>>> Those that are not are defined in packages that GenomicRanges
>>> depends on.
>>>
>>> But ranges() fails. If I, the user, want to use it, I need to
>>> arrange for that.
>>>
>>> IMO if your package returns a GRanges object to the user, then the
>>> user should be able to access the man page for GRanges objects with
>>> ?GRanges.
>>>
>>> Oddly enough, that seems to be incorrect. I added a man page to foo
>>> that has a \link[GenomicRanges]{GRanges-class}. I ran help.start and
>>> the cross reference from my man page succeeds. Furthermore with the
>>> sessionInfo below, ?GRanges succeeds at the CLI.
>>>
>>> Did you try to run example(GRanges)? I'm not sure that will work.
>>>
>>> Correct. Cursory look at source shows that help() uses
>>> loadedNamespaces() to find the help file. example() could probably
>>> do likewise.
>>>
>>
>> Sounds reasonable. So it seems that some recent changes in R make
>> it possible to access the man page and examples for stuff that
>> is imported but not attached. This is an important shift in paradigm
>> to me. In the past I would just rely on the simple notion that
>> what I can access with ? or example() reflects what's in my
>> search path. Now if I do ?DNAStringSet and it succeeds, I can't
>> assume DNAStringSet() is in my search path anymore. And if I
>> want to copy/paste a few commands from the examples in order to
>> try them in my session, they might fail because the package where
>> these examples belong is not necessarily attached.
>> I wonder whether that means we should now start every example
>> section with library(foo)? The rationale for not doing it so far
>>
>
> I think that would be excessive. You are correct that some code will
> not run, and the user will have to decide what to do. We have access to
> core members. example() could be tuned to check for attachment of the
> package hosting the page and fail if the host package is not attached,
> with a hint as to how to proceed. For cutting and pasting, caveat emptor.

That's already taken care of; example() already attaches the package, cf.
https://github.com/wch/r-source/blob/trunk/src/library/utils/R/example.R#L53-L54

EXAMPLE:

$ R --vanilla

R Under development (unstable) (2014-10-26 r66879) -- "Unsuffered Consequences"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
[...]

> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"

> example("md5sum", package="tools")

md5sum> as.vector(md5sum(dir(R.home(), pattern = "^COPY", full.names = TRUE)))
[1] "0cce1e42ef3fb133940946534fcf8896"

> search()
 [1] ".GlobalEnv"        "package:tools"     "package:stats"
 [4] "package:graphics"  "package:grDevices" "package:utils"
 [7] "package:datasets"  "package:methods"   "Autoloads"
[10] "package:base"

>
>> was that if you can access the man page with ? then that means
>> the package is already attached.

...but maybe it wouldn't hurt to be explicit and add a library("...") at
the top, just as we do everywhere else including vignettes and package
test scripts.

/Henrik

>>
>> As a side note the decision to extend the scope of ? to attached
>> packages and not to all installed packages feels arbitrary to me.
>> Going all the way would make ? even more useful and would be
>> consistent with what I see when navigating the documentation in
>> a browser. So when the user wants to call DNAStringSet() but
>> doesn't remember where it lives, ?DNAStringSet would be a quick
>> and easy way to know, and this whether the package is loaded via
>> a namespace or not.
>>
>
> I think this is a reasonable objective.
>
>>
>> Anyway, to get back to the original topic, IMO this change in R
>> still doesn't justify changing the Depends vs Imports game.
>> I see at least 3 strong cases for using 'Depends: A' instead of
>> 'Imports: A' in package B:
>>   (1) B defines (and exports) a class that extends a class defined in A.
>>
>
> In my view there is a risk of needless namespace pollution in this case.
> Depends seems extreme, other things being equal. Better to let the user
> determine in real time whether this should occur. It seems to me that
> particularly when packages have lots of complicated interrelationships,
> it is best to have the developers manage symbols internally to the code,
> reducing as much as possible the impact on the user environment.
> Minimizing the use of Depends seems consistent with this.
>
>>   (2) B defines (and exports) methods for a generic defined in A.
>>   (3) B defines (and exports) functions or methods that return
>>       objects of a class defined in package A.
>>
>> 'Imports: A' should be reserved for situations where A is used
>> internally by B and in a way that is B's internal business only
>> and none of the end-user's business. A typical example is the
>> internal use of RSQLite and biomaRt in GenomicFeatures.
>>
>
> I'm sympathetic to this view but would rather be out of the business of
> figuring out what the end-user's business is apart from using and
> getting value from the functions defined in the package that I contributed.
>
> Leaving the attachments up to the user is one way.
>
>>
>> I can see the attractiveness of trying to minimize what gets attached
>> to the user's session but I'm also concerned that trying to go too far
>> in that direction ultimately has no real benefit and can hurt the
>> user-friendliness of the software.
>>
>
> We should try to assemble data on this concern. I don't know how to do it.
>
>>
>> H.
>>
>>> For example after I do library(rtracklayer), I can indeed do
>>> ?DNAStringSet at the command line (I'm surprised this works), but
>>> then example(DNAStringSet) fails:
>>>
>>> > example(DNAStringSet)
>>> Warning message:
>>> In example(DNAStringSet) : no help found for ‘DNAStringSet’
>>>
>>> I'm also surprised this is just a warning but that's another story...
>>>
>>> H.
>>>
>>> I am not trying to defend the NOTE but the principle of minimizing
>>> Depends declarations needs to be considered critically, and I am
>>> just exploring the space.
>>>
>>> > ?GRanges # it worked as usual in the tty
>>>
>>> > sessionInfo()
>>> R version 3.1.1 (2014-07-10)
>>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices datasets  utils     tools     methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] foo_0.0.0            rmarkdown_0.3.8      knitr_1.6
>>> [4] weaver_1.31.0        codetools_0.2-9      digest_0.6.4
>>> [7] BiocInstaller_1.16.0
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] BiocGenerics_0.11.5   evaluate_0.5.5        formatR_1.0
>>>  [4] GenomeInfoDb_1.1.26   GenomicRanges_1.17.48 htmltools_0.2.6
>>>  [7] IRanges_1.99.32       parallel_3.1.1        S4Vectors_0.2.8
>>> [10] stats4_3.1.1          stringr_0.6.2         XVector_0.5.8
>>>
>>> And that works only if the GenomicRanges package is attached.
>>> Attaching GenomicRanges will also attach other packages that
>>> GenomicRanges depends on where some GRanges accessors might be
>>> defined and documented (e.g. metadata()).
>>>
>>> In some cases you'll decide you want the user to have a full
>>> complement of methods for your package to function meaningfully.
>>> For example, I am considering using dplyr idioms to work with data
>>> structures in a package, and it seems I should just depend on dplyr
>>> rather than pick out and document which things I want to expose.
>>> But that may still be an undesirable design.
>>>
>>> package, like
>>>    importClassesFrom("GenomicRanges", "GRanges")
>>>    exportClasses("GRanges")
>>> Surely that is not intended.
>>>
>>> It is important that my package works without being attached to the
>>> search path and I do this by carefully importing what I need, ie.
>>> my code does not require that my dependencies are attached to the
>>> search path. But the end user will be hosed without it.
>>>
>>> Yes s/he will. Fortunately when your package namespace gets loaded
>>> by another package, then nothing gets attached to the search path,
>>> even if your package depends (instead of imports) on other packages.
>>> So using Depends instead of Imports for your own dependencies won't
>>> make any difference in that respect, which is good.
>>>
>>> My impression is that the NOTE in R CMD check was written by someone
>>> who did not anticipate large-scale use and re-use of classes and
>>> methods across many packages.
>>>
>>> That's my impression too.
>>>
>>> Cheers,
>>> H.
>>>
>>> Best,
>>> Kasper
>>>
>>> On Tue, Oct 28, 2014 at 11:14 AM, James W. MacDonald
>>> <[email protected]> wrote:
>>>
>>> I agree with Vince. It's your job as a package developer to make
>>> available to your package all the functions necessary for the
>>> package to work. But I am not sure it is your job to load all the
>>> packages that your end user might need.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>> On Tue, Oct 28, 2014 at 11:04 AM, Vincent Carey <[email protected]> wrote:
>>>
>>> On Tue, Oct 28, 2014 at 10:19 AM, Kasper Daniel Hansen
>>> <[email protected]> wrote:
>>>
>>> What is the current best paradigm for using all the classes in
>>>   S4Vectors/GenomeInfoDb/GenomicRanges/IRanges
>>>
>>> I obviously import methods and classes from the relevant packages.
>>>
>>> But shouldn't I depend on these packages as well? Since I basically
>>> want the user to have this functionality at the command line? That
>>> is what I do now.
>>>
>>> I've wondered about this as well. It seems the principle is that
>>> the user should take care of attaching additional packages when
>>> needed. It might be appropriate to give a hint in the package
>>> startup message, if having some other package attached would
>>> typically be of great utility.
>>>
>>> Given your list above, I would think that depending on GenomicRanges
>>> would often be sufficient, and IRanges/S4Vectors would not require
>>> dependency assertion. I would think that GenomeInfoDb should be a
>>> voluntary attachment for a specific session.
>>>
>>> These are just my guesses -- I doubt there will be complete
>>> consensus, but I have started to think very critically about using
>>> Depends, and I think it is better when its use is minimized.
>>>
>>> That of course leads to the R CMD check NOTE on depending on too
>>> many packages.... I guess I should ignore that one.
>>>
>>> Best,
>>> Kasper
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> [email protected] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O.
>>> Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: [email protected]
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
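
The difference between a namespace being loaded and a package being attached, which most of this thread turns on, can be checked from the R prompt. What follows is only a minimal sketch and is not taken from any of the messages above. It assumes nothing beyond base R: it uses the tools package (the same one as in the example() session) so that nothing needs installing, and the variable names are made up for illustration.

```r
# Illustrative sketch only; the variable names are hypothetical.

# Loading a namespace is what 'Imports:' arranges for a dependency.
# It does not, by itself, put the package on the search path.
loadNamespace("tools")
ns_loaded <- "tools" %in% loadedNamespaces()

# Exported functions stay reachable through an explicit '::' prefix,
# whether or not the package is attached.
ext <- tools::file_ext("example.R")

# Attaching is what library() does, and what 'Depends:' triggers when
# the depending package is itself attached with library().
library(tools)
attached <- "package:tools" %in% search()

print(c(loaded = ns_loaded, attached = attached))
print(ext)
```

In a session where tools was already attached beforehand (as in the sessionInfo() quoted earlier), search() would list it before the library() call as well, so the interesting comparison is best made in a fresh R --vanilla session.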
