Martin Morgan wrote: > On 04/08/2017 12:29 PM, Aaron Lun wrote: >> Martin Morgan wrote: >>> On 04/08/2017 08:16 AM, Aaron Lun wrote: >>>> On 07/04/17 20:46, luke-tier...@uiowa.edu wrote: >>>>> On Fri, 7 Apr 2017, Hervé Pagès wrote: >>>>> >>>>>> On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote: >>>>>>> On Fri, 7 Apr 2017, Hervé Pagès wrote: >>>>>>> >>>>>>>> On 04/06/2017 03:29 AM, Michael Lawrence wrote: >>>>>>>>> On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan >>>>>>>>> <martin.mor...@roswellpark.org> wrote: >>>>>>>>>> On 04/06/2017 05:33 AM, Aaron Lun wrote: >>>>>>>>>>>>>>>>> The tool is not perfect, so assess each report >>>>>>> carefully. >>>>>>>>> I get a lot of warnings because the tool seems to consider that >>>>>>>> extracting an attribute (with getAttrib(x, ...)) or extracting >>>>>>>> the >>>>>>>> slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...)) >>>>>>>> returns an SEXP that needs protection. I always assumed that it >>>>>>>> didn't because my understanding is that the returned SEXP is >>>>>>>> pointing >>>>>>>> to a part of a pre-existing object ('x') and not to a newly >>>>>>>> created >>>>>>>> one. So I decided I could treat it like the SEXP returned by >>>>>>>> VECTOR_ELT(), which, AFAIK, doesn't need protection. >>>>>>>>> So I suspect these warnings are false positives but I'm not 100% >>>>>>> sure. >>>>>>> >>>>>>> If you are not 100% sure then you should protect :-) >>>>>>> >>>>>>> There are some cases, in particular related to compact row >>>>>>> names on >>>>>>> data frames, where getAttrib will allocate. >>>>>> >>>>>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a >>>>>> no-op >>>>>> anymore? Should I worry that VECTOR_ELT() will also expand some sort >>>>>> of compact list element? Why not keep these things low-level >>>>>> getters/setters that return whatever the real thing is and use >>>>>> higher-level accessors for returning the expanded version of the >>>>>> thing? >>>>> >>>>> Seriously: it's been that way since r37807 in 2006. >>>>> >>>>> If you want to read about some related future directions you can >>>>> look at >>>>> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. >>>>> >>>>> luke >>>> >>>> I was curious about this so I checked out what R-exts had to say >>>> involving set/getAttrib. Funnily enough, the example it gives in >>>> Section >>>> 5.9.4 seems to be incorrect in its UNPROTECTing. >>>> >>>> #include <R.h> >>>> #include <Rinternals.h> >>>> >>>> SEXP out(SEXP x, SEXP y) >>>> { >>>> int nx = length(x), ny = length(y); >>>> SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny)); >>>> double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans); >>>> >>>> for(int i = 0; i < nx; i++) { >>>> double tmp = rx[i]; >>>> for(int j = 0; j < ny; j++) >>>> rans[i + nx*j] = tmp * ry[j]; >>>> } >>>> >>>> SEXP dimnames = PROTECT(allocVector(VECSXP, 2)); >>>> SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol)); >>>> SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol)); >>>> setAttrib(ans, R_DimNamesSymbol, dimnames); >>>> >>>> >>>> UNPROTECT(3); >>>> return ans; >>>> } >>>> >>>> There are two PROTECT calls but an UNPROTECT(3), which triggers a >>>> stack >>>> imbalance warning upon trying to run .Call("out", ...) in R. >>> >>> Yes, that should be UNPROTECT(2). svn blame says the error was >>> introduced when allocMatrix() was introduced; prior to that the code >>> had allocVector(), then set dim and dimnames. >>> >>> As for whether to PROTECT or not, my analysis would be... >>> >>> SET_VECTOR_ELT does not (currently) allocate (except on error) so >>> there is no opportunity for the garbage collector to run, hence no >>> need to PROTECT. >>> >>> Further, getAttrib() (currently) allocates only if (1) the attribute >>> is R_RowNamesSymbol and the row names are stored in compact format >>> c(NA_integer_, nrow); or (2) the first argument is a classic pairlist >>> or language SEXP. None of these conditions apply, so the return value >>> of getAttrib() is PROTECTed anyway. >>> >>> Luke's analysis would be more straight-forward: if in doubt, PROTECT. >>> >>> I think Herve, Gabe, and perhaps Michael would take Luke's advice, and >>> maybe also note that my advice, in addition to being an analysis of >>> some surprisingly complicated code by a practitioner of dubious >>> credibility, involves the current state of affairs, and you never know >>> (and apparently ALTREP makes it less certain) what the future will >>> hold. So they'd probably say PROTECT. >>> >>> One might be tempted to write >>> >>> SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol))); >>> >>> but I'm not sure whether C guarantees that function arguments are >>> fully evaluated, maybe it's legal for a compiler to evaluate >>> getAttrib(), then do some more work with other arguments, then >>> evaluate PROTECT(), so long as the overall logic is not disrupted. So >>> the 'if in doubt' argument would make me write >>> >>> SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol)); >>> SET_VECTOR_ELT(dimnames, 0, nms); >>> >>> I think , in C is called a 'sequence point'. Google takes me to >>> >>> https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx >>> >>> where it seems like the left operand of ',' are fully evaluated before >>> proceeding, and furthermore that arguments to functions are evaluated >>> before entering the function, implying that it is safe to use the >>> one-liner >>> >>> SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol))); >>> >>> At any rate I changed the example in R-exts to UNPROTECT(2), leaving >>> the nuances for further discussion. >>> >>> Martin >> >> I wonder if the following is a sensible idea: >> >> int Rf_num_protected; // global variable >> void Rf_start_protection() { >> Rf_num_protected=0; >> return; >> } >> SEXP Rf_add_protection(SEXP x) { >> ++Rf_num_protected; >> return PROTECT(x); >> } >> void Rf_end_protection() { >> UNPROTECT(Rf_num_protected); >> return; >> } >> >> The idea would be to: >> >> 1. call Rf_start_protection() at the top of the native routine >> 2. replace all uses of PROTECT with Rf_add_protection >> 3. call Rf_end_protection() just before returning to R >> >> This would avoid having to keep track of the number of PROTECTs >> performed, which may not be trivial if the routine can return at >> multiple points. >> >> It might also useful for C++ native routine creating class instances >> that need to do internal PROTECTs for the lifetime of the instance. As >> long as those PROTECTs are done via Rf_add_protection(), a single >> Rf_end_protection() call at the bottom of the top-level routine would be >> sufficient to handle them all. In contrast, putting a matching UNPROTECT >> in the class destructor is not safe, as it is possible to trigger the >> destructor to UNPROTECT an unrelated SEXP: >> >> SEXP blah(SEXP x) { >> my_class* ptr=new my_class(x); // say this does an internal PROTECT >> SEXP output=PROTECT(allocVector(INTSXP, 1)); >> // ... do something with output here... >> delete ptr; // if UNPROTECT is in the destructor, it UNPROTECTs >> output instead >> // ... do some more stuff, possibly involving allocations ... >> UNPROTECT(1); // this actually UNPROTECTs whatever was in my_class >> return output; >> } >> > > Global variables are problematic to reason about, e.g., in nested > calls or parallel code sections. > > 'ad hoc' (no offense intended) solutions often increase rather than > reduce cognitive burden, because someone new to the code (including > one's future self) has to parse the intention and validate use. > > Rcpp seems like the right approach for C++ code; it largely removes > the need for explicit PROTECTion management, and is widely used and > responsibly maintained so the edge cases / tricky problems get > discovered and addressed. > > Martin
Yes, I was wondering whether I should transition all of my C++ code to Rcpp. It would require a decent amount of effort involving ~7 packages, so I've held off from doing it... but if I have to go through the code to fix all of the PROTECTs anyway, I might as well go all the way and switch to using Rcpp. Sounds like a small project for my Easter break. -Aaron >>>> >>>> Anyway, getting back to the topic of this thread; if we were to >>>> pretend >>>> that getAttrib() allocates in the above example, would that mean that >>>> both getAttrib() calls now need to be PROTECTed by the developer? >>>> Or is >>>> this handled somewhere internally? >>>> >>>>>> >>>>>> Thanks, >>>>>> H. >>>>>> >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> luke >>>>>>> >>>>>>>>>>>>>>>>>>>> I also get a warning on almost every C++ >>>>>>> function I've written, >>>>>>>>>>> because >>>>>>>>>>> I use the following code to handle exceptions: >>>>>>>>>>>>>>> SEXP output=PROTECT(allocVector(...)); >>>>>>>>>>> try { >>>>>>>>>>> // do something that might raise an exception >>>>>>>>>>> } catch (std::exception& e) { >>>>>>>>>>> UNPROTECT(1); >>>>>>>>>>> throw; // break out of this part of the function >>>>>>>>>>> } >>>>>>>>>>> UNPROTECT(1); >>>>>>>>>>> return output; >>>>>>>>>>>>>>> Presumably the check doesn't account for transfer of >>>>>>> control to > > > > the >>>>>>>>>>> catch block. I find that R itself is pretty good at >>>>>>> complaining > > > > about >>>>>>>>>>> stack imbalances during execution of tests, examples, etc. >>>>>>>>>>>>>>>> 'My' packages >>>>>>>>>>>> (Rsamtools, DirichletMultinomial) had several false >>>>>>> positives > > > > > (all >>>>>>>>>>>> associated with use of an attribute of a protected SEXP), >>>>>>> one > > > > > subtle >>>>>>>>>>>> problem (a symbol from a PROTECT'ed package name space; >>>>>>> the > > > > > symbol >>>>>>>>>>>> could >>>>>>>>>>>> in theory be an active binding and the value obtained not >>>>>>>>>>>> PROTECTed by >>>>>>>>>>>> the name space), and a genuine bug >>>>>>>>>>>>>>>>> tag = NEW_CHARACTER(n); >>>>>>>>>>>> for (int j = 0; j < n; ++j) >>>>>>>>>>>> SET_STRING_ELT(tag, j, NA_STRING); >>>>>>>>>>>> if ('A' == aux[0]) { >>>>>>>>>>>> buf_A = R_alloc(2, sizeof(char)); # >>>>>>> <<- bug >>>>>>>>>>>> buf_A[1] = '\0'; >>>>>>>>>>>> } >>>>>>>>>>>> ... >>>>>>>>>>>> SET_VECTOR_ELT(tags, i, tag); # PROTECT >>>>>>> tag, too >>>>>>>>>>>> late! >>>>>>>>>>>>>>>>>>> I assume the bug refers to the un-PROTECT'd >>>>>>> NEW_CHARACTER here - > > > > the >>>>>>>>>>> R_alloc call looks okay to me... >>>>>>>>>>>>>>>> yes, tag needs protection. >>>>>>>>>>>>> I attributed the bug to R_alloc because I failed to >>>>>>> reason that > > > R_alloc >>>>>>>>>> (obviously) allocates and hence can trigger a garbage >>>>>>>>>> collection. >>>>>>>>>>>>> Somehow it reflects my approach to PROTECTion, probably >>>>>>> not shared > > > by >>>>>>>>>> everyone. I like to PROTECT only when necessary, rather than >>>>>>>>>> indiscriminately. Probably this has no practical consequence in >>>>>>>>>> terms of >>>>>>>>>> performance, making the code a little easier to read at the >>>>>>> expense > > > of >>>>>>>>>> exposing me to bugs like this. >>>>>>>>>>>>>> I guess it's a tradeoff between syntactic complexity >>>>>>> and logical >>>>>>>>> complexity. You have to think pretty hard to minimize use of the >>>>>>>>> protect stack. >>>>>>>>> I prefer to call it logical obscurity ;-) >>>>>>>>> The hard thinking consists in assessing whether or not the code >>>>>>> between >>>>>>>> the line where a new SEXP is allocated (line X) and the line >>>>>>>> where >>>>>>>> it's put in a safe place (line Y) can trigger garbage collection. >>>>>>>> Hard to figure out in general indeed, but not impossible! Problem >>>>>>>> is that the result of this assessment is valid at a certain point >>>>>>>> in time but might change in the future, even if your code has not >>>>>>>> changed. >>>>>>>>> So a dangerous game for virtually zero benefits. >>>>>>>>>>>> One recommendation might be to UNPROTECT() as soon as the >>>>>>> pointer on >>>>>>>>> the top is unneeded, rather than trying to figure out the >>>>>>>>> number to >>>>>>>>> pop just before returning to R. >>>>>>>>> If you PROTECT() in a loop, you definitely want to do that. >>>>>>> Otherwise, >>>>>>>> does it make a big difference? >>>>>>>>>>>> One thing that got me is that the order in which C >>>>>>> evaluates function >>>>>>>>> call arguments is undefined. I did a lot of R_setAttrib(x, >>>>>>>>> install("foo"), allocBar()), thinking that the symbol would be >>>>>>>>> automatically protected, and allocBar() would not need >>>>>>>>> protection, >>>>>>>>> since it happened last. Unfortunately, it might be evaluated >>>>>>>>> first. >>>>>>>>> I got hit by this too long time ago but with defineVar() >>>>>>>>> instead of >>>>>>>> R_setAttrib(): >>>>>>>>> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e= >>>>>>> >>>>>>> >>>>>>> >>>>>>>>> H. >>>>>>>>>>>> Btw, I think my package RGtk2 got the record: 1952 errors. >>>>>>> Luckily >>>>>>>>> almost all of them happened inside a few macros and >>>>>>>>> autogenerated >>>>>>>>> code. >>>>>>>>>>>> Martin >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Aaron >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioc-devel@r-project.org mailing list >>>>>>>>>>> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e= >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> This email message may contain >>>>>>> legally privileged >>>>>>>>>> and/or...{{dropped:2}} >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioc-devel@r-project.org mailing list >>>>>>>>>> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e= >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>> Bioc-devel@r-project.org mailing list >>>>>>>>> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e= >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>>>> >>>>>> >>>>>> >>>>> >>>> _______________________________________________ >>>> Bioc-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>> >>> >>> >>> This email message may contain legally privileged and/or confidential >>> information. If you are not the intended recipient(s), or the >>> employee or agent responsible for the delivery of this message to the >>> intended recipient(s), you are hereby notified that any disclosure, >>> copying, distribution, or use of this email message is prohibited. If >>> you have received this message in error, please notify the sender >>> immediately by e-mail and delete this email message from your >>> computer. Thank you. >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> > > > This email message may contain legally privileged and/or confidential > information. If you are not the intended recipient(s), or the > employee or agent responsible for the delivery of this message to the > intended recipient(s), you are hereby notified that any disclosure, > copying, distribution, or use of this email message is prohibited. If > you have received this message in error, please notify the sender > immediately by e-mail and delete this email message from your > computer. Thank you. _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel