Re: [R] plotting the regression coefficients

2018-02-11 Thread greg holly
Hi Petr;

Thanks so much. This is great! Alternatively, last Sunday I had already solved
the problem by using the following statement at the very end of the
program.

ggsave('circle.pdf', p4, height = 70, width = 8, device = pdf, limitsize = FALSE, dpi = 300)

This works very well too.

As my categorical variables are on the Y axis, R reorders their names in the
plot. However, I would like the output to keep the names in their original
order. Is there any way to produce the plot without reordering the variable
names on the Y axis?

Regards,
Greg.
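
A hedged sketch (not from the thread) of one way to keep the original order:
ggplot2 draws a discrete axis in factor-level order, so fixing the levels of
the melted 'variable' column before plotting should preserve it. Object and
column names follow Petr's toy example quoted below.

temp <- melt(temp)
temp$variable <- factor(temp$variable, levels = unique(temp$variable))
# wrap the levels in rev() if the first name should appear at the top
p <- ggplot(temp, aes(x = par1, y = variable, size = abs(value),
                      colour = factor(sign(value))))
p + geom_point()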

On Mon, Feb 12, 2018 at 10:12 AM, PIKAL Petr  wrote:

> Hi
>
>
>
> Maybe there are other ways, but I would split the data into several chunks,
> e.g. in a list, and use a for loop to fill a multipage pdf.
>
>
>
> With the toy data something like
>
>
>
> library(reshape2)
>
> library(ggplot2)
>
> temp <- melt(temp)
>
> temp.s<-split(temp, cut(1:nrow(temp), 2))
>
>
>
> pdf("temp.pdf")
>
> for (i in 1: length(temp.s)) {
>
> p <- ggplot(temp.s[[i]], aes(x=par1, y=variable, size=abs(value),
> colour=factor(sign(value))))
>
> print(p+geom_point())
>
> }
>
> dev.off()
>
>
>
> But the real code partly depends on your real data.
>
>
>
> Cheers
>
> Petr
>
>
>
> *From:* greg holly [mailto:mak.hho...@gmail.com]
> *Sent:* Saturday, February 10, 2018 9:05 PM
>
> *To:* PIKAL Petr 
> *Cc:* r-help mailing list 
> *Subject:* Re: [R] plotting the regression coefficients
>
>
>
> Hi Peter;
>
>
>
> The R code you provided works very well. Once again thanks so much for
> this. The number of variables in my data set that should appear on the
> y-axis is 733 and they are not numerical (for example, the name of one
> variable is palmitoyl-arachidonoyl-glycerol (16:0/20:4) [1]). So, the
> plot looks very messy on one page. How can I make the plot print across
> multiple pages?
>
>
>
> Regards,
>
>
>
> Greg
>
>
>
> On Thu, Feb 8, 2018 at 4:33 PM, greg holly  wrote:
>
> Hi Petr;
>
>
>
> Thanks so much. This is exactly what I need. I will experiment with changing
> colors and so on, but this backbone is perfect for me. I do appreciate your
> help and support.
>
>
>
> Regards,
>
> Greg
>
>
>
> On Thu, Feb 8, 2018 at 1:29 PM, PIKAL Petr  wrote:
>
> Hi
>
> I copied your values to R, here it is
>
>
>
> > dput(temp)
>
>
>
> temp <- structure(list(par1 = structure(1:4, .Label = c("x1", "x2", "x3",
>
> "x4"), class = "factor"), y1 = c(-0.19, 0.45, -0.09, -0.16),
>
> y2 = c(0.4, -0.75, 0.14, -0.01), y3 = c(-0.06, -8.67, 1.42,
>
> 2.21), y4 = c(0.13, -0.46, 0.06, 0.06)), .Names = c("par1",
>
> "y1", "y2", "y3", "y4"), class = "data.frame", row.names = c(NA,
>
> -4L))
>
>
>
> For plotting it needs to be reshaped:
>
>
>
> library(reshape2)
>
> library(ggplot2)
>
>
>
> temp <- melt(temp)
>
> p <- ggplot(temp, aes(x=par1, y=variable, size=abs(value),
> colour=factor(sign(value))))
>
> p+geom_point()
>
>
>
> Is this what you wanted?
>
>
>
> Cheers
>
> Petr
>
> And preferably do not post in HTML, the email content could be scrambled.
>
>
>
> *From:* greg holly [mailto:mak.hho...@gmail.com]
> *Sent:* Thursday, February 8, 2018 9:23 AM
> *To:* PIKAL Petr 
> *Cc:* r-help mailing list 
> *Subject:* Re: [R] plotting the regression coefficients
>
>
>
> Hi Petr;
>
>
>
> Thanks for your reply. It is much appreciated. A small example is given
> below for 4 independent and 4 dependent variables only. The values given
> are regression coefficients. I have looked at the ggplot documentation before
> writing to you. Unfortunately, I could not figure it out, as my experience with
> ggplot is negligible.
>
>
>
> Regards.
>
> Greg
>
>
>
> y1 y2 y3 y4
>
> x1 -0.19 0.40 -0.06 0.13
>
> x2 0.45 -0.75 -8.67 -0.46
>
> x3 -0.09 0.14 1.42 0.06
>
> x4 -0.16 -0.01 2.21 0.06
>
>
>
>
>
> On Thu, Feb 8, 2018 at 10:19 AM, PIKAL Petr 
> wrote:
>
> Hi
>
> Example, example, example - preferably working.
>
> Wild guess - did you try ggplot?
>
> Cheers
> Petr
>
>
>
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of greg
> holly
> > Sent: Thursday, February 8, 2018 8:14 AM
> > To: r-help mailing list 
> > Subject: [R] plotting the regression coefficients
> >
> > Hi Dear all;
> >
> > I would like to create a plot for regression coefficients with each
> independent
> > variable (x) along the side and the phenotypes (y) across the top (as
> given
> > below). For each data point, direction and magnitude of effect could be
> color
> > and significance could be the size of the circle? Is this possible?
> >
> >
> > I would greatly appreciate your help.
> >
> > Thanks,
> >
> > Greg
> >
> >
> >
> >   y1 y2 y3 y4 y5 y6
> > x1
> > x2
> > x3
> > x4
> > x5
> > x6
> > x7
> > x8
> > x9
> > x10
> > x11
> > x12
> > x13
> > x14
> > x15
> > x16
> > x17
> > .
> > .
> >
>
> > [[alternative HTML version deleted]]
> >
> > 

Re: [R] plotting the regression coefficients

2018-02-11 Thread PIKAL Petr
Hi

Maybe there are other ways, but I would split the data into several chunks, e.g.
in a list, and use a for loop to fill a multipage pdf.

With the toy data something like

library(reshape2)
library(ggplot2)
temp <- melt(temp)
temp.s<-split(temp, cut(1:nrow(temp), 2))

pdf("temp.pdf")
for (i in 1: length(temp.s)) {
p <- ggplot(temp.s[[i]], aes(x=par1, y=variable, size=abs(value),
colour=factor(sign(value))))
print(p+geom_point())
}
dev.off()

But the real code partly depends on your real data.
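
A hedged variation on the same idea (not part of the original message): split on
the melted 'variable' column instead of on row numbers, so each page carries a
complete set of y-axis names. Assumes temp has already been melted as above.

pages <- split(temp, cut(as.integer(temp$variable), 4))  # 4 pages; adjust as needed
pdf("temp_by_variable.pdf")
for (chunk in pages) {
  p <- ggplot(chunk, aes(x = par1, y = variable, size = abs(value),
                         colour = factor(sign(value))))
  print(p + geom_point())
}
dev.off()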

Cheers
Petr

From: greg holly [mailto:mak.hho...@gmail.com]
Sent: Saturday, February 10, 2018 9:05 PM
To: PIKAL Petr 
Cc: r-help mailing list 
Subject: Re: [R] plotting the regression coefficients

Hi Peter;

The R code you provided works very well. Once again thanks so much for this. 
The number of variables in my data set that should appear on the y-axis is 733 
and they are not numerical (for example, the name of one variable is
palmitoyl-arachidonoyl-glycerol (16:0/20:4) [1]). So, the plot looks very messy
on one page. How can I make the plot print across multiple pages?

Regards,

Greg

On Thu, Feb 8, 2018 at 4:33 PM, greg holly 
> wrote:
Hi Petr;

Thanks so much. This is exactly what I need. I will experiment with changing colors
and so on, but this backbone is perfect for me. I do appreciate your help and support.

Regards,
Greg

On Thu, Feb 8, 2018 at 1:29 PM, PIKAL Petr 
> wrote:
Hi
I copied your values to R, here it is

> dput(temp)

temp <- structure(list(par1 = structure(1:4, .Label = c("x1", "x2", "x3",
"x4"), class = "factor"), y1 = c(-0.19, 0.45, -0.09, -0.16),
y2 = c(0.4, -0.75, 0.14, -0.01), y3 = c(-0.06, -8.67, 1.42,
2.21), y4 = c(0.13, -0.46, 0.06, 0.06)), .Names = c("par1",
"y1", "y2", "y3", "y4"), class = "data.frame", row.names = c(NA,
-4L))

For plotting it needs to be reshaped:

library(reshape2)
library(ggplot2)

temp <- melt(temp)
p <- ggplot(temp, aes(x=par1, y=variable, size=abs(value),
colour=factor(sign(value))))
p+geom_point()

Is this what you wanted?

Cheers
Petr
And preferably do not post in HTML, the email content could be scrambled.

From: greg holly [mailto:mak.hho...@gmail.com]
Sent: Thursday, February 8, 2018 9:23 AM
To: PIKAL Petr >
Cc: r-help mailing list >
Subject: Re: [R] plotting the regression coefficients

Hi Petr;

Thanks for your reply. It is much appreciated. A small example is given below 
for 4 independent and 4 dependent variables only. The values given are 
regression coefficients. I have looked at the ggplot documentation before writing to
you. Unfortunately, I could not figure it out, as my experience with ggplot is negligible.

Regards.
Greg

y1 y2 y3 y4
x1 -0.19 0.40 -0.06 0.13
x2 0.45 -0.75 -8.67 -0.46
x3 -0.09 0.14 1.42 0.06
x4 -0.16 -0.01 2.21 0.06


On Thu, Feb 8, 2018 at 10:19 AM, PIKAL Petr 
> wrote:
Hi

Example, example, example - preferably working.

Wild guess - did you try ggplot?

Cheers
Petr


> -Original Message-
> From: R-help 
> [mailto:r-help-boun...@r-project.org] On 
> Behalf Of greg holly
> Sent: Thursday, February 8, 2018 8:14 AM
> To: r-help mailing list >
> Subject: [R] plotting the regression coefficients
>
> Hi Dear all;
>
> I would like to create a plot for regression coefficients with each 
> independent
> variable (x) along the side and the phenotypes (y) across the top (as given
> below). For each data point, direction and magnitude of effect could be color
> and significance could be the size of the circle? Is this possible?
>
>
> I would greatly appreciate your help.
>
> Thanks,
>
> Greg
>
>
>
>   y1 y2 y3 y4 y5 y6
> x1
> x2
> x3
> x4
> x5
> x6
> x7
> x8
> x9
> x10
> x11
> x12
> x13
> x14
> x15
> x16
> x17
> .
> .
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To 
> UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] PSOCK cluster and renice

2018-02-11 Thread Henrik Bengtsson
As a follow-up, future 1.7.0 was just released on CRAN, allowing you to
specify 'renice' as expected.  Example (drop 'dryrun = TRUE' for
actual usage):

> cl <- future::makeClusterPSOCK(2L, renice = 19, dryrun = TRUE)

--
Manually start worker #1 on 'localhost' with:
  nice --adjustment=19 '/usr/lib/R/bin/Rscript'
--default-packages=datasets,utils,grDevices,graphics,stats,methods -e
'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11414 OUT=/dev/null
TIMEOUT=2592000 XDR=TRUE
--
Manually start worker #2 on 'localhost' with:
  nice --adjustment=19 '/usr/lib/R/bin/Rscript'
--default-packages=datasets,utils,grDevices,graphics,stats,methods -e
'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11414 OUT=/dev/null
TIMEOUT=2592000 XDR=TRUE

/Henrik
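
A hedged follow-up sketch (not in the original message): once a real cluster is
started (i.e. without 'dryrun'), the effective niceness of each worker can be
checked with tools::psnice(), which reports the process priority on a Unix-alike.

cl <- future::makeClusterPSOCK(2L, renice = 19)
parallel::clusterCall(cl, function() tools::psnice())  # expect 19 for each worker
parallel::stopCluster(cl)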

On Sun, Dec 3, 2017 at 9:06 PM, Andreas Leha
 wrote:
> Hi Henrik,
>
> Thanks for the detailed and fast reply!
>
> My guess would be that the confusion comes from the different use of nice and 
> renice.
>
> The workaround you provided works fine!  Thanks a lot.
>
> Best,
> Andreas
>
>
>
> Henrik Bengtsson  writes:
>
>> Looks like a bug to me due to wrong assumptions about 'nice'
>> arguments, but could be because a "non-standard" 'nice' is used.  If
>> we do:
>>
>>> trace(system, tracer = quote(print(command)))
>> Tracing function "system" in package "base"
>>
>> we see that the system call used is:
>>
>>> cl <- parallel::makePSOCKcluster(2L, renice = 19)
>> Tracing system(cmd, wait = FALSE) on entry
>> [1] "nice +19 '/usr/lib/R/bin/Rscript'
>> --default-packages=datasets,utils,grDevices,graphics,stats,methods -e
>> 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11146 OUT=/dev/null
>> TIMEOUT=2592000 XDR=TRUE"
>> nice: ‘+19’: No such file or directory
>> ^C
>>
>> The code that prepends that 'nice +19' is in parallel:::newPSOCKnode:
>>
>> if (!is.na(renice) && renice)
>> cmd <- sprintf("nice +%d %s", as.integer(renice), cmd)
>>
>> I don't know where that originates from and on what platform it was
>> tests/validated.  On Ubuntu 16.04, CentOS 6.6, and CentOS 7.4, I have
>> 'nice' from "GNU coreutils" and they all complain about using '+',
>> e.g.
>>
>> $ nice +19 date
>> nice: +19: No such file or directory
>>
>> but '-' works:
>>
>> $ nice -19 date
>> Sun Dec  3 20:01:31 PST 2017
>>
>> Neither 'nice --help' nor 'man nice' mentions the use of a +n option.
>>
>>
>> WORKAROUND:  As a workaround, you can use:
>>
>> cl <- future::makeClusterPSOCK(2L, rscript = c("nice",
>> "--adjustment=10", file.path(R.home("bin"), "Rscript")))
>>
>> which is backward compatible with parallel::makePSOCKcluster() but
>> provides you with more detailed control.  Try adding verbose = TRUE to
>> see what the exact call looks like.
>>
>> /Henrik
>>
>>
>> On Sun, Dec 3, 2017 at 7:35 PM, Andreas Leha
>>  wrote:
>>> Hi all,
>>>
>>> Is it possible to use the 'renice' option together with parallel
>>> clusters of type 'PSOCK'?  The help page for parallel::makeCluster is
>>> not specific about which options are supported on which types and I am
>>> getting the following message when passing renice = 19 :
>>>
 cl <- parallel::makeCluster(2, renice = 19)
>>> nice: ‘+19’: No such file or directory
>>>
>>> Kind regards,
>>> Andreas
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parallel assignments and goto

2018-02-11 Thread Thomas Mailund
I admit I didn’t know about Recall, but you are right, there is no direct 
support for this tail-recursion optimisation. For good reasons — it would break 
a lot of NSE. I am not attempting to solve tail-recursion optimisation for all 
cases. That wouldn’t work by just rewriting functions. It might be doable with 
JIT or something like that, but my goal is less ambitious.

Using local, though, might be an approach. I will play around with that 
tomorrow.

Cheers

On 11 Feb 2018, 18.19 +0100, David Winsemius , wrote:
>
> > On Feb 11, 2018, at 7:48 AM, Thomas Mailund  
> > wrote:
> >
> > Hi guys,
> >
> > I am working on some code for automatically translating recursive functions 
> > into looping functions to implement tail-recursion optimisations. See 
> > https://github.com/mailund/tailr
> >
> > As a toy-example, consider the factorial function
> >
> > factorial <- function(n, acc = 1) {
> > if (n <= 1) acc
> > else factorial(n - 1, acc * n)
> > }
> >
> > I can automatically translate this into the loop-version
> >
> > factorial_tr_1 <- function (n, acc = 1)
> > {
> > repeat {
> > if (n <= 1)
> > return(acc)
> > else {
> > .tailr_n <- n - 1
> > .tailr_acc <- acc * n
> > n <- .tailr_n
> > acc <- .tailr_acc
> > next
> > }
> > }
> > }
> >
> > which will run faster and not have problems with recursion depths. However, 
> > I’m not entirely happy with this version for two reasons: I am not happy 
> > with introducing the temporary variables and this rewrite will not work if 
> > I try to over-scope an evaluation context.
> >
> > I have two related questions, one related to parallel assignments — i.e. 
> > expressions to variables so the expression uses the old variable values and 
> > not the new values until the assignments are all done — and one related to 
> > restarting a loop from nested loops or from nested expressions in `with` 
> > expressions or similar.
> >
> > I can implement parallel assignment using something like rlang::env_bind:
> >
> > factorial_tr_2 <- function (n, acc = 1)
> > {
> > .tailr_env <- rlang::get_env()
> > repeat {
> > if (n <= 1)
> > return(acc)
> > else {
> > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > next
> > }
> > }
> > }
> >
> > This reduces the number of additional variables I need to one, but is a 
> > couple of orders of magnitude slower than the first version.
> >
> > > microbenchmark::microbenchmark(factorial(100),
> > + factorial_tr_1(100),
> > + factorial_tr_2(100))
> > Unit: microseconds
> > expr min lq mean median uq max neval
> > factorial(100) 53.978 60.543 77.76203 71.0635 85.947 180.251 100
> > factorial_tr_1(100) 9.022 9.903 11.52563 11.0430 11.984 28.464 100
> > factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 8177.635 100
> >
> >
> > Is there another way to do parallel assignments that doesn’t cost this much 
> > in running time?
> >
> > My other problem is the use of `next`. I would like to combine 
> > tail-recursion optimisation with pattern matching as in 
> > https://github.com/mailund/pmatch where I can, for example, define a linked 
> > list like this:
> >
> > devtools::install_github("mailund/pmatch")
> > library(pmatch)
> > llist := NIL | CONS(car, cdr : llist)
> >
> > and define a function for computing the length of a list like this:
> >
> > list_length <- function(lst, acc = 0) {
> > force(acc)
> > cases(lst,
> > NIL -> acc,
> > CONS(car, cdr) -> list_length(cdr, acc + 1))
> > }
> >
> > The `cases` function creates an environment that binds variables in a 
> > pattern-description that over-scopes the expression to the right of `->`, 
> > so the recursive call in this example has access to the variables `cdr` 
> > and `car`.
> >
> > I can transform a `cases` call to one that creates the environment 
> > containing the bound variables and then evaluate this using `eval` or 
> > `with`, but in either case, a call to `next` will not work in such a 
> > context. The expression will be evaluated inside `bind` or `with`, and not 
> > in the `list_length` function.
> >
> > A version that *will* work, is something like this
> >
> > factorial_tr_3 <- function (n, acc = 1)
> > {
> > .tailr_env <- rlang::get_env()
> > .tailr_frame <- rlang::current_frame()
> > repeat {
> > if (n <= 1)
> > rlang::return_from(.tailr_frame, acc)
> > else {
> > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > rlang::return_to(.tailr_frame)
> > }
> > }
> > }
> >
> > Here, again, for the factorial function since this is easier to follow than 
> > the list-length function.
> >
> > This solution will also work if you return values from inside loops, where 
> > `next` wouldn’t work either.
> >
> > Using `rlang::return_from` and `rlang::return_to` implements the right 
> > semantics, but costs me another order of magnitude in running time.
> >
> > microbenchmark::microbenchmark(factorial(100),
> > factorial_tr_1(100),
> > factorial_tr_2(100),
> > factorial_tr_3(100))
> > Unit: 

Re: [R] Parallel assignments and goto

2018-02-11 Thread David Winsemius

> On Feb 11, 2018, at 7:48 AM, Thomas Mailund  wrote:
> 
> Hi guys,
> 
> I am working on some code for automatically translating recursive functions 
> into looping functions to implement tail-recursion optimisations. See 
> https://github.com/mailund/tailr
> 
> As a toy-example, consider the factorial function
> 
> factorial <- function(n, acc = 1) {
>if (n <= 1) acc
>else factorial(n - 1, acc * n)
> }
> 
> I can automatically translate this into the loop-version
> 
> factorial_tr_1 <- function (n, acc = 1) 
> {
>repeat {
>if (n <= 1) 
>return(acc)
>else {
>.tailr_n <- n - 1
>.tailr_acc <- acc * n
>n <- .tailr_n
>acc <- .tailr_acc
>next
>}
>}
> }
> 
> which will run faster and not have problems with recursion depths. However, 
> I’m not entirely happy with this version for two reasons: I am not happy with 
> introducing the temporary variables and this rewrite will not work if I try 
> to over-scope an evaluation context.
> 
> I have two related questions, one related to parallel assignments — i.e. 
> expressions to variables so the expression uses the old variable values and 
> not the new values until the assignments are all done — and one related to 
> restarting a loop from nested loops or from nested expressions in `with` 
> expressions or similar.
> 
> I can implement parallel assignment using something like rlang::env_bind:
> 
> factorial_tr_2 <- function (n, acc = 1) 
> {
>.tailr_env <- rlang::get_env()
>repeat {
>if (n <= 1) 
>return(acc)
>else {
>rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
>next
>}
>}
> }
> 
> This reduces the number of additional variables I need to one, but is a 
> couple of orders of magnitude slower than the first version.
> 
>> microbenchmark::microbenchmark(factorial(100),
> +factorial_tr_1(100),
> +factorial_tr_2(100))
> Unit: microseconds
>                expr      min       lq       mean    median       uq      max neval
>      factorial(100)   53.978   60.543   77.76203   71.0635   85.947  180.251   100
> factorial_tr_1(100)    9.022    9.903   11.52563   11.0430   11.984   28.464   100
> factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 8177.635   100
> 
> 
> Is there another way to do parallel assignments that doesn’t cost this much 
> in running time?
> 
> My other problem is the use of `next`. I would like to combine tail-recursion 
> optimisation with pattern matching as in https://github.com/mailund/pmatch 
> where I can, for example, define a linked list like this:
> 
> devtools::install_github("mailund/pmatch")
> library(pmatch)
> llist := NIL | CONS(car, cdr : llist)
> 
> and define a function for computing the length of a list like this:
> 
> list_length <- function(lst, acc = 0) {
>  force(acc)
>  cases(lst,
>NIL -> acc,
>CONS(car, cdr) -> list_length(cdr, acc + 1))
> }
> 
> The `cases` function creates an environment that binds variables in a 
> pattern-description that over-scopes the expression to the right of `->`, so 
> the recursive call in this example has access to the variables `cdr` and 
> `car`.
> 
> I can transform a `cases` call to one that creates the environment containing 
> the bound variables and then evaluate this using `eval` or `with`, but in 
> either case, a call to `next` will not work in such a context. The expression 
> will be evaluated inside `bind` or `with`, and not in the `list_length` 
> function.
> 
> A version that *will* work, is something like this
> 
> factorial_tr_3 <- function (n, acc = 1) 
> {
>.tailr_env <- rlang::get_env()
>.tailr_frame <- rlang::current_frame()
>repeat {
>if (n <= 1) 
>rlang::return_from(.tailr_frame, acc)
>else {
>rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
>rlang::return_to(.tailr_frame)
>}
>}
> }
> 
> Here, again, for the factorial function since this is easier to follow than 
> the list-length function.
> 
> This solution will also work if you return values from inside loops, where 
> `next` wouldn’t work either.
> 
> Using `rlang::return_from` and `rlang::return_to` implements the right 
> semantics, but costs me another order of magnitude in running time.
> 
> microbenchmark::microbenchmark(factorial(100),
>   factorial_tr_1(100),
>   factorial_tr_2(100),
>   factorial_tr_3(100))
> Unit: microseconds
>                expr       min         lq        mean     median        uq        max neval
>      factorial(100)    52.479    60.2640    93.43069    67.5130    83.925   2062.481   100
> factorial_tr_1(100)     8.875     9.6525    49.19595    10.6945    11.217   3818.823   100
> 

Re: [R] Hausman test

2018-02-11 Thread David Winsemius

> On Feb 11, 2018, at 8:29 AM, PAOLO PILI  wrote:
> 
> You are right about the 3rd line, but it doesn't solve my problem. I
> removed the 3rd line and the same error remains:
> 
> Error in solve.default(dvcov) :
>   system is computationally singular: reciprocal condition number =
> 1.63418e-19

That suggests inclusion of too many categorical (factor) variables relative to 
the sample size in the predictor variables. Use tabular methods to investigate. 
Unable to be more specific in the absence of a proper description of the data 
situation.
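
A minimal sketch (not from the thread) of the kind of tabular check meant here,
assuming the panel data frame is called 'data' as in the original post:

sapply(Filter(is.factor, data), nlevels)  # levels carried by each factor column
nrow(data)                                # compare against the sample size
table(data$FIRM, data$YEAR)               # look for empty or near-empty cells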

-- 
David.
> 
> Paolo
> 
> 2018-02-11 16:54 GMT+01:00 Bert Gunter :
> 
>> Note the typo in your 3rd line: data <
>> 
>> Don't  know if this means anything...
>> 
>> Bert
>> 
>> 
>> 
>> On Feb 11, 2018 7:33 AM, "PAOLO PILI"  wrote:
>> 
>>> Hello,
>>> 
>>> I have a problem with Hausman test. I am performing my analysis with these
>>> commands:
>>> 
>>> > library(plm)
>>> > data<-read.csv2("paolo.csv",header=TRUE)
>>> > data<
>>> pdata.frame(data,index=c("FIRM","YEAR"),drop.index=TRUE,row.names=TRUE)
>>> >
>>> RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2
>>> >
>>> grun.fe<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
>>> > grun.re
>>> <-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
>>> >
>>> gw<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
>>> >
>>> gr<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
>>> > phtest(gw,gr)
>>> 
>>> I got this answer:
>>> 
>>> Error in solve.default(dvcov) :
>>> 
>>> how can I solve this problem?
>>> 
>>> Thank you
>>> 
>>>[[alternative HTML version deleted]]
>>> 
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hausman test

2018-02-11 Thread PAOLO PILI
You are right about the 3rd line, but it doesn't solve my problem. I
removed the 3rd line and the same error remains:

Error in solve.default(dvcov) :
   system is computationally singular: reciprocal condition number =
1.63418e-19

Paolo

2018-02-11 16:54 GMT+01:00 Bert Gunter :

> Note the typo in your 3rd line: data <
>
> Don't  know if this means anything...
>
> Bert
>
>
>
> On Feb 11, 2018 7:33 AM, "PAOLO PILI"  wrote:
>
>> Hello,
>>
>> I have a problem with Hausman test. I am performing my analysis with these
>> commands:
>>
>> > library(plm)
>> > data<-read.csv2("paolo.csv",header=TRUE)
>> > data<
>> pdata.frame(data,index=c("FIRM","YEAR"),drop.index=TRUE,row.names=TRUE)
>> >
>> RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2
>> >
>> grun.fe<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
>> > grun.re
>> <-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
>> >
>> gw<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
>> >
>> gr<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
>> > phtest(gw,gr)
>>
>> I got this answer:
>>
>> Error in solve.default(dvcov) :
>>
>> how can I solve this problem?
>>
>> Thank you
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hausman test

2018-02-11 Thread Bert Gunter
Note the typo in your 3rd line: data <

Don't  know if this means anything...

Bert



On Feb 11, 2018 7:33 AM, "PAOLO PILI"  wrote:

> Hello,
>
> I have a problem with Hausman test. I am performing my analysis with these
> commands:
>
> > library(plm)
> > data<-read.csv2("paolo.csv",header=TRUE)
> > data<
> pdata.frame(data,index=c("FIRM","YEAR"),drop.index=TRUE,row.names=TRUE)
> >
> RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+
> TURN+GPROF+GPROF2
> >
> grun.fe<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+
> PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
> > grun.re
> <-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+
> NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
> >
> gw<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+
> PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
> >
> gr<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+
> PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
> > phtest(gw,gr)
>
> I got this answer:
>
> Error in solve.default(dvcov) :
>
> how can I solve this problem?
>
> Thank you
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Parallel assignments and goto

2018-02-11 Thread Thomas Mailund
Hi guys,

I am working on some code for automatically translating recursive functions 
into looping functions to implement tail-recursion optimisations. See 
https://github.com/mailund/tailr

As a toy-example, consider the factorial function

factorial <- function(n, acc = 1) {
    if (n <= 1) acc
    else factorial(n - 1, acc * n)
}

I can automatically translate this into the loop-version

factorial_tr_1 <- function (n, acc = 1) 
{
    repeat {
        if (n <= 1) 
            return(acc)
        else {
            .tailr_n <- n - 1
            .tailr_acc <- acc * n
            n <- .tailr_n
            acc <- .tailr_acc
            next
        }
    }
}

which will run faster and not have problems with recursion depths. However, I’m 
not entirely happy with this version for two reasons: I am not happy with 
introducing the temporary variables and this rewrite will not work if I try to 
over-scope an evaluation context.

I have two related questions, one related to parallel assignments — i.e. 
assigning expressions to variables so that each expression uses the old variable 
values, not the new ones, until all the assignments are done — and one related to 
restarting a loop from nested loops or from nested expressions in `with` 
expressions or similar.

I can implement parallel assignment using something like rlang::env_bind:

factorial_tr_2 <- function (n, acc = 1) 
{
    .tailr_env <- rlang::get_env()
    repeat {
        if (n <= 1) 
            return(acc)
        else {
            rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
            next
        }
    }
}

This reduces the number of additional variables I need to one, but is a couple 
of orders of magnitude slower than the first version.

> microbenchmark::microbenchmark(factorial(100),
+factorial_tr_1(100),
+factorial_tr_2(100))
Unit: microseconds
                expr      min       lq       mean    median       uq      max neval
      factorial(100)   53.978   60.543   77.76203   71.0635   85.947  180.251   100
 factorial_tr_1(100)    9.022    9.903   11.52563   11.0430   11.984   28.464   100
 factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 8177.635   100


Is there another way to do parallel assignments that doesn’t cost this much in 
running time?
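
A hedged third option, not benchmarked in the original post: collect the new
values in a list and write them back with base::list2env(). This keeps the
parallel-assignment semantics without rlang::env_bind(); whether it is actually
faster would need measuring.

factorial_tr_le <- function(n, acc = 1) {
    .env <- environment()  # the function's own evaluation frame
    repeat {
        if (n <= 1)
            return(acc)
        else {
            # both right-hand sides are computed before either variable is rebound
            list2env(list(n = n - 1, acc = acc * n), envir = .env)
            next
        }
    }
}
factorial_tr_le(10)  # 3628800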

My other problem is the use of `next`. I would like to combine tail-recursion 
optimisation with pattern matching as in https://github.com/mailund/pmatch 
where I can, for example, define a linked list like this:

devtools::install_github("mailund/pmatch")
library(pmatch)
llist := NIL | CONS(car, cdr : llist)

and define a function for computing the length of a list like this:

list_length <- function(lst, acc = 0) {
  force(acc)
  cases(lst,
        NIL -> acc,
        CONS(car, cdr) -> list_length(cdr, acc + 1))
}

The `cases` function creates an environment that binds variables in a 
pattern-description that over-scopes the expression to the right of `->`, so 
the recursive call in this example has access to the variables `cdr` and `car`.

I can transform a `cases` call to one that creates the environment containing 
the bound variables and then evaluate this using `eval` or `with`, but in 
either case, a call to `next` will not work in such a context. The expression 
will be evaluated inside `bind` or `with`, and not in the `list_length` 
function.

A version that *will* work, is something like this

factorial_tr_3 <- function (n, acc = 1) 
{
    .tailr_env <- rlang::get_env()
    .tailr_frame <- rlang::current_frame()
    repeat {
        if (n <= 1) 
            rlang::return_from(.tailr_frame, acc)
        else {
            rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
            rlang::return_to(.tailr_frame)
        }
    }
}

Here, again, for the factorial function since this is easier to follow than the 
list-length function.

This solution will also work if you return values from inside loops, where 
`next` wouldn’t work either.

Using `rlang::return_from` and `rlang::return_to` implements the right 
semantics, but costs me another order of magnitude in running time.

microbenchmark::microbenchmark(factorial(100),
   factorial_tr_1(100),
   factorial_tr_2(100),
   factorial_tr_3(100))
Unit: microseconds
                expr       min         lq        mean      median        uq        max neval
      factorial(100)    52.479    60.2640    93.43069     67.5130    83.925   2062.481   100
 factorial_tr_1(100)     8.875     9.6525    49.19595     10.6945    11.217   3818.823   100
 factorial_tr_2(100)  5296.350  5525.0745  5973.77664   5737.8730  6260.128   8471.301   100
 factorial_tr_3(100) 77554.457 80757.0905 87307.28737  84004.0725 89859.169 171039.228   100

I can live with the “introducing extra variables” solution to parallel 
assignment, and I could hack my way out of using `with` or `bind` 

[R] Hausman test

2018-02-11 Thread PAOLO PILI
Hello,

I have a problem with Hausman test. I am performing my analysis with these
commands:

> library(plm)
> data<-read.csv2("paolo.csv",header=TRUE)
> data<
pdata.frame(data,index=c("FIRM","YEAR"),drop.index=TRUE,row.names=TRUE)
>
RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2
>
grun.fe<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
> grun.re
<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
>
gw<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="within")
>
gr<-plm(RECEIV~LSIZE+LAGE+LAGE2+CFLOW+STLEV+FCOST+PGROWTH+NGROWTH+TURN+GPROF+GPROF2,data=data,model="random")
> phtest(gw,gr)

I got this answer:

Error in solve.default(dvcov) :

how can I solve this problem?

Thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R-es] R-help-es Digest, Vol 108, Issue 11

2018-02-11 Thread Andrés Hirigoyen
Thanks, I will try both options.
Regards
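
A hedged aside, assuming dplyr >= 1.0 (which postdates this thread): the same
per-group summary can also be written with across(); column names follow the
example quoted below.

library(dplyr)
datos %>%
  group_by(distrito) %>%
  summarise(across(where(is.numeric),
                   list(media = mean, maximo = max, minimo = min, desvio = sd)))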

On 11 February 2018 at 08:00,  wrote:

> Send messages for the R-help-es list to
> r-help-es@r-project.org
>
> To subscribe or unsubscribe via the web, visit
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>
> or, by email, send a message with the text "help" in
> the subject or in the body to:
> r-help-es-requ...@r-project.org
>
> You can contact the list owner by writing to:
> r-help-es-ow...@r-project.org
>
> If you reply to any content of this message, please edit the
> subject line so that the text is more specific than:
> "Re: Contents of R-help-es digest...". Also, please include in
> your reply only those parts of the message you are
> responding to.
>
>
> Today's topics:
>
>    1. Optimize a function (Andrés Hirigoyen)
>    2. Re: Optimize a function (Álvaro Hernández)
>    3. Re: Optimize a function (Carlos J. Gil Bellosta)
>
> --
>
> Message: 1
> Date: Sat, 10 Feb 2018 13:09:22 -0300
> From: Andrés Hirigoyen 
> To: Lista R 
> Subject: [R-es] Optimize a function
> Message-ID:
>  mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello everyone, I have a question about saving time. For example,
> I have the following data frame:
>
> distrito<-c("A","A","A","B","B","B","C","C","C","A","A","B","B","C")
> Sex<-c("M","F","M","F","M","F","M","F","M","F","M","F","M","F")
> Edad<-c(25,36,25,25,25,19,36,39,36,65,54,25,28,28)
> Ingreso<-c(125,365,265,987,690,369,325,369,789,854,254,268,698,258)
> Aporte <- c(3,6,3,6,9,6,9,7,9,7,4,8,2,8)
> datos<-data.frame(distrito=distrito,Sex=Sex,Edad=Edad,
> Ingreso=Ingreso,Aporte=Aporte)
>
> I want to apply the summarise function from the dplyr package to the 3
> numeric variables.
> For the variable Aporte, for example:
>
> descrip<-function(data) {
>   grupos <- group_by(data, distrito)
> result <-
> summarise(grupos,
>   media = mean(Aporte),
>   maximo = max(Aporte),
>   minimo = min(Aporte),
>   desvio= sd(Aporte)
> )
> return(result)
> }
>
> But I would like to automate it so that it runs over all the variables in the
> data frame (3 in this case, but there will be more than 23).
> Any suggestions?
>
> Many thanks
> --
>
> [[alternative HTML version deleted]]
>
>
>
>
> --
>
> Message: 2
> Date: Sat, 10 Feb 2018 17:41:00 +0100
> From: Álvaro Hernández 
> To: r-help-es@r-project.org
> Subject: Re: [R-es] Optimize a function
> Message-ID: 
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Hi Andrés:
>
> With dplyr you have all the 'summarise' variants to do what you
> describe. For example, you can choose which variables to apply it to with
> 'summarise_at', or define a condition with 'summarise_if'.
>
>   * To apply the functions you define to those three variables:
>
> datos %>%
>group_by(distrito) %>%
>summarise_at(vars(Aporte, Ingreso, Edad),
> funs(media = mean, maximo = max, minimo = min, desvio =
> sd))
>
> * To apply those functions to all the numeric variables you have:
>
> datos %>%
>group_by(distrito) %>%
>summarise_if(is.numeric,
> funs(media = mean, maximo = max, minimo = min, desvio =
> sd))
>
> Regards,
> Álvaro
>
> On 10/02/18 at 17:09, Andrés Hirigoyen wrote:
> > Hello everyone, I have a question about saving time. For example,
> > I have the following data frame:
> >
> > distrito<-c("A","A","A","B","B","B","C","C","C","A","A","B","B","C")
> > Sex<-c("M","F","M","F","M","F","M","F","M","F","M","F","M","F")
> > Edad<-c(25,36,25,25,25,19,36,39,36,65,54,25,28,28)
> > Ingreso<-c(125,365,265,987,690,369,325,369,789,854,254,268,698,258)
> > Aporte <- c(3,6,3,6,9,6,9,7,9,7,4,8,2,8)
> > datos<-data.frame(distrito=distrito,Sex=Sex,Edad=Edad,
> Ingreso=Ingreso,Aporte=Aporte)
> >
> > I want to apply the summarise function from the dplyr package to the 3
> > numeric variables.
> > For the variable Aporte, for example:
> >
> > descrip<-function(data) {
> >grupos <- group_by(data, distrito)
> >  result <-
> >  summarise(grupos,
> >media = mean(Aporte),
> >maximo = max(Aporte),
> >minimo = min(Aporte),
> >desvio= sd(Aporte)
> >  )
> >  return(result)
> > }
> >
> > But I would like to automate it so that it runs over all the variables
> > in the data frame (3 in this case, but there will be more than 23).
> > Any suggestions?
> >
> > Many thanks
> > --
> >
> >   [[alternative HTML version deleted]]
> >
> >