Re: [R] Reshaping an array - how does it work in R

2016-03-23 Thread Martin Maechler
> Dénes Tóth 
> on Tue, 22 Mar 2016 10:55:58 +0100 writes:

> Hi Martin,


> On 03/22/2016 10:20 AM, Martin Maechler wrote:
>>> >Dénes Tóth
>>> > on Fri, 18 Mar 2016 22:56:23 +0100 writes:
>> > Hi Roy,
>> > R (usually) makes a copy if the dimensionality of an array is modified,
>> > even if you use this syntax:
>> 
>> > x <- array(1:24, c(2, 3, 4))
>> > dim(x) <- c(6, 4)
>> 
>> > See also ?tracemem, ?data.table::address, ?pryr::address and other tools
>> > to trace if an internal copy is done.
>> 
>> Well, without using strange (;-) packages,  indeed standard R's
>> tracemem(), notably the help page is a good pointer.
>> 
>> According to the help page memory tracing is enabled in the
>> default R binaries for Windows and OS X.
>> For Linux (where I, as R developer, compile R myself anyway),
>> one needs to configure with --enable-memory-profiling .
>> 
>> Now, let's try:
>> 
>> > x <- array(rnorm(47), dim = c(1000,50, 40))
>> > tracemem(x)
>> [1] "<0x7f79a498a010>"
>> > dim(x) <- c(1000* 50, 40)
>> > x[5] <- pi
>> > tracemem(x)
>> [1] "<0x7f79a498a010>"
>> >
>> 
>> So, *BOTH* the re-dimensioning *AND* the sub-assignment did
>> *NOT* make a copy.

> This is interesting. First I wanted to demonstrate to Roy that recent R 
> versions are smart enough not to make any copy during reshaping an 
> array. Then I put together an example (similar to yours) and realized 
> that after several reshapes, R starts to copy the array. So I had to 
> modify my suggestion... And now, I realized that this was an 
> RStudio-issue. At least on Linux, a standard R terminal behaves as you 
> described, however, RStudio (version 0.99.862, which is not the very 
> latest) tends to create copies (quite randomly, at least to me). If I 
> have time I will test this more thoroughly and file a report to RStudio 
> if it turns out to be a bug.

Interesting, indeed.

I can confirm the buggy RStudio behavior
using the latest version of RStudio (64-bit Linux, Fedora 22)
  RStudio Version 0.99.891 – © 2009-2016 RStudio, Inc.

The attached small R script demonstrates the problem clearly.
If you have a tracemem-enabled version of R, the output is even
more revealing; inside RStudio it gives

> showAdr <- function(x) {
+ if(capabilities("profmem")) {
+ tracemem(x)
+ } else {
+ cat("R version not configured for memory tracing\n")
+ .Internal(inspect(x))# also works w/o tracemem
+ }
+ }
> x <- array(rnorm(47), dim = c(1000, 50, 40))
> showAdr(x)
[1] "<0x7fad78b37010>"
> dim(x) <- c(1000*50, 40) # *no* copying
tracemem[0x7fad78b37010 -> 0x7fad77bf4010]: 
> showAdr(x) # Rstudio "fails" and has copied x
[1] "<0x7fad77bf4010>"
> x[3] <- pi
tracemem[0x7fad77bf4010 -> 0x1ad05f50]: 
> showAdr(x)
[1] "<0x1ad05f50>"
> ## in R, R CMD BATCH, also from ESS: there is *no* copying
> ## However, in Rstudio copying has happened!
>
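
For readers who want to repeat this check on their own setup, here is a minimal sketch, assuming an R build with memory profiling (it falls back to NA otherwise). The names `before`, `after` and `copied` are illustrative and not from the attached script, and the result may differ between a plain R terminal and a front end such as RStudio, as described above.

```r
# Small stand-in for the large array used in the thread.
x <- array(rnorm(24), dim = c(2, 3, 4))

if (capabilities("profmem")) {
  before <- tracemem(x)        # address string before reshaping
  dim(x) <- c(6, 4)            # re-dimension in place (or not)
  after <- tracemem(x)         # address string after reshaping
  untracemem(x)
  copied <- !identical(before, after)
} else {
  # R was built without --enable-memory-profiling: only reshape.
  dim(x) <- c(6, 4)
  copied <- NA
}

copied  # FALSE means the address did not change, TRUE means a copy was made
```

If `copied` is TRUE, tracemem() will also have printed a `tracemem[... -> ...]` line at the point of the copy.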

Martin


> Denes

>> 
>> Indeed, R has become much smarter in these things in recent
>> years ... not thanks to me, but very much thanks to
>> Luke Tierney (from R-core), and also thanks to contributions from "outside",
>> notably Tomas Kalibera.
>> 
>> And hence: *NO* such strange workarounds are needed in this specific case:
>> 
>> > Workaround: use data.table::setattr or bit::setattr to modify the
>> > dimensions in place (i.e., without making a copy). Risk: if you modify
>> > an object by reference, all other objects which point to the same memory
>> > address will be modified silently, too.
>> 
>> Martin Maechler, ETH Zurich  (and R-core)
>> 
>> > HTH,
>> > Denes
>> 
>> (generally, your contributions help indeed, Denes, thank you!)
>> 
>> 
>> > On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>> >> Hi All:
>> >>
>> >> I am working with a very large array.  If noLat is the number of
>> >> latitudes, noLon the number of longitudes and noTime the number of time
>> >> periods, the array is of the form:
>> >>
>> >> myData[noLat, noLon, noTime].
>> >>
>> >> It is read in this way because that is how it is stored in a (series)
>> >> of netcdf files.  For the analysis I need to do, I need instead the array:
>> >>
>> >> myData[noLat*noLon, noTime].  Normally this would be easy:
>> >>
>> >> myData <- array(myData, dim = c(noLat*noLon, noTime))
>> >>
>> >> My question is how does this command work in R - does it make a copy
>> >> of the existing array, with different indices for the dimensions, or does it
>> >> just redo the indices and leave the given array as is?  The reason for this
>> >> question is my array is 30GB in memory, and I don’t have enough space to
>> >> have a copy of the array in memory.

Re: [R] Reshaping an array - how does it work in R

2016-03-22 Thread Roy Mendelssohn - NOAA Federal
Thanks all.  This is interesting, and for what I am doing worthwhile and 
helpful.  I have to be careful in each operation whether a copy is made or not, 
and knowing this allows me to test on small examples what any command will do 
before I use it.

Thanks again, I appreciate all the help.  I will have a related question, but 
will put it under a different heading.

-Roy
> On Mar 22, 2016, at 2:55 AM, Dénes Tóth  wrote:
> 
> 
> Hi Martin,
> 
> 
> On 03/22/2016 10:20 AM, Martin Maechler wrote:
>>> >Dénes Tóth
>>> > on Fri, 18 Mar 2016 22:56:23 +0100 writes:
>> > Hi Roy,
>> > R (usually) makes a copy if the dimensionality of an array is modified,
>> > even if you use this syntax:
>> 
>> > x <- array(1:24, c(2, 3, 4))
>> > dim(x) <- c(6, 4)
>> 
>> > See also ?tracemem, ?data.table::address, ?pryr::address and other tools
>> > to trace if an internal copy is done.
>> 
>> Well, without using strange (;-) packages,  indeed standard R's
>> tracemem(), notably the help page is a good pointer.
>> 
>> According to the help page memory tracing is enabled in the
>> default R binaries for Windows and OS X.
>> For Linux (where I, as R developer, compile R myself anyway),
>> one needs to configure with --enable-memory-profiling .
>> 
>> Now, let's try:
>> 
>> > x <- array(rnorm(47), dim = c(1000,50, 40))
>> > tracemem(x)
>> [1] "<0x7f79a498a010>"
>> > dim(x) <- c(1000* 50, 40)
>> > x[5] <- pi
>> > tracemem(x)
>> [1] "<0x7f79a498a010>"
>> >
>> 
>> So, *BOTH* the re-dimensioning *AND* the sub-assignment did
>> *NOT* make a copy.
> 
> This is interesting. First I wanted to demonstrate to Roy that recent R 
> versions are smart enough not to make any copy during reshaping an array. 
> Then I put together an example (similar to yours) and realized that after 
> several reshapes, R starts to copy the array. So I had to modify my 
> suggestion... And now, I realized that this was an RStudio-issue. At least on 
> Linux, a standard R terminal behaves as you described, however, RStudio 
> (version 0.99.862, which is not the very latest) tends to create copies 
> (quite randomly, at least to me). If I have time I will test this more 
> thoroughly and file a report to RStudio if it turns out to be a bug.
> 
> Denes
> 
>> 
>> Indeed, R has become much smarter  in these things in recent
>> years ... not thanks to me, but very much thanks to
>> Luke Tierney (from R-core), and also thanks to contributions from "outside",
>> notably Tomas Kalibera.
>> 
>> And hence:*NO*  such strange workarounds are needed in this specific case:
>> 
>> > Workaround: use data.table::setattr or bit::setattr to modify the
>> > dimensions in place (i.e., without making a copy). Risk: if you modify
>> > an object by reference, all other objects which point to the same memory
>> > address will be modified silently, too.
>> 
>> Martin Maechler, ETH Zurich  (and R-core)
>> 
>> > HTH,
>> > Denes
>> 
>> (generally, your contributions help indeed, Denes, thank you!)
>> 
>> 
>> > On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>> >> Hi All:
>> >>
>> >> I am working with a very large array.  If noLat is the number of
>> >> latitudes, noLon the number of longitudes and noTime the number of time
>> >> periods, the array is of the form:
>> >>
>> >> myData[noLat, noLon, noTime].
>> >>
>> >> It is read in this way because that is how it is stored in a (series)
>> >> of netcdf files.  For the analysis I need to do, I need instead the array:
>> >>
>> >> myData[noLat*noLon, noTime].  Normally this would be easy:
>> >>
>> >> myData <- array(myData, dim = c(noLat*noLon, noTime))
>> >>
>> >> My question is how does this command work in R - does it make a copy
>> >> of the existing array, with different indices for the dimensions, or does it
>> >> just redo the indices and leave the given array as is?  The reason for this
>> >> question is my array is 30GB in memory, and I don’t have enough space to
>> >> have a copy of the array in memory.  If the former, I will have to figure out
>> >> a workaround to bring in only part of the data at a time and put it into
>> >> the proper locations.
>> >>
>> >> Thanks,
>> >>
>> >> -Roy

**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


Re: [R] Reshaping an array - how does it work in R

2016-03-22 Thread Dénes Tóth


Hi Martin,


On 03/22/2016 10:20 AM, Martin Maechler wrote:
> Dénes Tóth
> on Fri, 18 Mar 2016 22:56:23 +0100 writes:
>
> > Hi Roy,
> > R (usually) makes a copy if the dimensionality of an array is modified,
> > even if you use this syntax:
> >
> > x <- array(1:24, c(2, 3, 4))
> > dim(x) <- c(6, 4)
> >
> > See also ?tracemem, ?data.table::address, ?pryr::address and other tools
> > to trace if an internal copy is done.
>
> Well, without using strange (;-) packages,  indeed standard R's
> tracemem(), notably the help page is a good pointer.
>
> According to the help page memory tracing is enabled in the
> default R binaries for Windows and OS X.
> For Linux (where I, as R developer, compile R myself anyway),
> one needs to configure with --enable-memory-profiling .
>
> Now, let's try:
>
> > x <- array(rnorm(47), dim = c(1000,50, 40))
> > tracemem(x)
> [1] "<0x7f79a498a010>"
> > dim(x) <- c(1000* 50, 40)
> > x[5] <- pi
> > tracemem(x)
> [1] "<0x7f79a498a010>"
> >
>
> So, *BOTH* the re-dimensioning *AND* the sub-assignment did
> *NOT* make a copy.


This is interesting. First I wanted to demonstrate to Roy that recent R 
versions are smart enough not to make any copy during reshaping an 
array. Then I put together an example (similar to yours) and realized 
that after several reshapes, R starts to copy the array. So I had to 
modify my suggestion... And now, I realized that this was an 
RStudio-issue. At least on Linux, a standard R terminal behaves as you 
described, however, RStudio (version 0.99.862, which is not the very 
latest) tends to create copies (quite randomly, at least to me). If I 
have time I will test this more thoroughly and file a report to RStudio 
if it turns out to be a bug.


Denes



> Indeed, R has become much smarter in these things in recent
> years ... not thanks to me, but very much thanks to
> Luke Tierney (from R-core), and also thanks to contributions from "outside",
> notably Tomas Kalibera.
>
> And hence: *NO* such strange workarounds are needed in this specific case:
>
> > Workaround: use data.table::setattr or bit::setattr to modify the
> > dimensions in place (i.e., without making a copy). Risk: if you modify
> > an object by reference, all other objects which point to the same memory
> > address will be modified silently, too.
>
> Martin Maechler, ETH Zurich  (and R-core)
>
> > HTH,
> > Denes
>
> (generally, your contributions help indeed, Denes, thank you!)
>
>
> > On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
> >> Hi All:
> >>
> >> I am working with a very large array.  If noLat is the number of
> >> latitudes, noLon the number of longitudes and noTime the number of time
> >> periods, the array is of the form:
> >>
> >> myData[noLat, noLon, noTime].
> >>
> >> It is read in this way because that is how it is stored in a (series)
> >> of netcdf files.  For the analysis I need to do, I need instead the array:
> >>
> >> myData[noLat*noLon, noTime].  Normally this would be easy:
> >>
> >> myData <- array(myData, dim = c(noLat*noLon, noTime))
> >>
> >> My question is how does this command work in R - does it make a copy of
> >> the existing array, with different indices for the dimensions, or does it
> >> just redo the indices and leave the given array as is?  The reason for this
> >> question is my array is 30GB in memory, and I don’t have enough space to
> >> have a copy of the array in memory.  If the former, I will have to figure
> >> out a workaround to bring in only part of the data at a time and put it
> >> into the proper locations.
> >>
> >> Thanks,
> >>
> >> -Roy



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reshaping an array - how does it work in R

2016-03-22 Thread Martin Maechler
> Dénes Tóth 
> on Fri, 18 Mar 2016 22:56:23 +0100 writes:

> Hi Roy,
> R (usually) makes a copy if the dimensionality of an array is modified, 
> even if you use this syntax:

> x <- array(1:24, c(2, 3, 4))
> dim(x) <- c(6, 4)

> See also ?tracemem, ?data.table::address, ?pryr::address and other tools 
> to trace if an internal copy is done.

Well, without using strange (;-) packages,  indeed standard R's
tracemem(), notably the help page is a good pointer.

According to the help page memory tracing is enabled in the
default R binaries for Windows and OS X.
For Linux (where I, as R developer, compile R myself anyway),
one needs to configure with --enable-memory-profiling .

Now, let's try:

   > x <- array(rnorm(47), dim = c(1000,50, 40))
   > tracemem(x)
   [1] "<0x7f79a498a010>"
   > dim(x) <- c(1000* 50, 40)
   > x[5] <- pi
   > tracemem(x)
   [1] "<0x7f79a498a010>"
   > 

So, *BOTH*  the re-dimensioning  *AND*  the  sub-assignment did
*NOT* make a copy.

Indeed, R has become much smarter  in these things in recent
years ... not thanks to me, but very much thanks to
Luke Tierney (from R-core), and also thanks to contributions from "outside",
notably Tomas Kalibera.

And hence: *NO* such strange workarounds are needed in this specific case: 

> Workaround: use data.table::setattr or bit::setattr to modify the 
> dimensions in place (i.e., without making a copy). Risk: if you modify 
> an object by reference, all other objects which point to the same memory 
> address will be modified silently, too.

Martin Maechler, ETH Zurich  (and R-core)

> HTH,
> Denes

(generally, your contributions help indeed, Denes, thank you!)


> On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>> Hi All:
>> 
>> I am working with a very large array.  If noLat is the number of
>> latitudes, noLon the number of longitudes and noTime the number of time
>> periods, the array is of the form:
>> 
>> myData[noLat, noLon, noTime].
>> 
>> It is read in this way because that is how it is stored in a (series) of
>> netcdf files.  For the analysis I need to do, I need instead the array:
>> 
>> myData[noLat*noLon, noTime].  Normally this would be easy:
>> 
>> myData <- array(myData, dim = c(noLat*noLon, noTime))
>> 
>> My question is how does this command work in R - does it make a copy of
>> the existing array, with different indices for the dimensions, or does it
>> just redo the indices and leave the given array as is?  The reason for this
>> question is my array is 30GB in memory, and I don’t have enough space to
>> have a copy of the array in memory.  If the former, I will have to figure
>> out a workaround to bring in only part of the data at a time and put it
>> into the proper locations.
>> 
>> Thanks,
>> 
>> -Roy


Re: [R] Reshaping an array - how does it work in R

2016-03-21 Thread Roy Mendelssohn - NOAA Federal
Thanks for the info, but I will stay with regular R.  Workarounds for what I 
want to do just took some thought and programming; I just didn’t know whether R 
copied the array or just manipulated indices, and given the size of the array 
I am memory limited.  

This gets into the old thing of whether it is better to pass arrays by 
reference or value.  As an old Fortran programmer, that is one nice thing that 
could be done in Fortran - since arrays are passed by reference, the indices 
could be manipulated in the same memory space by passing to a subroutine, or 
through an equivalence statement.  But all of these things have trade-offs.

-Roy






> On Mar 21, 2016, at 7:26 AM, Rodrigo Botafogo  
> wrote:
> 
> Roy,
> 
> I have implemented a Ruby Gem (SciCom) with exactly your use case in mind.
> SciCom is based on Renjin, an R interpreter for the JVM.  So, this reply is
> about R, but not about GnuR.  If this is not proper behavior, please let me
> know.  I've looked at the posting guidelines and it seems to be OK.
> 
> SciCom interfaces with another Ruby Gem: MDArray.  MDArray is a
> multidimensional array class that is based on NetCDF-Java.  It can read
> netcdf files and store them in a multidimensional array.  MDArray can be
> reshaped and also sliced and diced as you can do with NetCDF.  Reshaping an
> MDArray does not require any copying, it is just index manipulations.
> 
> An MDArray can then be "sent" to SciCom.  This is not really sending, since
> there is no data copying either and the array is just wrapped in such a way
> that it can be used in Renjin.  In Renjin you could use normal R functions
> to process this data and do your analysis.
> 
> The data in SciCom can thus be viewed either as an R array, to which R
> semantics apply and reshaping will copy the data, or as an MDArray, and
> reshaping and slicing/dicing does not do any copying.  It is up to the
> developer to be careful how to see the data and operate on it.
> 
> There are however two showstoppers: i) Renjin does not load all CRAN
> packages yet.  So, there is a good chance that if you need a package for
> your PCA analysis, that this will not be loaded; ii) SciCom/Renjin do not
> support any graphics such as plot/ggplot.
> 
> References:
> 
> Renjin: http://www.renjin.org/
> NetCDF-Java:
> http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/
> MDArray: https://github.com/rbotafogo/mdarray/wiki
> SciCom:
>https://github.com/rbotafogo/scicom
> 
> https://github.com/rbotafogo/scicom/wiki/A-(not-so)-Short-Introduction-to-SciCom
> 
> -- 
> Rodrigo Botafogo
> 

[R] Reshaping an array - how does it work in R

2016-03-21 Thread Rodrigo Botafogo
Roy,

I have implemented a Ruby Gem (SciCom) with exactly your use case in mind.
SciCom is based on Renjin, an R interpreter for the JVM.  So, this reply is
about R, but not about GnuR.  If this is not proper behavior, please let me
know.  I've looked at the posting guidelines and it seems to be OK.

SciCom interfaces with another Ruby Gem: MDArray.  MDArray is a
multidimensional array class that is based on NetCDF-Java.  It can read
netcdf files and store them in a multidimensional array.  MDArray can be
reshaped and also sliced and diced as you can do with NetCDF.  Reshaping an
MDArray does not require any copying, it is just index manipulations.

An MDArray can then be "sent" to SciCom.  This is not really sending, since
there is no data copying either and the array is just wrapped in such a way
that it can be used in Renjin.  In Renjin you could use normal R functions
to process this data and do your analysis.

The data in SciCom can thus be viewed either as an R array, to which R
semantics apply and reshaping will copy the data, or as an MDArray, and
reshaping and slicing/dicing does not do any copying.  It is up to the
developer to be careful how to see the data and operate on it.

There are however two showstoppers: i) Renjin does not load all CRAN
packages yet.  So, there is a good chance that if you need a package for
your PCA analysis, that this will not be loaded; ii) SciCom/Renjin do not
support any graphics such as plot/ggplot.

References:

Renjin: http://www.renjin.org/
NetCDF-Java:
http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/
MDArray: https://github.com/rbotafogo/mdarray/wiki
SciCom:
https://github.com/rbotafogo/scicom

https://github.com/rbotafogo/scicom/wiki/A-(not-so)-Short-Introduction-to-SciCom

-- 
Rodrigo Botafogo


Re: [R] Reshaping an array - how does it work in R

2016-03-19 Thread Dénes Tóth

Hi Roy,

R (usually) makes a copy if the dimensionality of an array is modified, 
even if you use this syntax:

x <- array(1:24, c(2, 3, 4))
dim(x) <- c(6, 4)

See also ?tracemem, ?data.table::address, ?pryr::address and other tools 
to trace if an internal copy is done.


Workaround: use data.table::setattr or bit::setattr to modify the 
dimensions in place (i.e., without making a copy). Risk: if you modify 
an object by reference, all other objects which point to the same memory 
address will be modified silently, too.


HTH,
  Denes



On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:

> Hi All:
>
> I am working with a very large array.  If noLat is the number of latitudes,
> noLon the number of longitudes and noTime the number of time periods, the
> array is of the form:
>
> myData[noLat, noLon, noTime].
>
> It is read in this way because that is how it is stored in a (series) of
> netcdf files.  For the analysis I need to do, I need instead the array:
>
> myData[noLat*noLon, noTime].  Normally this would be easy:
>
> myData <- array(myData, dim = c(noLat*noLon, noTime))
>
> My question is how does this command work in R - does it make a copy of the
> existing array, with different indices for the dimensions, or does it just
> redo the indices and leave the given array as is?  The reason for this
> question is my array is 30GB in memory, and I don’t have enough space to have
> a copy of the array in memory.  If the former, I will have to figure out a
> workaround to bring in only part of the data at a time and put it into the
> proper locations.
>
> Thanks,
>
> -Roy




Re: [R] Reshaping an array - how does it work in R

2016-03-19 Thread Roy Mendelssohn - NOAA Federal

> On Mar 19, 2016, at 8:18 AM, Henrik Bengtsson  
> wrote:
> 
> On Fri, Mar 18, 2016 at 8:28 PM, Roy Mendelssohn - NOAA Federal
>  wrote:
>> Hi Henrik:
>> 
>> I want to do what in oceanography is called an EOF, which is just a PCA 
>> analysis. Unless I am missing something, in R I need to flatten my 3-D 
>> matrix into a 2-D data matrix. I can fit the entire 30GB matrix into memory, 
>> and I believe I have enough memory to do the PCA by constraining the number 
>> of components returned.  What I don’t think I have enough memory for is an 
>> operation that makes a copy of the matrix.
>> 
>> As I said, in theory I know how to do the flattening, it is a simple command, 
>> but in practice I don’t have enough memory.  So I spent the afternoon 
>> rewriting my code to read in parts of the data at a time and then put 
>> those in the appropriate places of an already flattened matrix of the 
>> appropriate size.  In case someone is wondering, on the 3D grid the matrix is 
>> [1001,1001,3650].  So I create an empty matrix of size [1001*1001,3650], 
>> read in a slice of the lat-lon grid, and map its values into the appropriate 
>> places in the flattened matrix.  By reading in appropriately sized chunks, 
>> my memory usage is not pushed too far.
> 
> Sounds good.  There's another small caveat. Make sure to specify the
> 'data' argument for matrix() when allocating an "empty" matrix, e.g.
> 
> X <- matrix(NA_real_, nrow=1001*1001, ncol=3650)
> 
> This will give you a double matrix with all values missing.  If you use
> the default
> 
> X <- matrix(nrow=1001*1001, ncol=3650)
> 
> you'll get a logical matrix, which will introduce a copy as soon as
> you assign a double value (e.g. X[1,1] <- 3.14). The latter is a
> complete waste of memory/time. See
> http://www.jottr.org/2014/06/matrixNA-wrong-way.html for details.
> 
> /Henrik

Thanks.  Yes, I once looked at ?NA (I can’t remember why), where that is 
documented, but it is not something you would think of offhand.

-Roy



Re: [R] Reshaping an array - how does it work in R

2016-03-19 Thread Henrik Bengtsson
On Fri, Mar 18, 2016 at 8:28 PM, Roy Mendelssohn - NOAA Federal
 wrote:
> Hi Henrik:
>
> I want to do what in oceanography is called an EOF, which is just a PCA 
> analysis. Unless I am missing something, in R I need to flatten my 3-D matrix 
> into a 2-D data matrix. I can fit the entire 30GB matrix into memory, and I 
> believe I have enough memory to do the PCA by constraining the number of 
> components returned.  What I don’t think I have enough memory for is an 
> operation that makes a copy of the matrix.
>
> As I said, in theory I know how to do the flattening, it is a simple command, 
> but in practice I don’t have enough memory.  So I spent the afternoon 
> rewriting my code to read in parts of the data at a time and then put 
> those in the appropriate places of an already flattened matrix of the 
> appropriate size.  In case someone is wondering, on the 3D grid the matrix is 
> [1001,1001,3650].  So I create an empty matrix of size [1001*1001,3650], 
> read in a slice of the lat-lon grid, and map its values into the appropriate 
> places in the flattened matrix.  By reading in appropriately sized chunks, my 
> memory usage is not pushed too far.
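
The chunked approach described above might look roughly like the following sketch. It is only an illustration under stated assumptions: `read_slice()` is a hypothetical stand-in for whatever netCDF reader is actually used, and the grid sizes are toy values rather than the real [1001,1001,3650].

```r
# Hypothetical reader: returns one lat-lon slice for time step k.
# In real code this would pull the slice from the netCDF file(s).
read_slice <- function(k, n_lat, n_lon) {
  matrix(as.double(k), nrow = n_lat, ncol = n_lon)
}

n_lat <- 3L; n_lon <- 4L; n_time <- 5L   # toy sizes for illustration

# Preallocate the flattened matrix once, already as type double.
flat <- matrix(NA_real_, nrow = n_lat * n_lon, ncol = n_time)

for (k in seq_len(n_time)) {
  slice <- read_slice(k, n_lat, n_lon)
  # R stores arrays column-major, so as.vector() lists latitude fastest;
  # this matches the layout that dim<- would give the full 3-D array.
  flat[, k] <- as.vector(slice)
}

dim(flat)
```

Only one slice plus the preallocated result is held in memory at any time, which is the point of reading in chunks.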

Sounds good.  There's another small caveat. Make sure to specify the
'data' argument for matrix() when allocating an "empty" matrix, e.g.

X <- matrix(NA_real_, nrow=1001*1001, ncol=3650)

This will give you a double matrix with all values missing.  If you use
the default

X <- matrix(nrow=1001*1001, ncol=3650)

you'll get a logical matrix, which will introduce a copy as soon as
you assign a double value (e.g. X[1,1] <- 3.14). The latter is a
complete waste of memory/time. See
http://www.jottr.org/2014/06/matrixNA-wrong-way.html for details.
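
A small sketch of the type difference, using a tiny matrix instead of the full-size one (the variable names are illustrative only):

```r
a <- matrix(nrow = 2, ncol = 3)            # 'data' defaults to NA, which is logical
b <- matrix(NA_real_, nrow = 2, ncol = 3)  # double from the start

type_default <- typeof(a)   # "logical"
type_real    <- typeof(b)   # "double"

a[1, 1] <- 3.14             # forces coercion of the whole matrix to double
type_after <- typeof(a)     # "double"

c(type_default, type_real, type_after)
```

The coercion on the first assignment is where the extra allocation comes from, so starting with NA_real_ avoids it.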

/Henrik

>
> -Roy
>
>
>> On Mar 18, 2016, at 7:37 PM, Henrik Bengtsson  
>> wrote:
>>
>> On Fri, Mar 18, 2016 at 3:15 PM, Roy Mendelssohn - NOAA Federal
>>  wrote:
>>> Thanks.  That is what I needed to know.  I don’t want to play around with 
>>> some of the other suggestions, as I don’t totally understand what they do, 
>>> and don’t want to risk messing up something and not be aware of it.
>>>
>>> There is a way to read in the data in chunks, reshape each chunk, and put 
>>> it into the (reshaped) larger array; harder to program but probably worth 
>>> the pain to be certain of what I am doing.
>>
>> I recommend this approach; whenever I work with reasonably large data
>> (that may become even larger in the future), I try to implement a
>> constant-memory version of the algorithm, which often comes down to
>> processing data in chunks.  The simplest version of this is to read
>> all data into memory and then subset, but if you can read the data in
>> chunks that is even better.
>>
>> Though, I'm curious to what matrix operations you wish to perform.
>> Because if you wish to do regular summation, then base::.rowSums() and
>> base::.colSums() allow you to override the default dimensions on the
>> fly without inducing extra copies, e.g.
>>
>>> X <- array(1:24, dim=c(2,3,4))
>>> .rowSums(X, m=6, n=4)
>> [1] 40 44 48 52 56 60
>>> rowSums(matrix(X, nrow=6, ncol=4))
>> [1] 40 44 48 52 56 60
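
As a quick check of the example above, the following sketch compares the dimension-overriding functions with their reshape-first equivalents; it also tries .colSums(), which takes the same `m` and `n` arguments. The variable names are illustrative.

```r
X <- array(1:24, dim = c(2, 3, 4))

# Treat X as a 6 x 4 matrix without changing its dim attribute.
r1 <- .rowSums(X, m = 6, n = 4)
c1 <- .colSums(X, m = 6, n = 4)

# Reference results obtained by explicitly building the 6 x 4 matrix.
r2 <- rowSums(matrix(X, nrow = 6, ncol = 4))
c2 <- colSums(matrix(X, nrow = 6, ncol = 4))

all.equal(r1, r2)
all.equal(c1, c2)
```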
>>
>> For other types of calculations, you might want to look at
>> matrixStats.  It has partial(*) support for overriding the default
>> dimensions in a similar fashion.  For instance,
>>
>>> library("matrixStats")
>>> rowVars(X, dim.=c(6,4))
>>
>> The above effectively calculates rowVars(matrix(X, nrow=6, ncol=4))
>> without making copies.
>>
>> (*) By partial I mean that this is a feature that hasn't been pushed
>> through to all of matrixStats functions, cf.
>> https://github.com/HenrikBengtsson/matrixStats/issues/83.
>>
>> Cheers,
>>
>> Henrik
>> (author of matrixStats)
>>
>>>
>>> I had a feeling a copy was made, just wanted to make certain of it.
>>>
>>> Thanks again,
>>>
>>> -Roy
>>>
>>>> On Mar 18, 2016, at 2:56 PM, Dénes Tóth  wrote:
>>>>
>>>> Hi Roy,
>>>>
>>>> R (usually) makes a copy if the dimensionality of an array is modified,
>>>> even if you use this syntax:
>>>> x <- array(1:24, c(2, 3, 4))
>>>> dim(x) <- c(6, 4)
>>>>
>>>> See also ?tracemem, ?data.table::address, ?pryr::address and other tools
>>>> to trace if an internal copy is done.
>>>>
>>>> Workaround: use data.table::setattr or bit::setattr to modify the
>>>> dimensions in place (i.e., without making a copy). Risk: if you modify an
>>>> object by reference, all other objects which point to the same memory
>>>> address will be modified silently, too.
>>>>
>>>> HTH,
>>>> Denes
>>>>
>>>> On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>>>>> Hi All:
>>>>>
>>>>> I am working with a very large array.  If noLat is the number of
>>>>> latitudes, noLon the number of longitudes and noTime the number of time
>>>>> periods, the array is of the form:
>>>>>
>>>>> myData[noLat, noLon, noTime].
>>>>>
>>>>> It is read in this way because that

[R] Reshaping an array - how does it work in R

2016-03-19 Thread Roy Mendelssohn - NOAA Federal
Hi All:

I am working with a very large array.  If noLat is the number of latitudes, 
noLon the number of longitudes and noTime the number of time periods, the 
array is of the form:

myData[noLat, noLon, noTime].

It is read in this way because that is how it is stored in a (series) of netcdf 
files.  For the analysis I need to do, I need instead the array:

myData[noLat*noLon, noTime].  Normally this would be easy:

myData<- array(myData,dim=c(noLat*noLon,noTime))

My question is how does this command work in R - does it make a copy of the 
existing array, with different indices for the dimensions, or does it just redo 
the indices and leave the given array as is?  The reason for this question is 
my array is 30GB in memory, and I don’t have enough space to have a copy of the 
array in memory.  If a copy is made, I will have to figure out a workaround to 
bring in only part of the data at a time and put it into the proper locations.

Thanks,

-Roy



**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reshaping an array - how does it work in R

2016-03-18 Thread Bert Gunter
Arrays are vectors stored in column-major order.  So the answer is: reindexing.

Does this make it clear?

> v <- array(1:24,dim=2:4)
> as.vector(v)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

> v
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

> ## you would use v instead of w for the assignment
> w <- array(as.vector(v),dim=c(6,4))
> w
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    2    8   14   20
[3,]    3    9   15   21
[4,]    4   10   16   22
[5,]    5   11   17   23
[6,]    6   12   18   24
> identical(as.vector(w), as.vector(v))
[1] TRUE
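
To spell the reindexing out, here is a small sketch (the helper name flat_row is made up for illustration and is not part of R). Because storage is column-major, element [i, j, k] of the 2 x 3 x 4 array should sit in row (j - 1) * 2 + i, column k of the 6 x 4 matrix that shares the same data vector:

```r
# Hypothetical helper: row of the flattened (nLat*nLon) x nTime matrix that
# holds element [i, j, ] of the original nLat x nLon x nTime array.
flat_row <- function(i, j, nLat) (j - 1L) * nLat + i

v <- array(1:24, dim = 2:4)
w <- v
dim(w) <- c(6, 4)   # same data vector, only the dim attribute differs

# Compare every (i, j, k) position against the formula.
idx <- expand.grid(i = 1:2, j = 1:3, k = 1:4)
ok <- mapply(function(i, j, k) v[i, j, k] == w[flat_row(i, j, 2L), k],
             idx$i, idx$j, idx$k)
all_match <- all(ok)
```

The same formula, with nLat in place of 2, is what a chunked reader would need in order to place a lat-lon slice into the flattened matrix.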


However, copying may occur anyway as part of R's semantics. Others will
have to help you on that, as the details here are beyond me.

Cheers,
Bert





Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Mar 18, 2016 at 2:28 PM, Roy Mendelssohn - NOAA Federal
 wrote:
> Hi All:
>
> I am working with a very large array.  If noLat is the number of latitudes, 
> noLon the number of longitudes and noTime the number of time periods, the 
> array is of the form:
>
> myData[noLat, noLon, noTime].
>
> It is read in this way because that is how it is stored in a (series) of 
> netcdf files.  For the analysis I need to do, I need instead the array:
>
> myData[noLat*noLon, noTime].  Normally this would be easy:
>
> myData<- array(myData,dim=c(noLat*noLon,noTime))
>
> My question is how does this command work in R - does it make a copy of the 
> existing array, with different indices for the dimensions, or does it just 
> redo the indices and leave the given array as is?  The reason for this 
> question is my array is 30GB in memory, and I don’t have enough space to have 
> a copy of the array in memory.  If a copy is made, I will have to figure out
> a workaround to bring in only part of the data at a time and put it into the
> proper locations.
>
> Thanks,
>
> -Roy
>
>
>

Re: [R] Reshaping an array - how does it work in R

2016-03-18 Thread Roy Mendelssohn - NOAA Federal
Hi Henrik:

I want to do what in oceanography is called an EOF (empirical orthogonal 
function) analysis, which is just a PCA. Unless I am missing something, in R 
I need to flatten my 3-D array into a 2-D data matrix. I can fit the entire 
30GB matrix into memory, and I believe I have enough memory to do the PCA by 
constraining the number of components returned.  What I don’t think I have 
enough memory for is an operation that makes a copy of the matrix.

As I said, in theory I know how to do the flattening, it's a simple command, 
but in practice I don’t have enough memory.  So I spent the afternoon rewriting 
my code to read in parts of the data at a time and then put those in the 
appropriate places of an already flattened matrix of the appropriate size.  In 
case someone is wondering, the 3-D grid is [1001,1001,3650].  So I create an 
empty matrix of size [1001*1001,3650], read in a slice of the lat-lon grid, 
and map it into the appropriate places in the flattened matrix.  By reading in 
appropriately sized chunks, my memory usage is not pushed too far.
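
For anyone wanting to try the same thing, here is a minimal sketch of that kind of chunked fill. It is not my actual code: read_band() is a hypothetical stand-in for a netCDF read that returns a band of latitudes for all longitudes and times, the dimensions are tiny, and chunking by latitude band is just one possible choice. Assuming 8-byte doubles, the real 1001 x 1001 x 3650 grid comes to about 29e9 bytes, which is why only the small piece is ever reshaped here.

```r
# Tiny stand-in dimensions; the real grid is 1001 x 1001 x 3650.
nLat <- 5L; nLon <- 3L; nTime <- 4L
band <- 2L   # number of latitudes read per chunk (assumed, tune to memory)

# Hypothetical reader: returns a length(i_idx) x nLon x nTime array.
# In practice this would read only those latitudes from the netCDF files.
read_band <- function(i_idx) {
  full <- array(as.numeric(seq_len(nLat * nLon * nTime)),
                dim = c(nLat, nLon, nTime))
  full[i_idx, , , drop = FALSE]
}

# Pre-allocate the flattened result once.
flat <- matrix(NA_real_, nrow = nLat * nLon, ncol = nTime)

for (start in seq(1L, nLat, by = band)) {
  i_idx <- start:min(start + band - 1L, nLat)
  piece <- read_band(i_idx)
  # Rows of `flat` holding [i, j, ]: (j - 1) * nLat + i, with i varying fastest.
  rows <- as.vector(outer(i_idx, (seq_len(nLon) - 1L) * nLat, `+`))
  # Reshape only the small chunk, then drop it into place.
  dim(piece) <- c(length(i_idx) * nLon, nTime)
  flat[rows, ] <- piece
}
```

With real data, the result can be spot-checked against a few directly read values, since flat[(j - 1) * nLat + i, k] should equal the original [i, j, k].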

-Roy


> On Mar 18, 2016, at 7:37 PM, Henrik Bengtsson  
> wrote:
> 
> On Fri, Mar 18, 2016 at 3:15 PM, Roy Mendelssohn - NOAA Federal
>  wrote:
>> Thanks.  That is what I needed to know.  I don’t want to play around with 
>> some of the other suggestions, as I don’t totally understand what they do, 
>> and don’t want to risk messing up something and not be aware of it.
>> 
>> There is a way to read in the data a chunk at a time, reshape it, and put
>> it into the (reshaped) larger array; harder to program, but probably worth
>> the pain to be certain of what I am doing.
> 
> I recommend this approach; whenever I work with reasonably large data
> (that may become even larger in the future), I try to implement a
> constant-memory version of the algorithm, which often comes down to
> processing data in chunks.  The simplest version of this is to read
> all data into memory and then subset, but if you can read the data in
> chunks that is even better.
> 
> Though, I'm curious as to what matrix operations you wish to perform.
> Because if you wish to do regular summation, then base::.rowSums() and
> base::.colSums() allow you to override the default dimensions on the
> fly without inducing extra copies, e.g.
> 
>> X <- array(1:24, dim=c(2,3,4))
>> .rowSums(X, m=6, n=4)
> [1] 40 44 48 52 56 60
>> rowSums(matrix(X, nrow=6, ncol=4))
> [1] 40 44 48 52 56 60
> 
> For other types of calculations, you might want to look at
> matrixStats.  It has partial(*) support for overriding the default
> dimensions in a similar fashion.  For instance,
> 
>> library("matrixStats")
>> rowVars(X, dim.=c(6,4))
> 
> The above effectively calculates rowVars(matrix(X, nrow=6, ncol=4))
> without making copies.
> 
> (*) By partial I mean that this is a feature that hasn't been pushed
> through to all of matrixStats functions, cf.
> https://github.com/HenrikBengtsson/matrixStats/issues/83.
> 
> Cheers,
> 
> Henrik
> (author of matrixStats)
> 
>> 
>> I had a feeling a copy was made, just wanted to make certain of it.
>> 
>> Thanks again,
>> 
>> -Roy
>> 
>>> On Mar 18, 2016, at 2:56 PM, Dénes Tóth  wrote:
>>> 
>>> Hi Roy,
>>> 
>>> R (usually) makes a copy if the dimensionality of an array is modified, 
>>> even if you use this syntax:
>>> x <- array(1:24, c(2, 3, 4))
>>> dim(x) <- c(6, 4)
>>> 
>>> See also ?tracemem, ?data.table::address, ?pryr::address and other tools to 
>>> trace if an internal copy is done.
>>> 
>>> Workaround: use data.table::setattr or bit::setattr to modify the 
>>> dimensions in place (i.e., without making a copy). Risk: if you modify an 
>>> object by reference, all other objects which point to the same memory 
>>> address will be modified silently, too.
>>> 
>>> HTH,
>>> Denes
>>> 
>>> 
>>> 
>>> On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>>>> Hi All:
>>>>
>>>> I am working with a very large array.  If noLat is the number of
>>>> latitudes, noLon the number of longitudes and noTime the number of time
>>>> periods, the array is of the form:
>>>>
>>>> myData[noLat, noLon, noTime].
>>>>
>>>> It is read in this way because that is how it is stored in a (series) of
>>>> netcdf files.  For the analysis I need to do, I need instead the array:
>>>>
>>>> myData[noLat*noLon, noTime].  Normally this would be easy:
>>>>
>>>> myData<- array(myData,dim=c(noLat*noLon,noTime))
>>>>
>>>> My question is how does this command work in R - does it make a copy of
>>>> the existing array, with different indices for the dimensions, or does it
>>>> just redo the indices and leave the given array as is?  The reason for
>>>> this question is my array is 30GB in memory, and I don’t have enough space
>>>> to have a copy of the array in memory.  If a copy is made, I will have to
>>>> figure out a workaround to bring in only part of the data at a time and
>>>> put it into the proper locations.

Re: [R] Reshaping an array - how does it work in R

2016-03-18 Thread Roy Mendelssohn - NOAA Federal

> On Mar 18, 2016, at 2:56 PM, Bert Gunter  wrote:
> 
> However copying may occur anyway as part of R's semantics. Others will
> have to help you on that, as the details here are beyond me.
> 
> Cheers,
> Bert

Hi Bert:

Thanks for your response.  The only part I was concerned with is whether a copy 
is made, that is, what my memory usage would be.  Sorry if that wasn’t clear in 
the original.

-Roy



Re: [R] Reshaping an array - how does it work in R

2016-03-18 Thread Roy Mendelssohn - NOAA Federal
Thanks.  That is what I needed to know.  I don’t want to play around with some 
of the other suggestions, as I don’t totally understand what they do, and don’t 
want to risk messing up something and not be aware of it.

There is a way to read in the data a chunk at a time, reshape it, and put it 
into the (reshaped) larger array; harder to program, but probably worth the pain 
to be certain of what I am doing.

I had a feeling a copy was made, just wanted to make certain of it.

Thanks again,

-Roy

> On Mar 18, 2016, at 2:56 PM, Dénes Tóth  wrote:
> 
> Hi Roy,
> 
> R (usually) makes a copy if the dimensionality of an array is modified, even 
> if you use this syntax:
> x <- array(1:24, c(2, 3, 4))
> dim(x) <- c(6, 4)
> 
> See also ?tracemem, ?data.table::address, ?pryr::address and other tools to 
> trace if an internal copy is done.
> 
> Workaround: use data.table::setattr or bit::setattr to modify the dimensions 
> in place (i.e., without making a copy). Risk: if you modify an object by 
> reference, all other objects which point to the same memory address will be 
> modified silently, too.
> 
> HTH,
>  Denes
> 
> 
> 
> On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>> Hi All:
>> 
>> I am working with a very large array.  If noLat is the number of latitudes, 
>> noLon the number of longitudes and noTime the number of time periods, the 
>> array is of the form:
>> 
>> myData[noLat, noLon, noTime].
>> 
>> It is read in this way because that is how it is stored in a (series) of 
>> netcdf files.  For the analysis I need to do, I need instead the array:
>> 
>> myData[noLat*noLon, noTime].  Normally this would be easy:
>> 
>> myData<- array(myData,dim=c(noLat*noLon,noTime))
>> 
>> My question is how does this command work in R - does it make a copy of the 
>> existing array, with different indices for the dimensions, or does it just 
>> redo the indices and leave the given array as is?  The reason for this 
>> question is my array is 30GB in memory, and I don’t have enough space to 
>> have a copy of the array in memory.  If a copy is made, I will have to figure
>> out a workaround to bring in only part of the data at a time and put it into
>> the proper locations.
>> 
>> Thanks,
>> 
>> -Roy
>> 
>> 
>> 

Re: [R] Reshaping an array - how does it work in R

2016-03-18 Thread Henrik Bengtsson
On Fri, Mar 18, 2016 at 3:15 PM, Roy Mendelssohn - NOAA Federal
 wrote:
> Thanks.  That is what I needed to know.  I don’t want to play around with 
> some of the other suggestions, as I don’t totally understand what they do, 
> and don’t want to risk messing up something and not be aware of it.
>
> There is a way to read in the data a chunk at a time, reshape it, and put
> it into the (reshaped) larger array; harder to program, but probably worth the
> pain to be certain of what I am doing.

I recommend this approach; whenever I work with reasonably large data
(that may become even larger in the future), I try to implement a
constant-memory version of the algorithm, which often comes down to
processing data in chunks.  The simplest version of this is to read
all data into memory and then subset, but if you can read the data in
chunks that is even better.

Though, I'm curious as to what matrix operations you wish to perform.
Because if you wish to do regular summation, then base::.rowSums() and
base::.colSums() allow you to override the default dimensions on the
fly without inducing extra copies, e.g.

> X <- array(1:24, dim=c(2,3,4))
> .rowSums(X, m=6, n=4)
[1] 40 44 48 52 56 60
> rowSums(matrix(X, nrow=6, ncol=4))
[1] 40 44 48 52 56 60
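
As a slightly fuller sketch of the same idea, using only base R: the dotted variants take the matrix shape as arguments m and n, so the 3-D array can be summarised as if it were 6 x 4 while its own dim attribute stays untouched. The matrix() call at the end is there only to produce reference values for comparison; it is the step one would avoid on large data.

```r
X <- array(1:24, dim = c(2, 3, 4))

# Treat X as a 6 x 4 matrix without modifying X itself.
row_sums  <- .rowSums(X,  m = 6, n = 4)
col_sums  <- .colSums(X,  m = 6, n = 4)
row_means <- .rowMeans(X, m = 6, n = 4)

# Reference values computed from an explicitly reshaped matrix.
M <- matrix(X, nrow = 6, ncol = 4)
agree <- all(row_sums == rowSums(M)) &&
  all(col_sums == colSums(M)) &&
  all(row_means == rowMeans(M))
```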

For other types of calculations, you might want to look at
matrixStats.  It has partial(*) support for overriding the default
dimensions in a similar fashion.  For instance,

> library("matrixStats")
> rowVars(X, dim.=c(6,4))

The above effectively calculates rowVars(matrix(X, nrow=6, ncol=4))
without making copies.

(*) By partial I mean that this is a feature that hasn't been pushed
through to all of matrixStats functions, cf.
https://github.com/HenrikBengtsson/matrixStats/issues/83.

Cheers,

Henrik
(author of matrixStats)

>
> I had a feeling a copy was made, just wanted to make certain of it.
>
> Thanks again,
>
> -Roy
>
>> On Mar 18, 2016, at 2:56 PM, Dénes Tóth  wrote:
>>
>> Hi Roy,
>>
>> R (usually) makes a copy if the dimensionality of an array is modified, even 
>> if you use this syntax:
>> x <- array(1:24, c(2, 3, 4))
>> dim(x) <- c(6, 4)
>>
>> See also ?tracemem, ?data.table::address, ?pryr::address and other tools to 
>> trace if an internal copy is done.
>>
>> Workaround: use data.table::setattr or bit::setattr to modify the dimensions 
>> in place (i.e., without making a copy). Risk: if you modify an object by 
>> reference, all other objects which point to the same memory address will be 
>> modified silently, too.
>>
>> HTH,
>>  Denes
>>
>>
>>
>> On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>>> Hi All:
>>>
>>> I am working with a very large array.  If noLat is the number of latitudes, 
>>> noLon the number of longitudes and noTime the number of time periods, the 
>>> array is of the form:
>>>
>>> myData[noLat, noLon, noTime].
>>>
>>> It is read in this way because that is how it is stored in a (series) of 
>>> netcdf files.  For the analysis I need to do, I need instead the array:
>>>
>>> myData[noLat*noLon, noTime].  Normally this would be easy:
>>>
>>> myData<- array(myData,dim=c(noLat*noLon,noTime))
>>>
>>> My question is how does this command work in R - does it make a copy of the 
>>> existing array, with different indices for the dimensions, or does it just 
>>> redo the indices and leave the given array as is?  The reason for this 
>>> question is my array is 30GB in memory, and I don’t have enough space to 
>>> have a copy of the array in memory.  If a copy is made, I will have to
>>> figure out a workaround to bring in only part of the data at a time and put
>>> it into the proper locations.
>>>
>>> Thanks,
>>>
>>> -Roy
>>>
>>>
>>>

Re: [R] Reshaping an array - how does it work in R

2016-03-18 Thread Jeff Newmiller
R always makes a copy for this kind of operation. There are some operations 
that don't make copies, but I don't think this one qualifies. 
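
Since the behaviour reported in this thread differs between setups, it may be worth checking on your own machine. A minimal sketch, assuming an R build with memory profiling enabled (hence the capabilities("profmem") guard); when tracing is active, tracemem() causes a "tracemem[...]" line to be printed each time the traced object is duplicated, so no such line after the dim<- call would suggest no copy was made:

```r
x <- array(rnorm(24), dim = c(2, 3, 4))
before <- as.vector(x)           # plain copy of the values, for comparison

can_trace <- capabilities("profmem")
if (can_trace) {
  tracemem(x)                    # start reporting duplications of x
}

dim(x) <- c(6, 4)                # reshape by replacing the dim attribute

if (can_trace) {
  untracemem(x)                  # stop tracing
}

# Whatever happened internally, the values and their order are unchanged.
same_values <- identical(as.vector(x), before)
```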

-- 
Sent from my phone. Please excuse my brevity.

On March 18, 2016 2:28:35 PM PDT, Roy Mendelssohn - NOAA Federal 
 wrote:
>Hi All:
>
>I am working with a very large array.  If noLat is the number of
>latitudes, noLon the number of longitudes and noTime the number of
>time periods, the array is of the form:
>
>myData[noLat, noLon, noTime].
>
>It is read in this way because that is how it is stored in a (series)
>of netcdf files.  For the analysis I need to do, I need instead the
>array:
>
>myData[noLat*noLon, noTime].  Normally this would be easy:
>
>myData<- array(myData,dim=c(noLat*noLon,noTime))
>
>My question is how does this command work in R - does it make a copy of
>the existing array, with different indices for the dimensions, or does
>it just redo the indices and leave the given array as is?  The reason
>for this question is my array is 30GB in memory, and I don’t have
>enough space to have a copy of the array in memory.  If a copy is made, I
>will have to figure out a workaround to bring in only part of the data
>at a time and put it into the proper locations.
>
>Thanks,
>
>-Roy
>
>
>