Re: [R] Error because of large dimension

2016-01-24 Thread li li
Thanks all for the reply. I think I need to think of other ways to approach
the problem.
Hanna

2016-01-24 17:45 GMT-05:00 William Dunlap :

> > 28 PiB.  Storing such a large matrix even on file is not possible.
>
> The ads for Amazon Redshift say it is possible.  E.g.,
>   Amazon Redshift is a fast, fully managed, petabyte-scale data
>   warehouse that makes it simple and cost-effective to analyze
>   all your data using your existing business intelligence tools.
>   Start small for $0.25 per hour with no commitments and scale
>   to petabytes for $1,000 per terabyte per year, less than a tenth
>   the cost of traditional solutions. Customers typically see 3x
>   compression, reducing their costs to $333 per uncompressed
>   terabyte per year.
>
> Cost may be an issue:
>
> 28 petabytes * 1024 terabytes/petabyte * $333 per terabyte per year ~= $9.5
> million/year
> or $26 thousand/day.
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sun, Jan 24, 2016 at 1:29 PM, Henrik Bengtsson <
> henrik.bengts...@gmail.com> wrote:
>
>> FYI, the matrix you tried to allocate would hold
>> (3195*1290*495*35*35*35*15) * 3 = 3.936248e+15 values.  Each value
>> would occupy 8 bytes of memory (for the double data type).  In other
>> words, in order to keep this data matrix in memory you would require a
>> computer with at least 3.148998e+16 bytes of RAM, i.e. 29327331 GiB =
>> 28640 TiB = 28 PiB.  Storing such a large matrix even on file is not
>> possible.
>>
>> In other words, you need to figure out how to approach your original
>> problem in a different way.
>>
>> /Henrik
>>
>> On Sun, Jan 24, 2016 at 8:46 AM, li li  wrote:
>> > Hi all,
>> >   I am doing some calculations with very large dimensions. I need to
>> > create a matrix with three columns and a very large number of rows
>> > (3195*1290*495*35*35*35*15=1.312083e+15) in order to store the
>> > calculation results from a for loop.
>> > R does not allow me to create such a matrix because of the large
>> > dimension (see below). Is there a way to get around this?
>> >   Thanks very much!!
>> >  Hanna
>> >
>> >
>> >> matrix(0, 3195*1290*495*35*35*35*15, 3)
>> > Error in matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
>> >   invalid 'nrow' value (too large or NA)
>> > In addition: Warning message:
>> > In matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
>> >   NAs introduced by coercion
>> >>
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.



[R] Block Triangular Matrix

2016-01-24 Thread Amina Shahzadi Shahzadi
Hi,

I want to create a block upper triangular square matrix.

Can anybody help in this regard?

Thank you



Re: [R-es] z-test for three proportions

2016-01-24 Thread José Trujillo

On 23/01/16 at 16:14, Fernando Sanchez wrote:

Hello everyone,

I wanted to run a z-test with three proportions. I have looked for information on
whether this is possible in R and have found nothing about it.

The information I have is the following:

A  total number of cases: 56  number of events: 14

B  total number of cases: 49  number of events: 10

C  total number of cases: 51  number of events: 17

The issue is that the same individual can have more than one event, which is why
I thought of the z-test and not a chi-squared test. Is my assumption correct?


The conventional Z test for proportions uses one of these two
normal distributions:


f ~ N(p0, s) ---> z = (f - p0) / s, where s = sqrt(p0 (1 - p0) / n)

f1 ~ N(p1, s1) and f2 ~ N(p2, s2), so f1 - f2 ~ N(p1 - p2, sqrt(s1^2 +
s2^2)) ---> if p1 - p2 = 0, then z = (f1 - f2) / sqrt(s1^2 + s2^2)


I cannot imagine how a Z test could be constructed with three proportions.
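
As a rough illustration of the two-sample statistic above, the sketch below plugs the counts for groups A and B from the question into z = (f1 - f2) / sqrt(s1^2 + s2^2). Using the observed proportions in place of p1 and p2 when computing s1 and s2 is an assumption made here for illustration, and the result is only meaningful if the observations are independent.

```r
# Counts for groups A and B, taken from the question
events <- c(A = 14, B = 10)
cases  <- c(A = 56, B = 49)

f <- events / cases               # observed proportions f1, f2
s <- sqrt(f * (1 - f) / cases)    # plug-in standard errors s1, s2 (assumption)

# Two-sample z statistic and its two-sided p-value under N(0, 1)
z <- unname((f["A"] - f["B"]) / sqrt(sum(s^2)))
p_value <- 2 * pnorm(-abs(z))

print(c(z = z, p_value = p_value))
```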

Whether it is the Z test or the chi-squared test, the test is constructed on the
assumption that you have N = n1+n2+...+nk events that, under the null hypothesis,
are outcomes of binomial experiments. In each sample you have N trials
(individuals) to find a given outcome (a specific answer, for example A),
and it is required that, if the null hypothesis of the test is true,
all individuals (trials) have the same probability of answering A and,
furthermore, that the probability of each individual answering A does not
depend on the answers of the other individuals.


Your statement that the same individual can produce more than one event can be
read in two ways.


1. The same individual is asked twice: that breaks every test I know.
The two answers from the same individual would not satisfy the condition of
independence between successive answers: they may be more likely to be
similar to, or different from, each other than the answers of two different
individuals.


2. You are running three completely different questionnaires and some
individuals have answered more than one of them. Well, if the questions are
unrelated (independent questionnaires) and the previous point does not
apply, I see no problem. Each questionnaire would still be a binomial
experiment, for a total of three questionnaires in which you want to see
whether the probability of three specific answers is the same across the
three questionnaires.


In the information you give, you do not make clear whether these are three
questionnaires in each of which you ask only a yes/no question. That is, in
the first they can answer A or not A, in the second group of cases they can
answer B or not B, and so on. You would then have suitable conditions for a
chi-squared test even if some individuals took part in all three
questionnaires.
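
For the case just described (three independent yes/no samples), a minimal sketch of the chi-squared test of equal proportions in R could look like the following. It uses the counts from the question and base R's prop.test(); whether the independence assumption actually holds for these data is exactly what the discussion above leaves open.

```r
# Events and totals for the three groups, taken from the question
events <- c(A = 14, B = 10, C = 17)
cases  <- c(A = 56, B = 49, C = 51)

# Chi-squared test of H0: the three proportions are equal (2 degrees of freedom)
res <- prop.test(events, cases)
print(res)

# The same statistic from the 2 x 3 table of events versus non-events
tab <- rbind(events = events, non_events = cases - events)
chi <- chisq.test(tab)
```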


Or, on the contrary, n individuals were allowed to answer freely about A, B
and C, but some chose more than one option, which is why the total number of
answers would exceed n. In that case I, not being an expert in categorical
data, do not know of a suitable test that accounts for the lack of
independence in the answers. In any case the available information is
insufficient, because the analysis will require knowing the degree of
dependence between the answers (how many times A and B, A and C, etc.
appear together).


Regards.



Regards,

Fernando

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es




Re: [R] Block Triangular Matrix

2016-01-24 Thread Olivier Crouzet
Hi, I think this page will help,

https://stat.ethz.ch/R-manual/R-devel/library/base/html/lower.tri.html
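
One possible way to use upper.tri() from that page for a block structure, sketched under assumed sizes (three blocks of size 2 and a made-up full matrix M, none of which come from the original question): build an upper-triangular mask at block level, expand it with kronecker(), and zero out everything below the block diagonal.

```r
n_blocks   <- 3   # hypothetical number of blocks per side
block_size <- 2   # hypothetical size of each square block
n <- n_blocks * block_size

# Block-level mask: 1 on and above the block diagonal, 0 below it
block_mask <- upper.tri(matrix(0, n_blocks, n_blocks), diag = TRUE) * 1

# Expand each mask entry into a block_size x block_size block
mask <- kronecker(block_mask, matrix(1, block_size, block_size))

# Example full matrix; multiplying by the mask zeroes the lower blocks
M <- matrix(seq_len(n * n), n, n)
U <- M * mask
print(U)
```

The diagonal blocks are kept whole here; if they should themselves be triangular, the mask would need to be adjusted accordingly.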

Olivier.

--
Olivier Crouzet
LLING - Laboratoire de Linguistique de Nantes - EA3827
Université de Nantes

-Original Message-
From: Amina Shahzadi Shahzadi 
Sender: "R-help"
Date: Mon, 25 Jan 2016 00:39:22
To: r-help@R-project.org
Subject: [R] Block Triangular Matrix

Hi


I want to create a block upper triangular square matrix.


Anybody who can help in this regard.


Thank You



Re: [R] Estimating MA parameters through arima or through package "dlm"

2016-01-24 Thread Paul Gilbert

(Sorry for the delay in responding to this.)

On 01/05/2016 06:00 AM, r-help-requ...@r-project.org wrote:

Date: Mon, 4 Jan 2016 11:31:22 -0500
From: Mark Leeds
To: Stefano Sofia
Cc:"r-help@r-project.org"  
Subject: Re: [R] Estimating MA parameters through arima or through
package "dlm"
Message-ID:

Content-Type: text/plain; charset="UTF-8"

Hi: I don't have time to look at the details of what you're doing, but the
"equivalence" between state space and ARIMA (as Paul Gilbert pointed out a
few weeks ago) is not a true equivalence.


To state this a bit more precisely, there is a true equivalence between 
the input-output  equivalence classes of (linear, time invariant) 
state-space models with state dimension n and the input-output 
equivalence classes of (linear, time invariant) ARMA models with 
McMillan degree n. (In fact, the quotient spaces are diffeomorphic not 
just isomorphic.) This means you should be able to get exactly 
comparable results for anything that is an equivalence class invariant, 
including residuals and anything calculated from residuals, such as 
likelihood. Model roots and thus stability are also invariants, but you 
probably will not get comparable results for most other things involving 
parameters.
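
To make the idea of an equivalence-class invariant a little more concrete, here is a small base-R sketch. It is not taken from dse or dlm, and the matrices A, B, C and the transformation Tm are made-up values. Two state-space parameterizations related by a change of state basis have different parameters, yet the same input-output behaviour (impulse response) and the same roots.

```r
# Toy model: x[t+1] = A x[t] + B u[t],  y[t] = C x[t]  (values are hypothetical)
A <- matrix(c(0.5, 0.2, 0.1, 0.3), 2, 2)
B <- matrix(c(1, 0.5), 2, 1)
C <- matrix(c(1, 0), 1, 2)

# Any invertible Tm yields another member of the same equivalence class
Tm <- matrix(c(2, 1, 0, 1), 2, 2)
A2 <- Tm %*% A %*% solve(Tm)
B2 <- Tm %*% B
C2 <- C %*% solve(Tm)

# Impulse response coefficients C A^k B for k = 0, ..., n - 1
impulse_response <- function(A, B, C, n) {
  out <- numeric(n)
  Ak <- diag(nrow(A))
  for (k in seq_len(n)) {
    out[k] <- drop(C %*% Ak %*% B)
    Ak <- Ak %*% A
  }
  out
}

h1 <- impulse_response(A, B, C, 10)
h2 <- impulse_response(A2, B2, C2, 10)

same_response <- isTRUE(all.equal(h1, h2))          # input-output behaviour
same_roots <- isTRUE(all.equal(sort(Mod(eigen(A)$values)),
                               sort(Mod(eigen(A2)$values))))
same_parameters <- isTRUE(all.equal(A, A2))         # the parameters themselves
print(c(same_response, same_roots, same_parameters))
```

This is a deterministic toy with no estimation step, so it says nothing about the estimation difficulties discussed below.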


This is not just a "statistical equivalence" as is sometimes suggested, 
it is an algebraic equivalence between quotient spaces.


However, if you estimate a state-space model and estimate an ARMA model, 
there are several other things that come into play related to 
estimation. Comparing the two estimated models you are unlikely to find 
comparable results. (Typically ARMA estimation is more robust at finding 
the best in my experience.) Even with simulation testing of estimation 
starting with "true" models it is problematic to get estimated models 
that are equivalent.


If you really want to see the equivalence you need to do a conversion of 
a model from one form to the other. I cannot speak to dlm, but dse was 
built for studying this equivalence and the users' guide has examples. 
If you are really interested in this topic, I recommend section 5 of 
http://www.bankofcanada.ca/1993/03/working-paper-199/.
I realize this is getting a bit old, but if you find a more up-to-date 
summary I would like to hear about it. That paper also has a 
demonstration that the equivalent models give results that are 
comparable within numerical precision of computers, not just to some 
statistical significance.


If you are in an area of the parameter space that the state space
formulation can't reach, then you won't get the same parameter estimates.
So, what you're doing might be okay or might not be, depending on whether
the state space formulation can reach that area of the parameter space.


"Can't reach" is an estimation problem. This is typically a more serious 
problem with state-space models. The quotient spaces are diffeomorphic 
so in theory it should be possible to reach the same solution if you 
properly account for the fact that you are estimating on a smooth 
manifold and not a vector space. In practice you have also to worry 
about twisting of the parameter space and finite time for estimation, 
and gradients that may converge toward zero near boundaries of the 
manifold's charts.



There's another state space formulation that is truly equivalent, which is
called the SSOE formulation or innovations representation, but I don't know
if you want to get into that. Google "SSOE state space" if you're
interested.


The quotient space of input-output equivalence classes for innovations 
form models is equivalent to the quotient space of input-output 
equivalence classes for non-innovations form models. You need more 
information to identify a non-innovations form, typically some physical 
understanding of the system. On the basis of only input-output data, and 
with no additional understanding of the physical system, there would be 
no reason to choose a non-innovations form for estimation. There is more 
discussion of this in the above mentioned summary and in the dse user's 
guide.


BTW, estimation problems tend to be much more severe with multivariate 
series than with univariate series. This is not just because of the 
usual issues. Especially in state-space representations, the twisting of 
the parameter space seems to be especially bad.


Paul




Mark


On Mon, Jan 4, 2016 at 9:25 AM, Stefano Sofia <
stefano.so...@regione.marche.it> wrote:


>Dear list users,
>I want to apply an MA(2) process (x_t = beta1*epsilon_(t-1) +
>beta2*epsilon_(t-2) + epsilon_(t)) to a given time series (x), and I want
>to estimate the two parameters beta1, beta2 and the variance of the random
>variable epsilon_(t).
>
>If I use
>MA2_1 <- Arima(x, order=c(0,0,2))
>I get the following result
>
>[1] "MA2_1"
>Series: x
>ARIMA(0,0,2) with non-zero 

[R] Error because of large dimension

2016-01-24 Thread li li
Hi all,
  I am doing some calculations with very large dimensions. I need to create a
matrix with three columns and a very large number of rows
(3195*1290*495*35*35*35*15=1.312083e+15) in order to store the calculation
results from a for loop.
R does not allow me to create such a matrix because of the large dimension
(see below). Is there a way to get around this?
  Thanks very much!!
 Hanna


> matrix(0, 3195*1290*495*35*35*35*15, 3)
Error in matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
  invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
  NAs introduced by coercion
>



Re: [R] Error because of large dimension

2016-01-24 Thread Oliver Keyes
Hey Hanna,

nrow and ncol in matrix() are integer-based, for the moment at least;
accordingly they have a maximum value. (3195*1290*495*35*35*35*15) is
actually larger than an integer can hold - you can test this with:

str((3195*1290*495*35*35*35*15))

Which shows that it's stored as a numeric value. And if you try
as.integer((3195*1290*495*35*35*35*15)) you'll get an NA - because
it's too large for an integer to hold.
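
The checks described above can be run together as a short sketch. Only base R is used, and the as.integer() warning is suppressed here so that the NA can be inspected quietly.

```r
n <- 3195 * 1290 * 495 * 35 * 35 * 35 * 15

print(typeof(n))                   # the product is computed as a double
print(.Machine$integer.max)        # largest value an R integer can hold
too_big <- n > .Machine$integer.max

# Coercion fails and yields NA (normally with a warning)
n_int <- suppressWarnings(as.integer(n))
print(is.na(n_int))
```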

You could try using the "bigmemory" package, which is designed to
handle very very large matrices (and other datatypes) but I believe
that handling is in terms of making sure you can store the thing by
storing it in a file if necessary - I'm not sure if it allows for
longs (which can store much larger values) for nrow and ncol and
indexing generally. So it may be that, for now, you're out of luck I'm
afraid :(.

On 24 January 2016 at 11:46, li li  wrote:
> Hi all,
>   I am doing some calculations with very large dimensions. I need to create a
> matrix with three columns and a very large number of rows
> (3195*1290*495*35*35*35*15=1.312083e+15) in order to store the calculation
> results from a for loop.
> R does not allow me to create such a matrix because of the large dimension
> (see below). Is there a way to get around this?
>   Thanks very much!!
>  Hanna
>
>
>> matrix(0, 3195*1290*495*35*35*35*15, 3)
> Error in matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
>   invalid 'nrow' value (too large or NA)
> In addition: Warning message:
> In matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
>   NAs introduced by coercion
>>
>



-- 
Oliver Keyes
Count Logula
Wikimedia Foundation



[R] Extracting complete information from XML data file using R-Nested Lists

2016-01-24 Thread sowmiyan
I am working with an XML file, which can be found in the link Sample XML file


I am trying to extract the information in every field to a CSV file. I
want my output to be as below: Required output:
*Total of 20 columns and 2 rows*
DateCreated DateModified Creator.UserAccountName Creator.PersonName
Creator..attrs.referenceNumber Modifier.UserAccountName Modifier.PersonName
Modifier..attrs.referenceNumber AdditionalEmailStr AdditionalComment
DateIssued DocumentaryInstructions NominationParcel.attr.Referencenumber
NominationParcel.SecondContractNumber
NominationParcel.Coordinator.RefernceNumber
NominationParcel.Coordinator.Username NominationParcel.Coordinator.Email
NominationParcel.Coordinator.Office.Name
NominationParcel.Coordinator.Office.Email
NominationParcel.Coordinator.Office.attrs.referenceNumber
Nomination 2007-11-25T17:01:32 2007-11-25T17:11:09 mkolker Merryn Kolker
15351 mkolker Merryn Kolker 15351 Good work   7 sam
Nomination 2007-11-25T17:18:01 2007-11-25T17:19:11 mkolker Merryn Kolker
15351 mkolker Merryn Kolker 15351 Nicely Performed   10 107 102

But I am not able to get my output in the required format. I have tried
two different ways.

1 Below is my first code. The problem with it is that my NULL fields are
not captured correctly and data spills over into neighbouring columns. Also,
I am not able to capture all the fields of the nested lists in the XML.

*Code 1*

  library(XML)

  doc <- xmlParse("Dummy.xml")
  lst <- xmlToList(doc)
  # Note: the body indexes with the global `cols`, not the argument `col`
  f <- function(col) do.call(rbind, lapply(lst, function(x) unlist(x[cols])))
  cols <- c("DateCreated", "DateModified", "Creator", "Modifier",
            "AdditionalEmailStr", "AdditionalComment", "DateIssued",
            "DocumentaryInstructions", "NominationParcel")
  res <- setNames(lapply(cols, f), cols)
  list2env(res, .GlobalEnv)

*Output 1*


DateCreated DateModified Creator.UserAccountName Creator.PersonName
Creator..attrs.referenceNumber Modifier.UserAccountName Modifier.PersonName
Modifier..attrs.referenceNumber AdditionalComment
NominationParcel.Coordinator.UserAccountName
NominationParcel.Coordinator.Office..attrs.referenceNumber
NominationParcel.Coordinator..attrs.referenceNumber
NominationParcel..attrs.referenceNumber
Nomination 2007-11-25T17:01:32 2007-11-25T17:11:09 mkolker Merryn Kolker
15351 mkolker Merryn Kolker 15351 Good Work sam 7
Nomination 2007-11-25T17:18:01 2007-11-25T17:19:11 mkolker Merryn Kolker
15351 mkolker Merryn Kolker 15351 Nicely performed 102 107 10
2007-11-25T17:18:01

2 To avoid information spilling from one cell into another because of "NULL",
I used for loops to replace the NULL cells with NA. This way I was
able to capture the correct data, but I could not get the information from
all the fields present in the XML.

*Code 2*

  library(XML)

  doc <- xmlParse("Dummy.xml")
  lstsub <- xmlToList(doc)
  # Replace NULL entries with NA, descending up to three levels.
  # Note: ifelse() returns a result the same length as its test (here 1),
  # so a longer value passed as `no` is cut down to its first element.
  for (i in 1:length(lstsub)) {
    for (j in 1:length(lstsub[[i]])) {
      lstsub[[i]][[j]] <-
        ifelse(is.null(lstsub[[i]][[j]]), NA, lstsub[[i]][[j]])
      if (length(lstsub[[i]][[j]]) > 1) {
        for (k in 1:length(lstsub[[i]][[j]])) {
          lstsub[[i]][[j]][[k]] <-
            ifelse(is.null(lstsub[[i]][[j]][[k]]), NA, lstsub[[i]][[j]][[k]])
          if (length(lstsub[[i]][[j]][[k]]) > 1) {
            for (l in 1:length(lstsub[[i]][[j]][[k]])) {
              lstsub[[i]][[j]][[k]][[l]] <-
                ifelse(is.null(lstsub[[i]][[j]][[k]][[l]]), NA,
                       lstsub[[i]][[j]][[k]][[l]])
            }
          }
        }
      }
    }
  }
  # Note: the body indexes with the global `cols`, not the argument `col`
  f <- function(col) do.call(rbind, lapply(lstsub, function(x) unlist(x[cols])))
  cols <- c("DateCreated", "DateModified", "Creator", "Modifier",
            "AdditionalEmailStr", "AdditionalComment", "DateIssued",
            "DocumentaryInstructions", "NominationParcel")
  res <- setNames(lapply(cols, f), cols)
  list2env(res, .GlobalEnv)
  write.csv(Creator, "dummy_2.csv")

*Output 2*

            DateCreated          DateModified         Creator  Modifier  AdditionalEmailStr  AdditionalComment  DateIssued  DocumentaryInstructions
Nomination  2007-11-25T17:01:32  2007-11-25T17:11:09  mkolker  mkolker   NA                  Good Work          NA          NA
Nomination  2007-11-25T17:18:01  2007-11-25T17:19:11  mkolker  mkolker   NA                  Nicely performed   NA          NA

Could somebody please help me get the required output?

I have posted the same question on Stack Overflow and the link is here (it
might help in giving a clearer picture)

http://stackoverflow.com/questions/34963724/extracting-complete-information-from-nested-lists-in-xml-to-a-data-frame-using-r/34963821#34963821


Regards,
Sowmiyan



[R] logical vector of the indices of a string in a vector

2016-01-24 Thread carol white via R-help
Hi, it might be trivial, but is there any way to get a logical vector marking
the elements of a vector that contain a given string? I thought that %in%
would do it, but it doesn't. I also want to filter out the empty fields.
Here I want to extract the non-empty elements containing "Yes":
> x = c("Yes, fsd", "", "No", "", "Yes, fjsdlf", "")
> x[c("Yes") %in% x & x != ""]
character(0)

Above, I wanted to do the following two operations in one. Here with grep it
works, but %in% above doesn't:
> y = x[grep("Yes", x)]
> y = y[y != ""]
> y
[1] "Yes, fsd"    "Yes, fjsdlf"
Thanks
Carol


Re: [R] logical vector of the indices of a string in a vector

2016-01-24 Thread ruipbarradas
Hello,

Try

x[grepl("Yes", x) & x != ""]
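
For context, a short sketch of why %in% did not work in the original attempt, using the vector from the question: %in% compares whole elements for equality, whereas grepl() tests each element against a pattern. The use of fixed = TRUE and nzchar() below is just one possible variant, not the only way to write it.

```r
x <- c("Yes, fsd", "", "No", "", "Yes, fjsdlf", "")

# %in% asks "is the string exactly equal to some element of x?"
exact <- "Yes" %in% x          # a single FALSE, recycled when used as an index

# grepl() returns one logical per element: does it contain the pattern?
keep <- grepl("Yes", x, fixed = TRUE) & nzchar(x)

result <- x[keep]
print(result)
```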

Hope this helps,

Rui Barradas

 

Quoting carol white via R-help:

> Hi, it might be trivial, but is there any way to get a logical vector
> marking the elements of a vector that contain a given string? I thought
> that %in% would do it, but it doesn't. I also want to filter out the
> empty fields.
> Here I want to extract the non-empty elements containing "Yes":
>> x = c("Yes, fsd", "", "No", "", "Yes, fjsdlf", "")
>> x[c("Yes") %in% x & x != ""]
> character(0)
>
> Above, I wanted to do the following two operations in one. Here with
> grep it works, but %in% above doesn't:
>> y = x[grep("Yes", x)]
>> y = y[y != ""]
>> y
>
> [1] "Yes, fsd"    "Yes, fjsdlf"
> Thanks
> Carol
>

 


Re: [R] R-help mailing list activity / R-not-help?

2016-01-24 Thread Robert Sherry
I think this mailing list is wonderful and it has helped me a lot. In
fact, I am not sure I would be using R today if it were not for this
list.

Bob

On 1/24/2016 4:42 PM, Michael Friendly wrote:


On 1/23/2016 7:28 AM, Jean-Luc Dupouey wrote:

Dear members,

Not a technical question:

But one worth raising...


The number of threads in this mailing list, following a long period of
increase, has been regularly and strongly decreasing since 2010, passing
from more than 40K threads to less than 11K threads last year. The trend
is similar for most of the "ancient" mailing lists of the R-project.

[snip ...]


I hope it is the right place to ask this question. Thanks in advance,



In addition to the other replies, there is another trend I've seen that
has actively worked to suppress discussion on R-help and move it
elsewhere. The general things:
- R-help was too unwieldy and so it was a good idea to hive off
specialized topics to various sub-lists, R-SIG-Mac, R-SIG-Geo, etc.
- Many people posted badly-formed questions to R-help, and so it
was a good idea to develop and refer to the posting guide to mitigate
the number of purely junk postings.


Yet, the trend I've seen is one of increasing **R-not-help**, in that
there are many posts, often by new R users, who get replies that not
infrequently range from just mildly off-putting to actively hostile:

- Is this homework? We don't do homework (sometimes false alarms,
where the OP has to reply to say it is not)
- Didn't you bother to do your homework, RTFM, or Google?
- This is off-topic because XXX (e.g., it is not strictly an R 
programming question).

- You asked about doing XXX, but this is a stupid thing
to want to do.
- Don't ask here; you need to talk to a statistical consultant.

I find this sad in a public mailing list sent to all R-help subscribers
and I sometimes cringe
when I read replies to people who were actually trying to get
help with some R-related problem, but expressed it badly, didn't
know exactly what to ask for, or how to format it,
or somehow motivated a frequent-replier to publicly dis the OP.

On the other hand, I still see a spirit of great generosity among some
people who frequently reply to R-help, taking a possibly badly posed
or ill-formatted question, and going to some lengths to provide a
helpful answer of some sort.  I applaud those who take the time
and effort to do this.

I use R in a number of my courses, and used to advise students to
post to R-help for general programming questions (not just homework)
they couldn't solve. I don't do this any more, because several of them
reported a negative experience.

In contrast, in the Stack Exchange model, there are numerous sublists
cross-classified by their tags.  If I have a specific knitr, ggplot2,
LaTeX, or statistical modeling question, I'm now more likely to post
it there, and the worst that can happen is that no one "upvotes" it
or someone (helpfully) marks it as a duplicate of a similar question.
But comments there are not propagated to all subscribers,
and those who reply helpfully can see their solutions accepted or not,
or commented on in that specific topic.

Perhaps one solution would be to create a new "R-not-help" list where,
as in a Monty Python skit, people could be directed there to be 
insulted and all these unhelpful replies could be sent.


A milder alternative is to encourage some R-help subscribers to click 
the "Don't send" or "Save" button and think better of their replies.







Re: [R] Error because of large dimension

2016-01-24 Thread Henrik Bengtsson
FYI, the matrix you tried to allocate would hold
(3195*1290*495*35*35*35*15) * 3 = 3.936248e+15 values.  Each value
would occupy 8 bytes of memory (for the double data type).  In other
words, in order to keep this data matrix in memory you would require a
computer with at least 3.148998e+16 bytes of RAM, i.e. 29327331 GiB =
28640 TiB = 28 PiB.  Storing such a large matrix even on file is not
possible.

In other words, you need to figure out how to approach your original
problem in a different way.
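
The arithmetic above can be reproduced in R as a quick check. This is only a sketch of the calculation, with variable names chosen here for illustration.

```r
n_rows   <- 3195 * 1290 * 495 * 35 * 35 * 35 * 15   # rows requested
n_values <- n_rows * 3                               # three columns
n_bytes  <- n_values * 8                             # 8 bytes per double

# Size of the matrix in binary units
sizes <- c(GiB = n_bytes / 2^30, TiB = n_bytes / 2^40, PiB = n_bytes / 2^50)
print(sizes)
```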

/Henrik

On Sun, Jan 24, 2016 at 8:46 AM, li li  wrote:
> Hi all,
>   I am doing some calculations with very large dimensions. I need to create a
> matrix with three columns and a very large number of rows
> (3195*1290*495*35*35*35*15=1.312083e+15) in order to store the calculation
> results from a for loop.
> R does not allow me to create such a matrix because of the large dimension
> (see below). Is there a way to get around this?
>   Thanks very much!!
>  Hanna
>
>
>> matrix(0, 3195*1290*495*35*35*35*15, 3)
> Error in matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
>   invalid 'nrow' value (too large or NA)
> In addition: Warning message:
> In matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
>   NAs introduced by coercion
>>
>



Re: [R] Extracting complete information from XML data file using R-Nested Lists

2016-01-24 Thread Oliver Keyes
Hey Sowmiyan,

I would recommend taking a look at the xml2 package, rather than the XML
package, for a start. It's a lot more structured and traversing between
elements is far easier :)
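
As a rough sketch of what that could look like: the snippet below uses a small made-up document, since the real file was not included in the post. The element names are borrowed from the question, while the root element, the e-mail address and the overall layout are assumptions. The point it illustrates is that xml_find_first() returns one result per record, so a missing field becomes NA instead of shifting later values into the wrong column.

```r
library(xml2)

# Hypothetical two-record document; structure and values are assumptions
xml_string <- paste0(
  "<Nominations>",
  "<Nomination>",
  "<DateCreated>2007-11-25T17:01:32</DateCreated>",
  "<Creator referenceNumber='15351'>",
  "<UserAccountName>mkolker</UserAccountName></Creator>",
  "<AdditionalComment>Good work</AdditionalComment>",
  "</Nomination>",
  "<Nomination>",
  "<DateCreated>2007-11-25T17:18:01</DateCreated>",
  "<Creator referenceNumber='15351'>",
  "<UserAccountName>mkolker</UserAccountName></Creator>",
  "<AdditionalEmailStr>someone@example.org</AdditionalEmailStr>",
  "</Nomination>",
  "</Nominations>"
)

doc <- read_xml(xml_string)
records <- xml_find_all(doc, ".//Nomination")

# One value per record; records without the element yield NA
field <- function(xpath) xml_text(xml_find_first(records, xpath))

out <- data.frame(
  DateCreated             = field("./DateCreated"),
  Creator.UserAccountName = field("./Creator/UserAccountName"),
  Creator.referenceNumber = xml_attr(xml_find_first(records, "./Creator"),
                                     "referenceNumber"),
  AdditionalEmailStr      = field("./AdditionalEmailStr"),
  AdditionalComment       = field("./AdditionalComment"),
  stringsAsFactors = FALSE
)
print(out)
```

The resulting data frame could then be passed to write.csv(); deeper fields such as those under NominationParcel would need their own XPath expressions, which depend on the actual file.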

On 24 January 2016 at 12:27, sowmiyan  wrote:
> I am working with a XML, which can be found in the link Sample XML file
> 
>
> I am trying to extract each and every fields information to a csv file. I
> want my output to be as below: Required output:
> *Total of 20 columns and 2 rows*
> DateCreated DateModified Creator.UserAccountName Creator.PersonName
> Creator..attrs.referenceNumber Modifier.UserAccountName Modifier.PersonName
> Modifier..attrs.referenceNumber AdditionalEmailStr AdditionalComment
> DateIssued DocumentaryInstructions NominationParcel.attr.Referencenumber
> NominationParcel.SecondContractNumber
> NominationParcel.Coordinator.RefernceNumber
> NominationParcel.Coordinator.Username NominationParcel.Coordinator.Email
> NominationParcel.Coordinator.Office.Name
> NominationParcel.Coordinator.Office.Email
> NominationParcel.Coordinator.Office.attrs.referenceNumber
> Nomination 2007-11-25T17:01:32 2007-11-25T17:11:09 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Good work   7 sam
> Nomination 2007-11-25T17:18:01 2007-11-25T17:19:11 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Nicely Performed   10 107 102
>
> But I am not able to get my output in the required format. I have tried in
> two different ways
>
> 1 Below is my first code, the problem with this is that my NULL fields are
> not getting captured correctly and there is spillover of data. Also I am
> not able to capture all the fields of nested lists in the XML
>
> *Code 1*
>
>   doc <- xmlParse("Dummy.xml")
>   lst<-xmlToList(doc)
>   f <- function(col) do.call(rbind, lapply(lst, function(x)
> unlist(x[cols])));
>   cols
> <-c("DateCreated","DateModified","Creator","Modifier","AdditionalEmailStr","AdditionalComment","DateIssued",
> "DocumentaryInstructions", "NominationParcel" );
>   res <- setNames(lapply(cols, f), cols);
>   list2env(res, .GlobalEnv)
> *Output 1*
>
>
> DateCreated DateModified Creator.UserAccountName Creator.PersonName
> Creator..attrs.referenceNumber Modifier.UserAccountName Modifier.PersonName
> Modifier..attrs.referenceNumber AdditionalComment
> NominationParcel.Coordinator.UserAccountName
> NominationParcel.Coordinator.Office..attrs.referenceNumber
> NominationParcel.Coordinator..attrs.referenceNumber
> NominationParcel..attrs.referenceNumber
> Nomination 2007-11-25T17:01:32 2007-11-25T17:11:09 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Good Work sam 7
> Nomination 2007-11-25T17:18:01 2007-11-25T17:19:11 mkolker Merryn Kolker
> 15351 mkolker Merryn Kolker 15351 Nicely performed 102 107 10
> 2007-11-25T17:18:01
>
> 2. To avoid the spillover from one cell to another caused by "NULL",
> I used for loops to replace the NULL cells with NA. This way I was
> able to capture the correct data, but I could not get all the fields
> present in the XML.
>
> *Code 2*
>
>   doc <- xmlParse("Dummy.xml")
>   lstsub <- xmlToList(doc)
>   for (i in 1:length(lstsub)) {
>     for (j in 1:length(lstsub[[i]])) {
>       lstsub[[i]][[j]] <-
>         ifelse(is.null(lstsub[[i]][[j]]), NA, lstsub[[i]][[j]])
>       if (length(lstsub[[i]][[j]]) > 1) {
>         for (k in 1:length(lstsub[[i]][[j]])) {
>           lstsub[[i]][[j]][[k]] <-
>             ifelse(is.null(lstsub[[i]][[j]][[k]]), NA, lstsub[[i]][[j]][[k]])
>           if (length(lstsub[[i]][[j]][[k]]) > 1) {
>             for (l in 1:length(lstsub[[i]][[j]][[k]])) {
>               lstsub[[i]][[j]][[k]][[l]] <-
>                 ifelse(is.null(lstsub[[i]][[j]][[k]][[l]]), NA,
>                        lstsub[[i]][[j]][[k]][[l]])
>             }
>           }
>         }
>       }
>     }
>   }
>   cols <- c("DateCreated", "DateModified", "Creator", "Modifier",
>             "AdditionalEmailStr", "AdditionalComment", "DateIssued",
>             "DocumentaryInstructions", "NominationParcel")
>   f <- function(col) do.call(rbind, lapply(lstsub, function(x) unlist(x[cols])))
>   res <- setNames(lapply(cols, f), cols)
>   list2env(res, .GlobalEnv)
>   write.csv(Creator, "dummy_2.csv")
>
> *Output 2*
>
> DateCreated  DateModified  Creator  Modifier
>  AdditionalEmailStr  AdditionalComment  DateIssued  DocumentaryInstructions
>
> Nomination  2007-11-25T17:01:32  2007-11-25T17:11:09  mkolker  mkolker  NA
>  Good Work  NA  NA
> Nomination  2007-11-25T17:18:01  2007-11-25T17:19:11  mkolker  mkolker  NA
>  Nicely performed  NA  NA
>
> Could somebody please help me get the required output?
>
> I have posted the same question on Stack Overflow; the link is below (it
> might give a clearer picture):
>
> http://stackoverflow.com/questions/34963724/extracting-complete-information-from-nested-lists-in-xml-to-a-data-frame-using-r/34963821#34963821
>
>
> Regards,
> Sowmiyan
>
> 
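
One way to attack the NULL problem described above, sketched in base R only.
This is an illustration under assumptions, not a tested answer for Dummy.xml:
`null_to_na()`, `flatten_records()` and the sample `records` list are
hypothetical names and made-up data standing in for the result of
xmlToList(doc). The idea is to replace every NULL with NA recursively, at any
nesting depth, before calling unlist(), and then to line the records up on the
union of their flattened field names so that a missing field becomes NA
instead of shifting the other values.

```r
# Recursively replace NULL elements with NA, keeping the nesting intact.
null_to_na <- function(x) {
  if (is.null(x)) return(NA)
  if (is.list(x)) return(lapply(x, null_to_na))
  x
}

# Flatten each record to a named character vector, then align all records
# on the union of field names (fields absent from a record become NA).
flatten_records <- function(records) {
  rows <- unname(lapply(records, function(rec) {
    v <- unlist(null_to_na(rec))
    setNames(as.character(v), names(v))
  }))
  cols <- unique(unlist(lapply(rows, names)))
  mat <- do.call(rbind, lapply(rows, function(r) unname(r[cols])))
  out <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(out) <- cols
  out
}

# Made-up sample data shaped like two parsed records (hypothetical values).
records <- list(
  list(DateCreated = "2007-11-25T17:01:32",
       Creator = list(UserAccountName = "mkolker",
                      PersonName = "Merryn Kolker"),
       AdditionalEmailStr = NULL,
       AdditionalComment = "Good work"),
  list(DateCreated = "2007-11-25T17:18:01",
       Creator = list(UserAccountName = "mkolker",
                      PersonName = "Merryn Kolker"),
       AdditionalEmailStr = NULL,
       AdditionalComment = "Nicely Performed",
       NominationParcel = list(SecondContractNumber = "107"))
)

flat <- flatten_records(records)
print(flat)
```

Because the recursion handles any depth, the nested for loops over i, j, k, l
are not needed, and nothing is truncated to a single element on the way.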

Re: [R] R-help mailing list activity / R-not-help?

2016-01-24 Thread Michael Friendly


On 1/23/2016 7:28 AM, Jean-Luc Dupouey wrote:
> Dear members,
>
> Not a technical question:

But one worth raising...

> The number of threads in this mailing list, following a long period of
> increase, has been regularly and strongly decreasing since 2010, passing
> from more than 40K threads to less than 11K threads last year. The trend
> is similar for most of the "ancient" mailing lists of the R-project.

[snip ...]

> I hope it is the right place to ask this question. Thanks in advance,

In addition to the other replies, there is another trend I've seen that
has actively worked to suppress discussion on R-help and move it
elsewhere. The general arguments were:
- R-help was too unwieldy, so it was a good idea to hive off
  specialized topics to various sub-lists: R-SIG-Mac, R-SIG-Geo, etc.
- Many people posted badly formed questions to R-help, so it
  was a good idea to develop and refer to the posting guide to reduce
  the number of purely junk postings.


Yet the trend I've seen is one of increasing **R-not-help**: there are
many posts, often by new R users, that get replies which not infrequently
range from mildly off-putting to actively hostile:

- Is this homework? We don't do homework (sometimes a false alarm,
  where the OP has to reply to say it is not).
- Didn't you bother to do your homework, RTFM, or Google?
- This is off-topic because XXX (e.g., it is not strictly an R
  programming question).
- You asked about doing XXX, but this is a stupid thing
  to want to do.
- Don't ask here; you need to talk to a statistical consultant.

I find this sad in a public mailing list sent to all R-help subscribers,
and I sometimes cringe when I read replies to people who were actually
trying to get help with some R-related problem, but expressed it badly,
didn't know exactly what to ask for or how to format it, or somehow
motivated a frequent replier to publicly dis the OP.

On the other hand, I still see a spirit of great generosity among some
people who frequently reply to R-help, taking a possibly badly posed
or ill-formatted question and going to some lengths to provide a
helpful answer of some sort.  I applaud those who take the time
and effort to do this.

I use R in a number of my courses, and used to advise students to
post to R-help for general programming questions (not just homework)
they couldn't solve. I don't do this any more, because several of them
reported a negative experience.

In contrast, in the Stack Exchange model, there are numerous sublists
cross-classified by their tags.  If I have a specific knitr, ggplot2,
LaTeX, or statistical modeling question, I'm now more likely to post it
there, and the worst that can happen is that no one "upvotes" it
or someone (helpfully) marks it as a duplicate of a similar question.
But comments there are not propagated to all subscribers,
and those who reply helpfully can see their solutions accepted or not,
or commented on in that specific topic.

Perhaps one solution would be to create a new "R-not-help" list where,
as in a Monty Python skit, people could be directed to be insulted
and to which all these unhelpful replies could be sent.

A milder alternative is to encourage some R-help subscribers to click
the "Don't send" or "Save" button and think better of their replies.



--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error because of large dimension

2016-01-24 Thread William Dunlap via R-help
> 28 PiB.  Storing such a large matrix even on file is not possible.

The ads for Amazon Redshift say it is possible.  E.g.,
  Amazon Redshift is a fast, fully managed, petabyte-scale data
  warehouse that makes it simple and cost-effective to analyze
  all your data using your existing business intelligence tools.
  Start small for $0.25 per hour with no commitments and scale
  to petabytes for $1,000 per terabyte per year, less than a tenth
  the cost of traditional solutions. Customers typically see 3x
  compression, reducing their costs to $333 per uncompressed
  terabyte per year.

Cost may be an issue:

28 petabytes * 1024 terabytes/petabyte * $333/terabyte/year ~= $9.5
million/year, or $26 thousand/day.
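
The arithmetic in this thread can be checked in R. The sketch below only
assumes the numbers already quoted: 8 bytes per double, 3 columns, and the
ad's $333 per uncompressed terabyte per year; the variable names are made up
for illustration. It also shows why matrix() complained: the row count does
not fit in R's 32-bit integer range, so coercing it to integer gives NA.

```r
# Row count requested by the original poster. Numeric literals in R are
# doubles, so this product is computed without integer overflow.
n_rows   <- 3195 * 1290 * 495 * 35 * 35 * 35 * 15
n_values <- n_rows * 3        # three columns
bytes    <- n_values * 8      # 8 bytes per double

# The row count is far beyond the largest R integer (2147483647),
# which is why matrix() reports an invalid 'nrow' value.
too_many_rows <- n_rows > .Machine$integer.max

tib <- bytes / 1024^4         # tebibytes
pib <- bytes / 1024^5         # pebibytes

# Storage cost at the advertised price, using the rounded 28 PB figure.
price_per_tb_year <- 333
cost_year <- 28 * 1024 * price_per_tb_year
cost_day  <- cost_year / 365

cat(sprintf("rows: %.6e  values: %.6e  bytes: %.6e\n",
            n_rows, n_values, bytes))
cat(sprintf("%.0f TiB, about %.0f PiB\n", tib, pib))
cat(sprintf("$%.0f per year, about $%.0f per day\n", cost_year, cost_day))
```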


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Jan 24, 2016 at 1:29 PM, Henrik Bengtsson <
henrik.bengts...@gmail.com> wrote:

> FYI, the matrix you tried to allocate would hold
> (3195*1290*495*35*35*35*15) * 3 = 3.936248e+15 values.  Each value
> would occupy 8 bytes of memory (for the double data type).  In other
> words, in order to keep this data matrix in memory you would require a
> computer with at least 3.148998e+16 bytes of RAM, i.e. 29327331 GiB =
> 28640 TiB = 28 PiB.  Storing such a large matrix even on file is not
> possible.
>
> In other words, you need to figure out how to approach your original
> problem in a different way.
>
> /Henrik
>
> On Sun, Jan 24, 2016 at 8:46 AM, li li  wrote:
> > Hi all,
> >   I am doing some calculation with very large dimension. I need to
> > create a matrix with three columns and a very large number of rows
> > (3195*1290*495*35*35*35*15=1.312083e+15) in order to allocate
> > calculation result from a for loop.
> > R does not allow me to create such a matrix because of the large
> > dimension (see below). Is there a way to go around this?
> >   Thanks very much!!
> >  Hanna
> >
> >
> >> matrix(0, 3195*1290*495*35*35*35*15, 3)
> > Error in matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
> >   invalid 'nrow' value (too large or NA)
> > In addition: Warning message:
> > In matrix(0, 3195 * 1290 * 495 * 35 * 35 * 35 * 15, 3) :
> >   NAs introduced by coercion
> >>