Hi,

I have a few comments to add on this as well, mainly from a user's point of view, as I've been having quite a few of these discussions lately.

First, let me provide the context from which I'm approaching the problem: the usefulness of any integration (for me) depends on two types of users, "advanced" and "basic". An advanced user is already somewhat familiar with R and able to script, and simply needs an easy way to organize data and statistical explorations. A basic user, on the other hand, doesn't know R but needs a few statistical tools for data analysis (essentially, this user shouldn't even need to know what R is, just that Calc is using it to calculate results).
Mr. Mada, with regard to your example, you raise an interesting point. Any implementation would have to limit an advanced user and force them to reorganize data to some degree. However, in my experience R integration can be made pretty flexible, as long as there is a way to represent Calc data as an array. Luckily, I understand this to be the case for Calc. For a basic user, a wizard of some sort would also alleviate this problem by providing some flexibility and throwing errors (or advice!) about how to organize data -- sort of like Excel and Calc do now.

Just an aside: it might be worthwhile to provide users (especially basic ones) with links to resources like Wikibooks (http://en.wikibooks.org/wiki/Statistics), Wolfram (http://mathworld.wolfram.com/topics/ProbabilityandStatistics.html), or other sites where they could learn about the subtleties of the methods they're using.

Ultimately, if certain R objects seem very useful and need to be represented in our integration (such as factors, as discussed), this may have to be coded into the software to facilitate the user experience. Maybe one of the first tasks of such an implementation should be to decide which features are most important (or take them from http://wiki.services.openoffice.org/wiki/Statistical_Data_Analysis_Tool), and how best to hide their details from the two types of users above (or other types one may think of). This could help significantly when it finally comes down to programming the tools.

Just some thoughts, if this were to go further. :)

Thank you,
Wojciech

On 4/1/07, Leonard Mada <[EMAIL PROTECTED]> wrote:
Dear Prof Neuwirth,

my comments were intended mainly to implement some functionality for complete *R newbies*, that is, for persons who do not know how to work in R. [It is debatable then whether they need factors, for example, but for the sake of completeness I mentioned those elements.]

I imagined the following three scenarios:

1. (for comment c) I have often encountered variables that were NOT limited to a single column: e.g. age was written in columns A, B, C and D. Therefore, I would like to be able to select a range individually for every variable:
   age: A1:D50
   Similarly, Blood Pressure: H1:I100
   The A1:H100 selection won't work in this case, because this is not a data frame: the same variable is split across several columns. Rewriting the data into a single column is time-consuming (and the data is often written in more than one column because of another factor, so rewriting it into a single column will lose some information).

2. (comment b) I had in mind:
   fisher.test( matrix( c(number_1, number_2, number_3, number_4), 2 ))
   and similar contingency tables. Somebody who does NOT have any idea of R won't be able to perform such a simple test until he learns to construct a matrix. BUT if the user only needs the Fisher test, then you are right, there is no real need for the user to create a matrix on his own; it is easy to select the contingency table and to pass the correct command to R (without the user having to know more details about matrices).

3. (comment a) The previous comment applies to factors, too. I primarily had the ANOVA test in mind, but then again, the user does not need to know the details of parsing the arguments, and everything could be hidden in the implementation. Somebody who needs factors for a different analysis will most likely know how to create them in R.
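To make the three scenarios concrete, here is a minimal R sketch. All numbers and names (age_block, group, tab) are made up for illustration and merely stand in for selected spreadsheet ranges; nothing here reflects how an actual Calc/R bridge would pass data.

```r
# Hypothetical values standing in for spreadsheet ranges (not from this thread).

# Scenario 1: one variable spread over several columns (e.g. age in A1:D3).
age_block <- matrix(c(34, 51, 47, 29,
                      62, 45, NA, 38,
                      55, 41, 33, 60), nrow = 3, byrow = TRUE)
age <- as.vector(age_block)  # flattens column by column into a single vector

# Keep the column of origin as a factor, so the information carried by the
# column layout is not lost when the data becomes a single vector.
group <- factor(rep(c("A", "B", "C", "D"), each = nrow(age_block)))

# Scenario 2: a 2x2 contingency table passed to the Fisher exact test.
tab <- matrix(c(8, 2, 1, 5), 2)
res <- fisher.test(tab)
print(res$p.value)

# Scenario 3: one-way ANOVA using the factor built above.
# Rows with a missing age are dropped by the default na.action.
fit <- aov(age ~ group)
print(summary(fit))
```

A menu command could build the matrix and the factor behind the scenes in this way, so the user would only have to select the ranges.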
Unfortunately, I am a little bit limited when judging RExcel:
- first, I work a lot with R, so I am not a newbie, BUT I try to think what a newbie might want; this may sometimes backfire
- secondly, I work with R mostly at home; however, I do not use Excel at home

Well, the issues I described were meant to allow newbies to perform some specific tests, but it is debatable whether such a newbie does indeed need to know and use those details (those issues could be implemented/hidden in specific menu commands).

Sincerely,

Leonard

Erich Neuwirth wrote:
> Sorry for not answering earlier.
> You mention some open issues.
> Let me ask some more questions to clarify these.
>
> Leonard Mada wrote:
>> Dear Prof. Neuwirth,
>> depth). Mainly three issues remained uncovered: a.) converting input
>> data/vectors to factors (is useful sometimes),
>
> If you transfer a variable to R and then just apply the function factor
> in R, it is converted to a factor.
> Do you need more functionality?
>
>> b.) importing data as
>> matrices (for contingency tables, e.g. for a Fisher exact test) and
>
> I assume by importing you mean import into the spreadsheet.
> The basic unit for data transfer in our framework is an array
> of a single underlying scalar type (character, (real) number,
> complex number, time&date). So transferring a matrix in either direction
> is there. There is, however, one major concern.
> Spreadsheet programs don't care too much about missing values.
> I have not looked into the details of this in Calc, but I think
> that it is important to be careful about this problem in the interface.
>
>> c.)
>> converting a data range to multiple vectors vs independently selecting
>> the data ranges for the multiple vectors.
>
> When a range is transferred as a dataframe, in R you can immediately
> access the columns as vectors. What kind of additional functionality
> do you think is needed?
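The points in the quoted reply (columns of a data frame as vectors, factor(), and missing values) can be sketched in a few lines of R. The data frame below is hypothetical, and the assumption that an empty spreadsheet cell arrives in R as NA is only an assumption; how Calc reports empty cells is not settled in this thread.

```r
# Hypothetical data frame standing in for a transferred spreadsheet range.
# Assumption: an empty cell reaches R as NA.
patients <- data.frame(
  age       = c(34, 51, NA, 47),
  treatment = c("drug", "placebo", "drug", "placebo"),
  stringsAsFactors = FALSE
)

age <- patients$age                      # a column is directly usable as a vector
treatment <- factor(patients$treatment)  # factor() converts a vector to a factor

print(mean(age))                 # NA, because one value is missing
print(mean(age, na.rm = TRUE))   # mean of the observed values only
```

The two calls to mean() show why the interface has to be careful here: a silently dropped or zero-filled empty cell would change the result without any warning.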
-- Five Minutes to Midnight: Youth on human rights and current affairs http://www.fiveminutestomidnight.org/
