Re: [R] generalized hypergeometric function
On Sun, 2006-04-16 at 17:54 +, Ben Bolker wrote:

> Marc Schwartz <MSchwartz at mn.rr.com> writes:
>> On Sat, 2006-04-15 at 20:59 -0300, Bernardo Rangel tura wrote:
>>> Hi R-masters, I need to compute the generalized hypergeometric function. I looked on the R-project site and in the R-help list and found nothing about the generalized hypergeometric function. Is it possible to calculate the generalized hypergeometric function? Does somebody have a script for this?
>
> Note that he is looking for the h'geom FUNCTION, not DISTRIBUTION (e.g. http://mathworld.wolfram.com/GeneralizedHypergeometricFunction.html); Robin Hankin wrote some code (hypergeo in the Davies package on CRAN) to compute a particular Gaussian h'geom function, and was asking at one point on the mailing list whether anyone was interested in other code; I don't know whether it will be generalized enough for you.
>
> Ben Bolker

Thanks Ben. I stand corrected on that point. Didn't click in my initial reading.

Regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] creating empty cells with table()
On Wed, 2006-04-19 at 09:21 -0400, Owen, Jason wrote:

> Hello, Suppose I simulate 20 observations from the Poisson distribution with lambda = 4. I can summarize these values using table() and then feed that to barplot() for a graph. Problem: if there are empty categories (e.g. 0) or empty categories within the range of the data (e.g. observations for 6, 7, 9), barplot() does not include the empty cells in the x-axis of the plot. Is there any way to specify table() to have specific categories (in the above example, probably 0:12) so that zeroes are included? Thanks, Jason

One thought comes to mind, which is based upon table()'s internal behavior, where it interprets the vectors passed as factors for the purpose of the [cross-]tabulation. Thus:

x <- rpois(20, 4)

x
 [1] 4 4 3 8 2 4 5 2 3 2 4 5 5 5 6 4 5 8 2 5

table(x)
x
2 3 4 5 6 8
4 2 5 6 1 2

# Add the desired factor levels
table(factor(x, levels = 0:12))

 0  1  2  3  4  5  6  7  8  9 10 11 12
 0  0  4  2  5  6  1  0  2  0  0  0  0

For the barplot:

barplot(table(factor(x, levels = 0:12)))

HTH,

Marc Schwartz
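As an aside (not part of the original exchange), the same zero-filled counts can also be produced without a factor via tabulate(); a minimal sketch, assuming the data are non-negative integers no larger than 12:

```
x <- rpois(20, 4)
counts <- tabulate(x + 1L, nbins = 13)  # tabulate() is 1-based, so shift by 1
names(counts) <- 0:12                   # counts for the values 0 through 12
barplot(counts)
```

The factor() approach above is generally preferable, since it also works for non-integer category labels.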
Re: [R] prop.table on three-way table?
The underlying count table is:

                   dim3_no1 dim3_no2 dim3_no3 dim3_no4 dim3_no5
dim1_no1 dim2_no1         0        0        0        0        0
         dim2_no3         0        1        1        0        0
         dim2_no4         0        0        0        0        0
         dim2_no5         0        0        0        0        0
dim1_no3 dim2_no1         0        0        0        0        0
         dim2_no3         0        0        0        0        0
         dim2_no4         0        0        2        0        1
         dim2_no5         0        0        1        0        0
dim1_no4 dim2_no1         0        1        0        0        0
         dim2_no3         0        0        0        0        0
         dim2_no4         1        0        0        1        0
         dim2_no5         0        0        0        0        0
dim1_no5 dim2_no1         0        0        0        0        0
         dim2_no3         1        0        0        0        0
         dim2_no4         0        0        0        0        0
         dim2_no5         0        0        0        0        0

and...the output of using ctab() is:

ctab(x, type = "row", percentages = FALSE)

                   dim3_no1 dim3_no2 dim3_no3 dim3_no4 dim3_no5
dim1_no1 dim2_no1       NaN      NaN      NaN      NaN      NaN
         dim2_no3      0.00     0.50     0.50     0.00     0.00
         dim2_no4       NaN      NaN      NaN      NaN      NaN
         dim2_no5       NaN      NaN      NaN      NaN      NaN
dim1_no3 dim2_no1       NaN      NaN      NaN      NaN      NaN
         dim2_no3       NaN      NaN      NaN      NaN      NaN
         dim2_no4      0.00     0.00     0.67     0.00     0.33
         dim2_no5      0.00     0.00     1.00     0.00     0.00
dim1_no4 dim2_no1      0.00     1.00     0.00     0.00     0.00
         dim2_no3       NaN      NaN      NaN      NaN      NaN
         dim2_no4      0.50     0.00     0.00     0.50     0.00
         dim2_no5       NaN      NaN      NaN      NaN      NaN
dim1_no5 dim2_no1       NaN      NaN      NaN      NaN      NaN
         dim2_no3      1.00     0.00     0.00     0.00     0.00
         dim2_no4       NaN      NaN      NaN      NaN      NaN
         dim2_no5       NaN      NaN      NaN      NaN      NaN

Note that for rows where the total is 0, you end up with NaN (Not a Number), as opposed to 0. Does that get you what you want?

HTH,

Marc Schwartz
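For reference, base R can produce the same row proportions without ctab(), using prop.table() over the first two margins; a minimal sketch with made-up data (the variable names and values are illustrative, not from the thread):

```
# a small 3-way table; the ("b", "q") row is deliberately empty
x <- table(
  dim1 = c("a", "a", "b"),
  dim2 = c("p", "q", "p"),
  dim3 = c("u", "u", "v")
)

prop.table(x, margin = c(1, 2))
# each (dim1, dim2) row sums to 1; the empty ("b", "q") row comes
# out as NaN, matching the ctab() behavior above
```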
Re: [R] x-axis tick mark labels running vertically
On Fri, 2004-05-07 at 07:02, Mohamed Abdolell wrote:

> I'm plotting obesity rates (y-axis) vs Public Health Unit (x-axis) for the province of Ontario and would like to have the Public Health Unit names appear vertically rather than the default, horizontally. I'm actually using the 'barplot2' function in the {gregmisc} library ... I haven't been able to find a solution in either the barplot2 options or the general plotting options. Any pointers would be appreciated. - Mohamed

You need to adjust par("las") to alter the orientation of the axis labels:

barplot2(1:4, names.arg = c("one", "two", "three", "four"), las = 2)

You can also use the axis() function separately, using the bar midpoints that barplot2() returns:

mp <- barplot2(1:4)
axis(1, at = mp, labels = c("one", "two", "three", "four"), las = 2)

Setting par(las = 2) rotates the axis labels so that they are perpendicular to the axis. See ?par for more information.

HTH,

Marc Schwartz
Re: [R] log Y scales for parplot
On Fri, 2004-05-14 at 12:11, Monica Palaseanu-Lovejoy wrote:

> Hi, I am doing a barplot, and the first bar is very big (high values) and the rest of the bars are quite small (small values). So - is there any way to make the Y scale logarithmic, so that I have a wider distance from 0 to 50, for example, than from 50 to 100, and so on? Thanks in advance for any help, Monica

Monica,

See the barplot2() function in the 'gregmisc' package on CRAN, which supports the use of log axis scaling. For example:

barplot2(c(5000, 50, 75, 100), log = "y")

Note that a log axis cannot actually start at 0, since log(0) is undefined.

HTH,

Marc Schwartz
Re: [R] UseR 2004 Proceeding
On Tue, 2004-05-25 at 09:11, Agustin Calatroni wrote:

> At the past conference (i.e. DSC 2003) the proceedings were available for download. This year the UseR website only has the abstracts of the papers. Does anybody know if the full text will be available for download? Thanks, Agustin Calatroni ;-)

According to an announcement made on Saturday, there will not be a proceedings volume. Some authors may post their complete presentation on their web sites (where available) or perhaps may be willing to e-mail their presentation in PDF format. You may be best served to contact the author(s) directly, if there is a particular paper you are interested in.

HTH,

Marc Schwartz
Re: [R] Opening help pages in new tab of existing Mozilla Firebird
On Wed, 2004-05-26 at 10:32, Kevin Wright wrote:

> Subject pretty much says it all. I currently have options()$browser set to open help pages in Mozilla Firebird, but it starts a new window each time and I would like a new 'tab' in an existing window. Sorry if this is obvious, but I can't find anything. Kevin Wright

You do not indicate which OS you are running, but under Linux, you can use a script such as the following. It will check the current process list to see if an instance of Firefox is already present. If so, it will open a new tab. Otherwise, it opens a new window.

#!/bin/sh
# if 'firefox-bin' is in the current ps listing, open a new tab
if ps -e | grep firefox-bin > /dev/null 2>&1; then
    firefox -remote "openURL(${1}, new-tab)"
    exit 0
else
    # open a new instance
    firefox "$1"
    exit 0
fi

Copy the above into a script file and set it to be executable (chmod +x ScriptFileName). Then set options()$browser to use the script file.

Note also that the recent version of the Mozilla standalone browser is called Firefox, in recognition of the existence of the Firebird (formerly Interbase) SQL database project.

HTH,

Marc Schwartz
Re: [R] gauss.hermite?
On Thu, 2004-05-27 at 23:35, Spencer Graves wrote:

> The search at www.r-project.org mentioned a function gauss.hermite{rmutil}. However, install.packages("rmutil") produced, 'No package rmutil on CRAN.' How can I find the current status of gauss.hermite and rmutil? Thanks, Spencer Graves

Spencer,

A Google search indicates that rmutil is listed on http://cran.r-project.org/other-software.html. There is a link at the bottom of the page to Jim Lindsey's site at:

http://popgen0146uns50.unimaas.nl/~jlindsey/rcode.html

There are links for rmutil.tgz and rmutil.zip below the middle of the page.

HTH,

Marc Schwartz
Re: [R] How to Describe R to Finance People
On Fri, 2004-06-04 at 09:19, Paul Gilbert wrote:

> Ko-Kang Kevin Wang wrote:
>> It is not only used by statisticians or scientists, but also econometricians and people in finance due to its cost (FREE) and its powerfulness.
>
> I think (FREE) will distract your intended audience from the real point. In a corporate environment, lots of people argue that free software actually costs more than commercial software because of internal support cost, etc, etc. These arguments will all hinge critically on the corporate IT support abilities. For R, I have never seen a convincing argument that it costs more, but the real point is that this is irrelevant. If it costs less, that is nice. If it costs more, then that is what you pay to use something that is better. If they need to, I think people in finance are generally willing to pay, so I think it is a mistake to put much emphasis on the cost. Put the emphasis on how good it is.

I agree that quality and value are important, but I think that the issue of cost should not be discounted out of hand. Value (for both company and client) is directly tied to cost.

Cost may be less of a concern for very large corporations to some extent, though certainly non-trivial as we continue to see companies finding ways to reduce their cost of operations as an important part of the strategy to improve profitability. Typically, this is done via reductions in personnel costs (i.e. layoffs, reductions in benefits, salary/wage cuts, etc.), but IT costs are surely a target, as is noted daily/weekly/monthly in various IT and business trade rags. IT costs are not just those associated with the initial purchase, but with ongoing operating costs as well.

I can speak from personal experience, as the President and Owner of a health care consulting business who has funded this operation with my own funds, that cost is a significant issue.
I do not have shareholders or private investors providing operating capital with the promise of future returns on their investments. Every dollar I spend has to be recovered via client billings or it comes out of my own pocket. This is not just important for me, but for my clients as well. The more I spend on the cost of doing business, the more that I would have to pass on to my clients to recoup those same costs. My ability to offer clients reasonable project fees is directly correlated to my underlying cost structure. There is a market driven threshold beyond which I could not pass those costs on to clients and still have clients willing to pay for services.

A product like SAS for example, which I had previously used for a number of years working for a larger medical software company, is no longer affordable to me as a small business owner. The last time that I checked, the annual licensing for a single user commercial license for Base, Stat and Graph was in the neighborhood of $5,000 U.S. **Per Year**. That is for _one person_. Calculate those costs for a larger staff... Even a product such as that other R-like commercial offering, while less costly than SAS, still adds to overhead. I would rather allocate the funds for that product, along with my time, to supporting the R Foundation and this community, to repay the value and benefit that I receive from it (which is nothing short of phenomenal).

The bottom line is that cost is a non-trivial issue. If a company is willing to pay more for a functionally equivalent product, because the training and support is (or is perceived to be) superior, so be it. That may enable managers and other decision makers to sleep better at night. I would however challenge the level of support provided by any commercial company to that which is provided by this community, given the depth and breadth of expertise present and the expedience with which communications take place here.

I use R. My company benefits from it.
My clients benefit from it. ...and I sleep just fine (when I do sleep)... :-)

Regards,

Marc Schwartz
Re: [R] Plot documentation; Axis documentation
On Fri, 2004-06-04 at 12:01, Glynn, Earl wrote:

> Why when I do a help(plot) do I not see anything about parameters such as xlim or ylim? As someone new to R, finding that xlim and ylim even existed wasn't all that easy. Even help.search("xlim") shows nothing once I know xlim exists. I'd like to change the default axes, but help(axis) isn't that informative about changing the frequency of ticks on the axes. Do people really refer to the x-axis as 1 and the y-axis as 2 as shown in help(axis)?
>
>   plot(1:4, rnorm(4), axes = FALSE)
>   axis(1, 1:4, LETTERS[1:4])
>   axis(2)
>
> I hadn't a clue what the 1 and 2 meant here without reading additional documentation. And where is the LETTERS constant defined and what else is defined there? Are there no common R constants defined somewhere so the axes can be defined symbolically? Perhaps AXIS_X = 1, AXIS_Y = 2 would be better than just 1 and 2:
>
>   plot(1:4, rnorm(4), axes = FALSE)
>   axis(AXIS_X, 1:4, LETTERS[1:4])
>   axis(AXIS_Y)
>
> This would at least provide a clue about what is going on here. Why is R such a graphics-rich language while the documentation is so lacking in graphics examples? Why can't the documentation include graphics too, so one can study code and graphics at the same time? How do I know the graphics I'm seeing is what it's supposed to look like? I'd rather do more in R than MatLab, but I find the R documentation somewhat lacking. I prefer not to read the R source code to find the answers. Thanks for any insight about this. efg

Reading the posting guide, for which there is a link at the bottom of each list e-mail, would be a good place to start. The section on Further Resources provides important links. Specifically on graphics:

1. Start by reading chapter 12 in "An Introduction to R", which covers graphics basics.

2. V&R's MASS also has an excellent chapter (4) on graphics.

3. There is also an article in the R News "R Help Desk" column (http://cran.r-project.org/doc/Rnews/Rnews_2003-2.pdf) that would likely be helpful as well.
Reviewing these resources should go a long way toward answering your questions. I think that you will find the documentation for R to be substantial, if you take the time to properly research it. The posting guide will help get you started in that endeavor. In most cases this obviates any need to review source code, though a critical advantage of R is the ability to do just that when you need to.

HTH,

Marc Schwartz
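On the symbolic-names question, nothing prevents you from defining such constants yourself; a minimal sketch (AXIS_X and AXIS_Y are the poster's proposed names, not part of R):

```
AXIS_X <- 1  # side 1 = bottom axis
AXIS_Y <- 2  # side 2 = left axis

plot(1:4, rnorm(4), axes = FALSE)
axis(AXIS_X, at = 1:4, labels = LETTERS[1:4])
axis(AXIS_Y)
```

As for where LETTERS is defined: it is one of R's built-in constants (see ?Constants), along with letters, month.name, month.abb and pi.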
Re: [R] How to Describe R to Finance People
On Fri, 2004-06-04 at 12:47, Tamas Papp wrote:

> On Fri, Jun 04, 2004 at 11:06:59AM -0500, Marc Schwartz wrote:
>> I agree that quality and value are important, but I think that the issue of cost should not be discounted out of hand. Value (for both company and client) is directly tied to cost.
> [...]
>> The bottom line is that cost is a non-trivial issue. If a company is willing to pay more for a functionally equivalent product, because the training and support is (or is perceived to be) superior, so be it. That may enable managers and other decision makers to sleep better at night.
>
> I agree with your points. However, as far as I remember, the original poster wants to give a 2-3 minute summary about the benefits of R. I would not open such a complicated issue (the total cost of ownership) in such a small timeframe, but focus on the more technical benefits instead.

Actually, Kevin indicated 1 to 2 minutes... ;-)

My response was clearly more detailed than what Kevin could incorporate into the presentation structure. It was more a response to Paul's comments on cost not being an issue. My experience indicates otherwise.

The TCO issue is clearly a subject of much debate, especially when fueled by widely disseminated studies funded (overtly or otherwise) by a certain large software company based in the northwestern U.S., which tends to introduce a certain 'a priori' bias.

I agree that you cannot adequately address the issue in such a short talk. However, I think that it can be raised briefly, with supporting comments, such as those made by Frank Harrell regarding reproducible analyses, which also support cost reduction via improvements in quality and productivity. Something that point-and-click based tools cannot offer effectively.

Thanks for raising the clarification, Tamas.

Marc
Re: [R] How to Describe R to Finance People
On Fri, 2004-06-04 at 14:26, Paul Gilbert wrote:

> Marc Schwartz wrote:
> snip
>> I agree that quality and value are important, but I think that the issue of cost should not be discounted out of hand. Value (for both company and client) is directly tied to cost.
> [snip ...]
>
> Marc
>
> I agree with this, and most of what you say. Cost is important in both large and small companies, and also in government. The point is really that total cost of ownership is a very complicated thing, and you should not get into it without the specifics of a particular company and situation in mind. For example, if the end user takes responsibility for all the support, the cost implications will be very different from the situation where the IT department needs to guarantee availability. Even in your own company you may have a very different attitude with respect to your research software and your accounts receivable software. People in finance making real-time market decisions will have a very different cost structure from academics in finance.

Paul,

Sorry for the delay in my reply. A sudden request came from a client this afternoon and I just finished the analysis.

I am in agreement with you that situational differences will bias the focus on costs and perhaps even the ability to define them well. Within the timeframe that Kevin and/or his associate has for this presentation, this topic cannot be adequately covered. That does not mean that you cannot raise it for further consideration by the audience within the context of pointing out R's strengths. I suspect that if you point out these issues, somebody with the right insight will have a light bulb go on relative to the possibility of cost savings versus their current set of tools. They will of course need to pursue that line of thinking outside of the scope of this presentation, which is fine.

I do use a commercial product for accounting (which is the only reason that I still dual-boot Windows and Fedora Core 2.)
My alternative is to send all of my paperwork to my accountant to do all of my ledger entries, which at $300 U.S. per hour is not inconsequential. So, yes, I make a cost-based decision to purchase a commercial OTS product that enables me to do the grunt work and send an electronic file to my accountant for review. My cost per unit of time is cheaper than my accountant's. If I could do the same thing with an open source product, hallelujah. It would save me even more. Unfortunately, as is oft discussed, this is one area in which the Linux world is still lacking. I suspect that will change in time, however.

On the other hand, I use a payroll services company to handle that part of the business. Their costs to run the payroll, pay taxes, worker's comp and all the rest of the associated procedures are cheaper than what I could do on my own. It is not that I could not do it technically, but that use of my time would in the long run cost me more money than what I pay for the service. Each component of the process does need to be evaluated within the context of the alternatives and appropriate risk/benefit considerations.

The point was really that many people are very sensitive to arguments about cost, and often have positions that they feel obliged to promote. So, as soon as you mention cost you are likely to get into a very long discussion that will not be fruitful unless you are prepared to talk about very particular situations. For this particular audience I think it would probably be more to the point to describe how good and reliable R is. A particular company may decide R is just too expensive. For example, some companies in finance are worried about real-time decision making. They may have to hire 10 more IT staff to guarantee 24/7 availability with no more than 5 minutes outage per week whereas, with commercial software, they may be able to buy guarantees.
(This is not a statement about commercial software being more reliable; it is a statement about being able to buy insurance.) But, as you say, for most of us R is a real bargain.

There is no doubt that folks are willing to spend more in some cases for a piece of paper that guarantees availability and perhaps some form of compensation for downtime, which in these situations would likely result in lost revenue. As long as clients are willing to pay for those features and/or companies determine that business imperatives warrant such expenditures...

I think that your closing point relative to R's reliability is important. This goes to the old saying, "Facts are negotiable, perception is reality." Why would R be more or less available or reliable than another analytic tool? More often than not, I suspect that it is not the application but the underlying infrastructure that results in reduced availability. The perception issue may be the biggest hurdle that the open source world needs to (and will) overcome in the commercial marketplace. Anyway, I think
Re: [R] error during make of R-patched on Fedora core 2
On Mon, 2004-06-07 at 15:08, Gavin Simpson wrote:

Dear list, I've just upgraded to Fedora Core 2 and seeing as there wasn't an rpm for this OS on CRAN yet, I thought it was about time I had a go at compiling R myself. Having run into the X11 problem, I switched to trying to install R-patched. I followed the instructions in the R Installation and Administration manual to download the sources of the Recommended packages and place the files in R_HOME/src/library/Recommended. ./configure worked fine so I progressed to make, which has hit upon this error when the process arrived at the Recommended packages:

make[2]: Leaving directory `/home/gavin/tmp/R-patched/src/library'
make[2]: Entering directory `/home/gavin/tmp/R-patched/src/library'
building/updating vignettes for package 'grid' ...
make[2]: Leaving directory `/home/gavin/tmp/R-patched/src/library'
make[2]: Entering directory `/home/gavin/tmp/R-patched/src/library'
make[2]: Leaving directory `/home/gavin/tmp/R-patched/src/library'
make[1]: Leaving directory `/home/gavin/tmp/R-patched/src/library'
make[1]: Entering directory `/home/gavin/tmp/R-patched/src/library/Recommended'
make[2]: Entering directory `/home/gavin/tmp/R-patched/src/library/Recommended'
make[2]: Leaving directory `/home/gavin/tmp/R-patched/src/library/Recommended'
make[2]: Entering directory `/home/gavin/tmp/R-patched/src/library/Recommended'
make[2]: *** No rule to make target `survival.tgz', needed by `stamp-recommended'.  Stop.
make[2]: Leaving directory `/home/gavin/tmp/R-patched/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/home/gavin/tmp/R-patched/src/library/Recommended'
make: *** [stamp-recommended] Error 2

Being a relative newbie to Linux I have no idea how to continue to solve this issue :-( The only difference I can see between the /src/library/Recommended directories of R-1.9.0 and R-patched is that in R-1.9.0 it contains links to each of the tar.gz (excluding the version info) as well as the tar.gz themselves for each of the packages. Is this in some way related to my problem? If anyone can help me solve this issue I'd be most grateful. Thanks in advance, Gavin

You might want to try the following commands using rsync as an alternative to downloading the tarball:

rsync -rCv rsync.r-project.org::r-patched R-patched
./R-patched/tools/rsync-recommended
cd R-patched
./configure
make

I actually have the above in a script file that I can just run quickly, when I want to update the code. I am now running FC2, so if you have any problems, drop me a line.

Best regards,

Marc Schwartz
Re: [R] error during make of R-patched on Fedora core 2
On Mon, 2004-06-07 at 15:51, Gavin Simpson wrote: snip

> Thanks Roger and Marc, for suggesting I use ./tools/rsync-recommended from within the R-patched directory. This seems to have done the trick, as make completed without errors this time round. The Recommended directory also contained the links to the actual tar.gz files after doing the rsync command, so I guess this was the problem (or at least related to it.) I'm off home now with the laptop to see if I can finish make check-all and make install of R.
>
> I have re-read the section describing the installation process for R-patched or R-devel in the R Installation and Administration manual (from R 1.9.0) just in case I missed something. Section 1.2 of this manual indicates that one can proceed *either* by downloading R-patched and then the Recommended packages from CRAN and placing the tar.gz files in R_HOME/src/library/Recommended, or by using rsync to download R-patched, and then to get the Recommended packages. The two are quite separately documented in the manual, and do seem to be in disagreement with the R-sources page on the CRAN website, which doesn't mention the manual download method (for Recommended) at all. Is there something wrong with the current Recommended files on CRAN, or is the section in the R Installation Admin manual out-of-date or in error, or am I missing something vital here? This isn't a complaint: I'm just pointing this out in case this is something that needs updating in the documentation. All the best, Gavin

Perhaps I am being dense, but in reviewing the two documents (R Admin and the CRAN sources page), I think that the only thing lacking is a description on the CRAN page of the manual download option for the Rec packages. You would need to go here now for 1.9.1 Alpha/Beta, which is where the current r-patched is:

http://www.cran.mirrors.pair.com/src/contrib/1.9.1/Recommended/

The standard links on CRAN are for the current 'released' version, which is still 1.9.0 for the moment.
Procedurally, I think that the rsync approach is substantially easier (one step instead of multiple downloads) and certainly less error prone. Also, the ./tools/rsync-recommended script is set up to pick up the proper package versions, which helps to avoid conflicts.

HTH,

Marc
Re: [R] error during make of R-patched on Fedora core 2
On Tue, 2004-06-08 at 06:23, Prof Brian Ripley wrote: On Tue, 8 Jun 2004, Gavin Simpson wrote: Marc Schwartz wrote: On Mon, 2004-06-07 at 15:51, Gavin Simpson wrote: snip snip

Perhaps I am being dense, but in reviewing the two documents (R Admin and the CRAN sources page), I think that the only thing lacking is a description on the CRAN page of the manual download option for the Rec packages. snip

Yes, but having downloaded the contents of that directory (as VERSION indicated that R-patched was 1.9.1 alpha), the links to the source files for the Recommended packages are not present (obviously). And make doesn't seem to work without these links. The rsync approach places the package sources *and* the links in the correct directory.

Yep. I was being dense. Missed the symlink part of the process. My error. I also missed the Venus transit this morning due to clouds... :-(

So the instructions in the Admin manual are lacking a statement that you need to create links to each of the package sources in the form name-of-package.tgz, which links to name-of-package_version.tar.gz. As it stands, the instructions in the Installation Admin manual are not sufficient to get the manual download method to work.

You need to run tools/link-recommended. I've added that to R-admin.

Should Fritz also add that to the CRAN 'R Sources' page so that both locations are in sync procedurally?

Procedurally, I think that the rsync approach is substantially easier (one step instead of multiple downloads) and certainly less error prone. Also the ./tools/rsync-recommended script is set up to pick up the proper package versions, which also helps to avoid conflicts.

I agree - being a bit of a Linux newbie, I hadn't used rsync before. Seeing how easy it was to use this method of getting the required sources, I will be using this method in future.

rsync is great, *provided* you have permission to use the ports it uses.
Users with http proxies often do not, hence the description of the manual method. During alpha/beta periods, we do make a complete tarball available, and I wonder if we should not be doing so with R-patched/R-devel at all times.

Good point on rsync. Perhaps another option to consider/suggest (though it might complicate things) is to use wget. Since wget supports proxy servers, etc. and can use http, it might be an alternative for folks. The wget command syntax (assuming that your working dir is the main R source dir) would be:

wget -r -l1 --no-parent -A "*.gz" -nd -P src/library/Recommended http://www.cran.mirrors.pair.com/src/contrib/1.9.1/Recommended

The above should be on one line, but of course will wrap here.

The above will copy the tar files (-A "*.gz") from the server (-r -l1 --no-parent) to the appropriate 'Recommended' directory (-P), without recreating the source server's tree (-nd). One could refer the reader to 'man wget' or http://www.gnu.org/software/wget/wget.html for further information on how to use wget behind proxies and related issues.

You would then of course run the ./tools/link-recommended script to create the symlinks, followed by ./configure and make.

HTH,

Marc
Re: [R] error during make of R-patched on Fedora core 2
On Tue, 2004-06-08 at 12:40, Peter Dalgaard wrote:

> Marc Schwartz [EMAIL PROTECTED] writes:
>
>> wget -r -l1 --no-parent -A "*.gz" -nd -P src/library/Recommended http://www.cran.mirrors.pair.com/src/contrib/1.9.1/Recommended
>>
>> The above should be on one line, but of course will wrap here.
>
> Kids these days... Make that
>
> wget -r -l1 --no-parent -A "*.gz" -nd -P src/library/Recommended \
>     http://www.cran.mirrors.pair.com/src/contrib/1.9.1/Recommended

LOL

Thanks Dad ;-)

Marc
Re: [R] fighting with ps.options and xlim/ylim
On Tue, 2004-06-08 at 20:18, ivo welch wrote:

> thank you, marc. I will play around with these parameters tomorrow at my real computer. yes, the idea is to just create an .eps and .pdf file, which is then \includegraphics[width=0.25\textwidth]{} in pdflatex. I need to tweak the ps.options() pointsize parameter because otherwise I end up with 5pt fonts---which is not readable. And once I do this, I need different R parameter defaults on the axes. With the advice I have gotten, I think I am all set now. However, I am a little bit surprised that no one has written a package around this task---there must be many people that have to produce quarter-page (or half-page) graphics, and probably everyone is tweaking plot parameters a bit differently. It would be nice to build some of this intelligence into the plot parameters themselves. of course, R is a free volunteer effort, and I am grateful for all the stuff that has been done already. /iaw

You might want to try to set the 'height' and 'width' arguments for postscript() to something larger than the defaults. For example, use 6 x 6 (if square) and then use your code above to scale the plot down to size. That might help with your font size and spacing problem, rather than adjusting the point size.

I don't have a 'rule of thumb', but experience suggests that downsizing a plot that is too big is better than upsizing one that is too small, especially for a partial page.

I have done some other things using the 'seminar' LaTeX package for landscape orientation slides and there I generally use the exact size for the EPS files. But that is generally the only time that I do that.

YMMV,

Marc
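Marc's suggestion can be sketched as follows; the file name and plot are illustrative only:

```
# write a generous 6in x 6in EPS file and let LaTeX scale it down,
# rather than shrinking the pointsize in R
postscript("fig.eps", width = 6, height = 6,
           horizontal = FALSE, onefile = FALSE, paper = "special")
plot(1:10, rnorm(10))
dev.off()
```

On the LaTeX side, the figure would then be included with something like \includegraphics[width=0.25\textwidth]{fig}, letting LaTeX do the shrinking.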
Re: [R] fighting with ps.options and xlim/ylim
On Tue, 2004-06-08 at 21:02, Duncan Murdoch wrote: On Tue, 08 Jun 2004 21:18:34 -0400, ivo welch [EMAIL PROTECTED] wrote: And once I do this, I need different R parameter defaults on the axes. With the advice I have gotten, I think I am all set now. However, I am a little bit surprised that no one has written a package around this task---there must be many people that have to produce quarter-page (or half-page) graphics, and probably everyone is tweaking plot parameters a bit differently. My general strategy for this is to change the width and height used in the pdf() or postscript() device call, then just trust the defaults chosen by R. For inclusion in a paper, I generally specify sizes about twice as big as I really want, and get text size similar to the printed text. So in your case, assuming a page is around 6 inches wide, I'd use something like pdf(width=3, height=3, ...) and then get LaTeX to shrink it to half the size. Duncan Murdoch I just got Duncan's msg, so I think that we are thinking along the same lines here. I agree with Duncan's suggestion relative to trying a 2x scaling factor and would see how that goes with your particular plot. Then adjust if need be as you develop some intuition. Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
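The 2x-size approach described above can be sketched concretely; the file name and the toy plot below are invented for illustration:

```r
## Sketch of the 2x-size approach: generate the figure at twice the
## intended final size and let LaTeX shrink it. "fig1.pdf" and the
## plotted data are made up for illustration.
pdf("fig1.pdf", width = 3, height = 3)
plot(1:10, (1:10)^2, xlab = "x", ylab = "y")
dev.off()
## Then in the LaTeX source, shrink to a quarter of the page width:
##   \includegraphics[width=0.25\textwidth]{fig1}
```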
Re: [R] Re: fighting with ps.options and xlim/ylim
On Wed, 2004-06-09 at 09:30, Uwe Ligges wrote: ivo welch wrote: Thanks again for all the messages. Is the 4% in par('usr') hardcoded? if so, may I suggest making this a user-changeable parameter for x and y axis? See ?par and its arguments xaxp, yaxp which can be set to "i". Quick correction. That should be xaxs and yaxs. See my initial reply. xaxp and yaxp are for the positions of the tick marks. I looked at psfrag, and it seems like a great package. alas, I have switched to pdflatex, and pdffrag does not exist. :-( One option to point out, is that if the functionality in psfrag is important to you, you can use 'ps2pdf' to convert a ps file to a pdf file. ps2pdf filters the ps file through ghostscript to create the pdf file. It means a three step process (latex, dvips and ps2pdf), but it can provide additional functionality that pdflatex does not support, such as the use of \special as in the package 'pstricks'. pdf does not have any programming language functionality as does postscript, so there are some tradeoffs and likely why there is no pdffrag. Food for thought. I also discovered that there is a pdf device now. neat. Since R-1.3.0, as the News file tells us. Uwe Ligges HTH, Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] displaying a table in full vs. typing variable name
On Thu, 2004-06-10 at 11:26, Uwe Ligges wrote: Rishi Ganti wrote: I have a data frame called totaldata that is 10,000 rows by about 9 columns. If "about 9" equals 2, the behaviour reported below is expected. That is, of course, for sufficiently large values of "about"... ;-) Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] import SYSTAT .syd file?
On Wed, 2004-06-16 at 10:32, Anne York wrote: On Tue, 15 Jun 2004 Jonathan Baron [EMAIL PROTECTED] wrote: Does anyone know how to read a SYSTAT .syd file on Linux? (Splus 6 does it, but it is easier to find a Windows box with Systat than to download their demo. I'm wondering if there is a better way than either of these options.) Jon The commercial package dbmscopy has a Linux version. I have used dbmscopy for several years and have been happy with it as it converts data files among many spreadsheets and statistics programs. http://www.conceptual.com/dbmscopt.htm However, somewhat recently they were purchased by SAS, so I'm not sure of current state of the program. There are probably other commercial packages as well. Anne Hi Jon and Anne! One other commercial product to check out is Stat/Transfer. More information on supported formats is at: http://www.stattransfer.com/html/formats.html They do support Windows, MacOS and Unix/Linux. Demo downloads are available from: http://www.stattransfer.com/html/download.html Unix/Linux pricing is available at: http://www.stattransfer.com/html/prices_-_unix.html. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R help in Firefox on Windows XP
On Thu, 2004-06-17 at 12:06, Erich Neuwirth wrote: I had to reinstall my machine, so I installed Firefox 0.9 as browser I am using WinXP and R 1.9.1 beta. Now search in R html help does not work. I checked that the Java VM is working correctly, Sun's test site says my installation is OK. Firefox also tells me that "Applet Searchengine loaded" "Applet Searchengine started" it just does not find anything. Does anybody know how to solve this? Erich Erich, Do you also have JavaScript enabled in the Firefox Tools - Options settings? Both Java and JavaScript need to be enabled for the help.start() search engine to function properly. I reviewed the release notes at http://www.mozilla.org/products/firefox/releases/ and did not see anything relating to Java there, as had been the case with prior releases. The message that you are getting on the status line suggests that the R search applet is being found and properly enabled, which is typically the primary source of problems. Check the above and let us know. Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Grouped AND stacked bar charts possible in R?
On Tue, 2004-06-22 at 10:54, Patrick Lenon wrote: Good day all, My statisticians want an R procedure that will produce grouped stacked barplots. Barplot will stack or group, but not both. The ftable function can produce a table of the exact form they want, but the barplot doesn't show all the divisions we want. For an example, here's the sample from the help file for ftable:

data(Titanic)
ftable(Titanic, row.vars = 1:3)
ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
ftable(Titanic, row.vars = 2:1, col.vars = "Survived")

Now take it a step further to try to add another dimension:

b <- ftable(Titanic, row.vars = 1:3)

                   Survived  No Yes
Class Sex    Age
1st   Male   Child            0   5
             Adult          118  57
      Female Child            0   1
             Adult            4 140
2nd   Male   Child            0  11
             Adult          154  14
      Female Child            0  13
             Adult           13  80
3rd   Male   Child           35  13
             Adult          387  75
      Female Child           17  14
             Adult           89  76
Crew  Male   Child            0   0
             Adult          670 192
      Female Child            0   0
             Adult            3  20

barplot(b)
barplot(b, beside = TRUE)

Neither resulting barplot is satisfactory. The first stacks all the subdivisions of Survived = Yes and Survived = No together. The second is closer because it creates two groups, but it lists combinations side-by-side that we'd like stacked. In the above example No and Yes would be stacked on bars labeled Male or Female in groups by Class. I've taken a look through the R-Help archives and looked through the contributed packages, but haven't found anything yet. If you have any thoughts how we might produce groups of stacked bars from an ftable, we would appreciate it. I think that you are trying to plot too much information in a single graphic. The result of a multi-dimensional barplot is likely to be very difficult to interpret visually. You would likely be better served to determine, within the multiple dimensions, what your conditioning and grouping dimensions need to be and then consider a lattice based plot. 
I would urge you to consider using either barchart() or perhaps dotplot() in lattice, which are designed to handle multivariable charts of this nature. Use:

library(lattice)

Then for general information ?Lattice and then ?barchart for more function specific information and examples of graphics with each function. For the Titanic data that you have above, you could do something like:

# Convert the multi-dimensional table to a
# data frame. Assumes you have already done
# data(Titanic)
MyData <- as.data.frame(Titanic)

# Take a look at the structure
MyData
   Class    Sex   Age Survived Freq
1    1st   Male Child       No    0
2    2nd   Male Child       No    0
3    3rd   Male Child       No   35
4   Crew   Male Child       No    0
5    1st Female Child       No    0
6    2nd Female Child       No    0
7    3rd Female Child       No   17
8   Crew Female Child       No    0
9    1st   Male Adult       No  118
10   2nd   Male Adult       No  154
11   3rd   Male Adult       No  387
12  Crew   Male Adult       No  670
13   1st Female Adult       No    4
14   2nd Female Adult       No   13
15   3rd Female Adult       No   89
16  Crew Female Adult       No    3
17   1st   Male Child      Yes    5
18   2nd   Male Child      Yes   11
19   3rd   Male Child      Yes   13
20  Crew   Male Child      Yes    0
21   1st Female Child      Yes    1
22   2nd Female Child      Yes   13
23   3rd Female Child      Yes   14
24  Crew Female Child      Yes    0
25   1st   Male Adult      Yes   57
26   2nd   Male Adult      Yes   14
27   3rd   Male Adult      Yes   75
28  Crew   Male Adult      Yes  192
29   1st Female Adult      Yes  140
30   2nd Female Adult      Yes   80
31   3rd Female Adult      Yes   76
32  Crew Female Adult      Yes   20

# Now do a plot. Use 'library(lattice)' here first
# if you had not already done so above for help.
barchart(Freq ~ Survived | Age * Sex, groups = Class, data = MyData,
         auto.key = list(points = FALSE, rectangles = TRUE, space = "right",
                         title = "Class", border = TRUE),
         xlab = "Survived", ylim = c(0, 800))

The above barchart will create a four panel plot, where the four main panels will contain the combinations of Sex and Age. Within each panel will be two groups of bars, one each for the Survived Yes/No status. Within each group will be one bar for each Class. 
That is one quick way of grouping things, but you can alter that and other plot attributes easily. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
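The dotplot() alternative mentioned in the reply above uses the same formula interface; a minimal sketch (the key settings here are illustrative only):

```r
## The dotplot() variant, with the same formula interface as the barchart
## shown above; auto.key settings are illustrative, not prescriptive.
library(lattice)
MyData <- as.data.frame(Titanic)
p <- dotplot(Freq ~ Survived | Age * Sex, groups = Class, data = MyData,
             auto.key = list(space = "right", title = "Class"))
print(p)  # lattice objects must be printed explicitly inside scripts
```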
Re: [R] Covered Labels
On Wed, 2004-06-23 at 09:06, Martina Renninger wrote: Dear All! How can I cope with overlapping or covered labels (covered by labels from other data points) in plots? Presuming that you are using text() to identify points in a plot, you can use the 'cex' argument (which defaults to 1) to reduce the size of the font. So in this case, try values < 1, for example:

text(x, y, labels = "YourText", cex = 0.8)

Possibly depending upon how many points you have, you can also adjust the position of the label with respect to the data points by using 'adj', 'pos' and 'offset'. See ?text for more information. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
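The 'cex', 'pos' and 'offset' arguments mentioned above can be combined; a small sketch (the data and labels are simulated, not from the original poster, and the output file name is invented so the example runs non-interactively):

```r
## Sketch of cex + pos + offset together, on simulated points.
pdf("labels.pdf")
set.seed(42)
x <- runif(10); y <- runif(10)
plot(x, y)
## shrink the labels and place them above each point (pos = 3),
## nudged away from the symbol by 0.3 character widths
text(x, y, labels = paste("pt", 1:10, sep = ""),
     cex = 0.7, pos = 3, offset = 0.3)
dev.off()
```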
Re: [R] direction of axes of plot
On Sun, 2004-06-27 at 18:24, XIAO LIU wrote: R users: I want X-Y plotting with axes in reverse direction such as (0, -1, -2, -3, ...). How can I do it? Thanks in advance Xiao If I am understanding what you want, the following should give you an example:

# Create x and y with negative values
x <- -1:-10
y <- -1:-10

# Show regular plot
plot(x, y)

# Now plot using -x and -y
# Do not plot the axes or annotation
plot(-x, -y, axes = FALSE, ann = FALSE)

# Now label both x and y axes with negative
# labels. Use pretty() to get standard tick mark locations
# and use rev() to create tick mark labels in reverse order
axis(1, at = pretty(-x), labels = rev(pretty(x)))
axis(2, at = pretty(-y), labels = rev(pretty(y)))

HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
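An alternative worth noting: if the goal is simply an axis that runs from high to low, reversing the axis limits is enough. A sketch with toy data, written to a file so it runs non-interactively (the file name is invented):

```r
## Reversed axis via flipped limits, on toy data.
pdf("revaxis.pdf")
x <- 0:10
y <- x^2
plot(x, y, xlim = rev(range(x)))  # x axis now runs 10 ... 0
dev.off()
```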
Re: [R] priceIts problem
On Thu, 2004-07-01 at 19:02, Erin Hodgess wrote: Dear R People: In library(its), there is a command priceIts. There is a problem with this command. It is returning an error message:

ibm1 <- priceIts(instrument = "ibm", start = "1998-01-01", quote = "Open")
Error in download.file(url, destfile, method = method, quiet = quiet) :
        cannot open URL `http://chart.yahoo.com/table.csv?s=ibm&a=0&b=01&c=1998&d=5&e=30&f=2004&g=d&q=q&y=0&z=ibm&x=.csv'
In addition: Warning message:
cannot open: HTTP status was `404 Not Found'

This has been working fine until tonight. Has anyone else seen this, please? thanks in advance! It would appear that the URL at Yahoo has changed. If you try your URL in a browser, you get the same 404 msg. Going to the page for securing an IBM quote: http://finance.yahoo.com/q/hp?s=IBM&a=00&b=1&c=1998&d=05&e=30&f=2004&g=d The URL towards the bottom of the page for the CSV download is: http://ichart.yahoo.com/table.csv?s=IBM&a=00&b=1&c=1998&d=05&e=30&f=2004&g=d&ignore=.csv Note the 'ichart' as opposed to 'chart' in your error msg above. A quick review of the R source in the 'its' package suggests that the base URL for Yahoo is hard coded in the priceIts() function and the 'provider' argument is not yet used. I have copied Heywood Giles on this reply as an FYI and for confirmation. A short term workaround would be to edit the function's code using fix(priceIts) and change the base URL in the function body as indicated above. That seems to work for me with a quick check. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] priceIts problem
On Thu, 2004-07-01 at 19:26, Marc Schwartz wrote: On Thu, 2004-07-01 at 19:02, Erin Hodgess wrote: Dear R People: In library(its), there is a command priceIts. There is a problem with this command. It is returning an error message:

ibm1 <- priceIts(instrument = "ibm", start = "1998-01-01", quote = "Open")
Error in download.file(url, destfile, method = method, quiet = quiet) :
        cannot open URL `http://chart.yahoo.com/table.csv?s=ibm&a=0&b=01&c=1998&d=5&e=30&f=2004&g=d&q=q&y=0&z=ibm&x=.csv'
In addition: Warning message:
cannot open: HTTP status was `404 Not Found'

This has been working fine until tonight. Has anyone else seen this, please? thanks in advance! <snip> I have copied Heywood Giles on this reply as an FYI and for confirmation. Apologies. That should be Giles Heywood. Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Vertical text in plot
On Fri, 2004-07-02 at 12:45, Wolski wrote: Hallo! Would like to add vertical text labels to a histogram. Was trying with las but without success. I am using the standard histogram. This is what I was trying.

hist(resS2$sam, breaks = seq(0, 1, 0.01), col = 3, border = 0, freq = F, add = T, xlim = c(0, 1))
text(quantile(resS2$dif, 0.005), 5, "0.5% FP rate", pos = 2, cex = 0.6, las = 2)

Thanks in advance. Eryk Hi Eryk! Try using 'srt' instead of 'las', which is for the axis labels. For example:

text(quantile(resS2$dif, 0.005), 5, "0.5% FP rate", pos = 2, cex = 0.6, srt = 90)

See ?par for more information. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] plotting many line segments in different colors
On Fri, 2004-07-02 at 16:33, rif wrote: I want to plot a large number of line segments, with a color associated with each line segment (I'm actually plotting a function of the edges of a 2d graph, and I want to use color to indicate the level of the function.) I originally thought I could use lines, but lines puts all its lines in one color (from help(lines), col: color to use. This can be vector of length greater than one, but only the first value will be used.). Is there a function that does what I want? Right now I'm using the obvious solution of calling lines in a loop with a single segment, but this is really quite slow for my purposes, as I have several thousand lines total to plot. Take a look at ?matplot or ?matlines depending upon which one might make sense for your particular application. Both functions are on the same help page. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
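A further option the thread does not mention: segments() accepts vectors for its endpoints and for 'col', so thousands of coloured segments can be drawn in a single call, avoiding the slow loop. A sketch with simulated edges and function values (the file name is invented so the example runs non-interactively):

```r
## Thousands of per-colour line segments in one vectorized segments() call;
## the edge endpoints and the function on the edges are simulated.
pdf("edges.pdf")
set.seed(1)
n <- 5000
x0 <- runif(n); y0 <- runif(n)
x1 <- runif(n); y1 <- runif(n)
f <- sqrt((x1 - x0)^2 + (y1 - y0)^2)            # function value per edge
cols <- heat.colors(100)[cut(f, 100, labels = FALSE)]  # map level to colour
plot(0:1, 0:1, type = "n", xlab = "", ylab = "")
segments(x0, y0, x1, y1, col = cols)            # one call, per-segment colours
dev.off()
```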
Re: [R] counting the occurrences of vectors
On Sat, 2004-07-03 at 09:31, Ravi Varadhan wrote: Hi: I have two matrices, A and B, where A is n x k, and B is m x k, where n > m > k. Is there a computationally fast way to count the number of times each row (a k-vector) of B occurs in A? Thanks for any suggestions. Best, Ravi. How about something like this:

row.match <- function(m1, m2) {
  if (ncol(m1) != ncol(m2))
    stop("Matrices must have the same number of columns")

  m1.l <- apply(m1, 1, list)
  m2.l <- apply(m2, 1, list)

  # return boolean for m1.l in m2.l
  m1.l %in% m2.l
}

Example of use:

m <- matrix(1:20, ncol = 4, byrow = TRUE)
n <- matrix(1:40, ncol = 4, byrow = TRUE)

m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

n
      [,1] [,2] [,3] [,4]
 [1,]    1    2    3    4
 [2,]    5    6    7    8
 [3,]    9   10   11   12
 [4,]   13   14   15   16
 [5,]   17   18   19   20
 [6,]   21   22   23   24
 [7,]   25   26   27   28
 [8,]   29   30   31   32
 [9,]   33   34   35   36
[10,]   37   38   39   40

row.match(n, m)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

If you want to know which rows from n are matches:

n[row.match(n, m), ]
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

and if you just want the indices from n:

which(row.match(n, m))
[1] 1 2 3 4 5

For timing, if I create some large matrices:

m <- matrix(1:20000, ncol = 4, byrow = TRUE)
nrow(m)
[1] 5000

n <- matrix(1:40000, ncol = 4, byrow = TRUE)
nrow(n)
[1] 10000

system.time(row.match(n, m))
[1] 0.39 0.01 0.41 0.00 0.00

length(row.match(n, m))
[1] 10000

Does that get you what you want? HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Outliers
On Sun, 2004-07-04 at 19:41, Richard A. O'Keefe wrote: Last week there was a thread on outlier detection. I came across an article which has a very interesting paragraph. The article is "Missing Values, Outliers, Robust Statistics, Non-parametric Methods" by Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK. It was the fourth article in a series which appeared in Scientific Data Management in 1998 and 1998. The very interesting paragraph is this: NB: It should be noted that following a judgement in a US court, the Food and Drug Administration (FDA) in a guide - "Guide to inspection of pharmaceutical quality control laboratories" - has specifically prohibited the use of outlier tests. Elsewhere, the article recommends the use of outlier tests as a way of locating possible transcription errors, but NOT as a way of discarding data. The FDA Guide referred to in that article is here: http://www.fda.gov/ora/inspect_ref/igs/pharm.html If you search that page using the keyword 'outlier' you will note several references. The part of the document relevant to the above citation is: In a recent court decision the judge used the term out-of-specification (OOS) laboratory result rather than the term product failure which is more common to FDA investigators and analysts. He ruled that an OOS result identified as a laboratory error by a failure investigation or an outlier test. The court provided explicit limitations on the use of outlier tests and these are discussed in a later segment of this document., or overcome by retesting. The court ruled on the use of retesting which is covered in a later segment of this document. is not a product failure. Some of the above and elsewhere in the document, relative to grammar and punctuation, suggests that the HTML page was converted from another format, perhaps Word or PDF. Some things do not quite make sense, but you can get the basic idea. Note also the use of the word 'limitation' above rather than 'prohibited'. 
HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] density(x)
On Mon, 2004-07-05 at 08:34, Christoph Hanck wrote: Dear experts, when trying to estimate a kernel density function with density(x) I get the following error message with imported data from either EXCEL or text files: Error in density(spr) : argument must be numeric. Other procedures such as truehist work. If I generate data within R density works fine. Does anybody have an idea? More than likely, your vector 'spr' was imported as a factor. This would possibly suggest that at least one value in 'spr' is not numeric. If the entire vector was numeric, this would not be a problem. It is also possible that you may have not specified the proper delimiting character during the import, which would compromise the parsed structure of the incoming data. Use:

str(spr)

and you will probably get "Factor ..." First, check to be sure that you have used the proper delimiting character during your import. See ?read.table for the family of related functions and the default argument values for 'sep', which is the delimiting character. You should also check your source data file, since it may be problematic. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
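The factor-import failure mode described above, and its repair, can be sketched in a few lines; the stray value "oops" is invented for illustration:

```r
## One non-numeric entry turns the whole imported column into a factor.
spr <- factor(c("1.2", "3.4", "oops", "5.6"))
## as.numeric(spr) would return the internal level codes, not the values;
## convert via as.character() instead -- non-numbers become NA (with a warning)
vals <- suppressWarnings(as.numeric(as.character(spr)))
d <- density(vals[!is.na(vals)])  # density() now accepts the numeric vector
```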
Re: [R] Function for skewness
On Mon, 2004-07-05 at 09:49, Ernesto Jardim wrote: Hi, Is there a function to estimate the skewness of a distribution ? Thanks EJ See skewness() in CRAN package 'e1071'. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
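If installing a package is not convenient, the plain moment-based estimator is short to write by hand; note that e1071's skewness() offers several 'type' options whose small-sample corrections differ slightly from this bare version:

```r
## Hand-rolled moment-based skewness (no bias correction); a sketch, not a
## drop-in replacement for e1071::skewness().
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew(c(rep(0, 9), 10))  # right-skewed sample, so the result is positive
```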
Re: [R] density(x)
On Mon, 2004-07-05 at 09:41, Christoph Hanck wrote: Hello and thanks for your reply Hopefully, my answer arrives at the correct place like that (if not, I am sorry for bothering you, but please let me know...) To sum up my procedure (sp is exactly the same thing as spr, I had just tinkered with the names while trying sth. to solve this problem)

sp <- read.table("c:/ratsdata/sp3.txt", col.names = "sp")
xd <- density(sp)
Error in density(sp) : argument must be numeric

The suggested remedies yield the following

str(sp)
`data.frame': 195 obs. of 1 variable:
 $ sp: int 11 10 10 12 25 22 12 23 13 15 ...

xd <- density(as.numeric(sp))
Error in as.double.default(sp) : (list) object cannot be coerced to double

Hence, it does not seem to be a factor. Declaring it as numeric gives another error message, on which I haven't yet found any help in Google/the archive. In this case, you are trying to pass a data frame as an argument to density() rather than a single column vector. The same problem is the reason for the error in xd <- density(as.numeric(sp)). You are trying to coerce a data frame to a double. Example:

# create a data frame called 'sp', that has a column called 'sp'
sp <- data.frame(sp = 1:195)

str(sp)
`data.frame': 195 obs. of 1 variable:
 $ sp: int 1 2 3 4 5 6 7 8 9 10 ...

# Now try to use density()
density(sp)
Error in density(sp) : argument must be numeric

# Now call density() properly with the column 'sp' as an argument
# using the data.frame$column notation:
density(sp$sp)

Call:
        density(x = sp$sp)

Data: sp$sp (195 obs.);  Bandwidth 'bw' = 17.69

       x                 y
 Min.   :-52.08   Min.   :7.688e-06
 1st Qu.: 22.96   1st Qu.:1.009e-03
 Median : 98.00   Median :4.600e-03
 Mean   : 98.00   Mean   :3.328e-03
 3rd Qu.:173.04   3rd Qu.:5.131e-03
 Max.   :248.08   Max.   :5.133e-03

Two other options in this case: 1. Use attach() to place the data frame 'sp' in the current search path. Now you do not need to explicitly use the data.frame$column notation. detach() is then used to clean up.

attach(sp)
density(sp)
detach(sp)

2. 
Use with(), which is the preferred notation when dealing with data frames:

with(sp, density(sp))

To avoid your own confusion in the future, it would be better not to name the data frame with the same name as a vector. It also helps when others may need to review your code. See ?with and ?attach for more information. Reading through "An Introduction to R", which is part of the default documentation set, would be helpful to you in better understanding data types and dealing with data frame structures. I see that Prof. Ripley has also replied regarding the nature of truehist(), so that helps to clear up that mystery :-) HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] counting the occurrences of vectors
31 c(3, 2, 3, 3, 1) c(1, 2, 2, 1, 2) c(1, 3, 2, 2, 2) c(1, 1, 1, 2, 3) 0102 I'd be curious to get any feedback on this and if someone has any thoughts on any gotchas with this approach. Thanks and I hope that this is of some help. Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] counting the occurrences of vectors
On Mon, 2004-07-05 at 23:22, Gabor Grothendieck wrote: Marc Schwartz MSchwartz at MedAnalytics.com writes: the likely overhead involved in paste()ing together the rows to create objects I thought I would check this and it seems that in my original f1 function it's not really the paste itself that's the bottleneck but applying the paste. If we use do.call rather than apply, as shown in f1a below, then we see that f1a runs faster than row.match.count (which in turn was faster than f1):

f1a <- function(a, b, sep = ":") {
  f <- function(...) paste(..., sep = sep)
  a2 <- do.call(f, as.data.frame(a))
  b2 <- do.call(f, as.data.frame(b))
  c(table(c(b2, unique(a2)))[a2] - 1)
}

set.seed(1)
# note that we have increased the size of the matrices from last post
# to better show the speed difference
a <- matrix(sample(3, 1, rep = T), nc = 5)
b <- matrix(sample(3, 1000, rep = T), nc = 5)

# row.match.count taken from Marc's post in this thread
# have put a c(...) around row.match.count to make it comparable to f1a
gc(); system.time(ans <- c(row.match.count(b, a)))
          used (Mb) gc trigger (Mb)
Ncells  436079 11.7     741108 19.8
Vcells  130663  1.0     786432  6.0
[1] 0.11 0.00 0.11 NA NA

gc(); system.time(ansf1a <- f1a(b, a))
          used (Mb) gc trigger (Mb)
Ncells  436080 11.7     741108 19.8
Vcells  130669  1.0     786432  6.0
[1] 0.04 0.00 0.04 NA NA

all.equal(ansf1a, ans)
[1] TRUE

Gabor, Well done! I liked your approach in the prior message of getting away from using regex. I had one of those "I could'a had a V-8" moments, when I realized that of course the resultant table names were syntactically correct R statements and therefore one could get away from worrying about the data type issues and use eval(parse(...)). The above approach is better yet, more flexible, of course more elegant and notably faster. Advantage Gabor... ;-) Best regards, Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Improving effeciency - better table()?
On Tue, 2004-07-06 at 07:56, Simon Cullen wrote: Hi, I've been running some simulations for a while and the performance of R has been great. However, I've recently changed the code to perform a sort of chi-square goodness-of-fit test. To get the observed values for each cell I've been using table() - specifically I've been using cut2 from Hmisc to divide up the range into a specified number of cells and then using table to count how many observations appear in each cell.

obs <- table(cut2(z.trun, cuts = breaks))

Having done this I've found that the code takes much longer to run - up to 10x as long. Is there a more efficient way of doing this? Anyone have any thoughts? It would appear that you might be attempting to do a Hosmer-Lemeshow type of GOF test. If indeed that is the case, before making the above more efficient, you should spend some time reviewing the following posts by Frank Harrell on this subject: http://maths.newcastle.edu.au/~rking/R/help/02b/4210.html http://maths.newcastle.edu.au/~rking/R/help/02b/3111.html HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
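On the efficiency question itself, a base-R sketch that avoids Hmisc entirely: quantile() supplies the cutpoints, cut() + table() does the counting, and findInterval() + tabulate() does the same count without building a factor at all. The data here are simulated and the decile breaks are only an example:

```r
## Base-R alternatives to cut2() + table() for binned counts; simulated data.
set.seed(123)
z <- rnorm(1e5)
breaks <- quantile(z, probs = seq(0, 1, by = 0.1))

## cut() + table(); include.lowest keeps the minimum in the first bin
obs <- table(cut(z, breaks = breaks, include.lowest = TRUE))

## findInterval() + tabulate() skips the factor machinery altogether
obs2 <- tabulate(findInterval(z, breaks, rightmost.closed = TRUE), nbins = 10)
```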
Re: [R] Converting S-Plus Libraries to R
On Tue, 2004-07-06 at 08:54, [EMAIL PROTECTED] wrote: Dear all! I'd like to do multiple imputation of missing values with s-plus libraries that are provided by Shafer (http://www.stat.psu.edu/~jls/misoftwa.html). I wonder, whether these libraries are compatible or somehow convertible to R (because I don't have S-plus), so that I can use this functions using the R Program. I would be happy if you could tell me, -if it is possible to use S-plus libraries with R -if yes, how I can use the S-Plus libraries in R Thank you very much, Will I believe that you will find that Prof. Ripley has already done the work for you in the 'mix' package on CRAN: http://cran.us.r-project.org/src/contrib/Descriptions/mix.html HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Creating Binary Outcomes from a continuous variable
On Wed, 2004-07-07 at 07:57, Doran, Harold wrote: Dear List: I have searched the archives and my R books and cannot find a method to transform a continuous variable into a binary variable. For example, I have test score data along a continuous scale. I want to create a new variable in my dataset that is 1=above a cutpoint (or passed the test) and 0=otherwise. My instinct tells me that this will require a combination of the transform command along with a conditional selection. Any help is much appreciated. Example:

a <- rnorm(20)
b <- ifelse(a < 0, 0, 1)

a
 [1] -1.0735800 -0.6788456  1.9979801 -0.4026760  0.1781791 -1.1540434
 [7] -1.0842728  1.6042602 -0.7950492 -0.1194323  0.4450296  1.9269333
[13] -0.4456181 -0.8374677 -1.1898772  1.7353067  1.8619422 -0.1679996
[19] -0.2656138 -1.5529884

b
 [1] 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0 1 1 0 0 0

HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
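Since logical values coerce to 0/1 in R, a bare comparison also does the recode without ifelse(); the scores and cutpoint below are invented for illustration:

```r
## Recode via comparison + as.integer(); toy scores and cutpoint.
score <- c(48, 61, 55, 72, 39)
cutpoint <- 55
passed <- as.integer(score >= cutpoint)
passed  # 0 1 1 1 0
```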
Re: [R] fast NA elimination ?
On Wed, 2004-07-07 at 09:35, ivo welch wrote: dear R wizards: an operation I execute often is the deletion of all observations (in a matrix or data set) that have at least one NA. (I now need this operation for kde2d, because its internal quantile call complains; could this be considered a buglet?) usually, my data sets are small enough for speed not to matter, and there I do not care whether my method is pretty inefficient (ok, I admit it: I use the sum() function and test whether the result is NA)---but now I have some bigger data sets. Is there a recommended method of doing NA elimination most efficiently? sincerely, /iaw --- ivo welch professor of finance and economics brown / nber / yale Take a look at ?complete.cases HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
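A short sketch of complete.cases() in use, on a toy data frame:

```r
## Drop every row containing at least one NA.
dat <- data.frame(x = c(1, NA, 3), y = c(4, 5, NA))
keep <- complete.cases(dat)  # TRUE only where no column is NA
clean <- dat[keep, ]
nrow(clean)  # 1 -- only the first row survives
```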
Re: [R] Importing an Excel file
On Wed, 2004-07-07 at 13:21, Park, Kyong H Mr. RDECOM wrote: Hello, R users, I am a very beginner of R and tried read.csv to import an excel file after saving an excel file as csv. But it added alternating rows of fictitious NA values after row number 16. When I applied read.delim, there were trailing several commas at the end of each row after row number 16 instead of NA values. Appreciate your help. Kyong Yep. This is one of the behaviors that I had seen with Excel when I was running Windows XP. Seemingly empty cells outside the data range would get exported in the CSV file causing a data integrity problem. It is one of the reasons that I installed OpenOffice under Windows and used Calc to open the Excel files and then do the CSV exports before I switched to Linux :-) Depending upon the version of Excel you are using, you might try to highlight and copy only the rectangular range of cells in the sheet that actually have data to a new sheet and then export the new sheet to a CSV file. Do not just click on the upper left hand corner of the sheet to highlight the entire sheet to copy it. Only highlight the range of cells you actually need for copying. Another option is to use the read.xls() function in the 'gregmisc' package on CRAN or install OpenOffice. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Importing an Excel file
On Wed, 2004-07-07 at 13:44, Marc Schwartz wrote: On Wed, 2004-07-07 at 13:21, Park, Kyong H Mr. RDECOM wrote: Hello, R users, I am a complete beginner with R and tried read.csv to import an Excel file after saving it as CSV. But it added alternating rows of fictitious NA values after row number 16. When I applied read.delim, there were several trailing commas at the end of each row after row number 16 instead of NA values. Appreciate your help. Kyong

One other thing: The default delimiting characters in read.csv() and read.delim() are NOT the same. The former uses a comma and the latter a TAB character. If you did not change the defaults in Excel when you created your CSV file, that would account for the different behaviors upon import. Be sure that the delimiting character in the R function you use properly corresponds to the actual delimiting character in your CSV file. Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
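The two defaults can be seen side by side without touching a file, using textConnection() on invented data:

```r
# read.csv() assumes comma-separated fields; read.delim() assumes tabs.
# Both are thin wrappers around read.table() with different 'sep' defaults.
csv_text <- "a,b\n1,2\n3,4"
tab_text <- "a\tb\n1\t2\n3\t4"

dat_csv <- read.csv(textConnection(csv_text))
dat_tab <- read.delim(textConnection(tab_text))

dat_csv   # both yield the same 2 x 2 data frame
dat_tab
```

Feeding the comma-delimited text to read.delim() (or vice versa) reproduces the single-mangled-column symptom the poster saw.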
Re: [R] text editor for R
On Wed, 2004-07-07 at 17:47, Yi-Xiong Sean Zhou wrote: Hi, What is the best text editor for programming in R? I am using JEdit as the text editor, however, it does not have anything specific for R. It will be nice to have a developing environment where the keywords are highlighted, plus some other debugging functions. Yi-Xiong More information is available at: http://www.sciviews.org/_rgui/ Your e-mail headers suggest that you are using Windows. Thus, perhaps the two best choices (subject to challenge by others) would be: 1. R-WinEdt (Under IDE/Script Editors) 2. ESS for Windows The above two tools provide for a wide variety of functionality beyond syntax highlighting. There is a syntax highlighting file listed at the above site for jEdit. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Simple 'frequency' function?
On Fri, 2004-07-09 at 10:43, Dan Bolser wrote: On Fri, 9 Jul 2004, Uwe Ligges wrote: Dan Bolser wrote: Hi, I have designed the following function to extract count frequencies from an array of integers. For example...

# Typical array
x <- cbind(1,1,1,1,1,2,2,2,2,3,3,3,3,4,5,6,7,22)

# Define the frequency function
frequency <- function(x){
  max <- max(x)
  j <- c()
  for(i in 1:max){
    j[i] <- length(x[x==i])
  }
  return(j)
}

fre <- frequency(x)
plot(fre)

How can I ... 1) Make this a general function so my array could be of the form

# eats!
x <- cbind("egg","egg","egg","egg","ham","ham","ham","ham","chicken")
fre <- frequency(x)
plot(fre)

2) Make frequency return an object which I can call plot on (allowing the prob=TRUE option).

See ?table:

table(x)
plot(table(x))
plot(table(x) / sum(table(x)))

Sorry, why does plot(table(x), log='y') fail? I am looking at count/frequency distributions which are linear on log/log scales.

Presumably you are getting the following:

x <- cbind("egg","egg","egg","egg","ham","ham","ham","ham","chicken")
plot(table(x), log='y')
Error in plot.window(xlim, ylim, log, asp, ...) :
        Infinite axis extents [GEPretty(0,inf,5)]
In addition: Warning message:
Nonfinite axis limits [GScale(-inf,0.60206,2, .); log=1]

The problem here is that the range for the default y axis is being set to limits that cannot be used on a log scale. If you review the code for plot.table(), which is the method that will be used here, you see the function definition as follows:

graphics:::plot.table
function (x, type = "h", ylim = c(0, max(x)), lwd = 2, xlab = NULL,
    ylab = NULL, frame.plot = is.num, ...)

Note that the default ylim is set to have a min value of 0, which of course you cannot have on a log scale. Thus, instead, use the following:

plot(table(x), log = "y", ylim = range(table(x)))

or otherwise explicitly define the y axis range, such that the min value is greater than 0. Note also that the default plot type here is 'h', which will result in a histogram type of plot using vertical lines. 
If you want a scatterplot type of graphic, use:

plot(table(x), log = "y", ylim = range(table(x)), type = "p")

HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] where does R search when source()?
On Sat, 2004-07-10 at 20:13, Spencer Graves wrote: In case no one who knows has time to reply to this, I will report on my empirical investigation of this question using R 1.9.1 under Windows 2000. First, I saved a simple script file tst-source.R in the working directory, e.g., d:/sg/proj1. When I said, source('tst-source.R'), it sourced the file appropriately. Then I moved this file to the immediate parent, e.g., d:/sg and tried the same source command. It replied, Error ... unable to open connection ... . Then I got a command prompt, said, path, and moved the file into one of the directories in the search path. When I repeated the source command, it was still unable to open connection ... . Conclusion: From this and other experiences, I have found three ways to specify file names: (1) If the complete path and file name are supplied for an existing file, 'source' will find it. (2) If a file is in the working directory, specifying that name will get it. (3) If a file is in a subdirectory of the working directory, e.g., d:/sg/proj1/sub1/tst-source.R, then specifying source('sub1/tst-source.R') will get it. hope this helps. spencer graves

Shin, Daehyok wrote: Exactly where does R search for foo.R if I type source("foo.R")? Only the current working directory (same as getwd()), or all directories specified by e.g. $PATH? Thanks. Daehyok Shin

The relevant code snippet from source() is:

Ne <- length(exprs <- parse(n = -1, file = file))

Note that the argument 'file' from the initial call to source() is used 'as is' in the 'file = file' argument to parse(). There is no searching of the $PATH. Thus, the file will be used based upon either the filename itself or a proper absolute or relative path as Spencer notes above. If only the filename is used, it needs to be in the current working directory or you get the error that Spencer experienced. 
HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] help with paste
On Mon, 2004-07-12 at 01:16, Andrew Criswell wrote: Hello All: Suppose the following little data frame:

x <- data.frame(dog = c(3,4,6,2,8), cat = c(8,2,3,6,1))
x$cat
[1] 8 2 3 6 1

How can I get the paste() function to do the same thing? The command below is obviously wrong: paste(x, cat, sep = "$")

You need to quote the x and the cat as explicit names, otherwise the objects 'x' and 'cat' are passed as arguments, 'x' in this case being your data frame and 'cat' being the function cat(). Try this:

eval(parse(text = paste("x", "cat", sep = "$")))
[1] 8 2 3 6 1

HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
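For what it is worth, the same extraction can be done without parsing text as code at all; a small sketch on the same toy data frame:

```r
x <- data.frame(dog = c(3,4,6,2,8), cat = c(8,2,3,6,1))

# eval(parse()) on a constructed string works ...
eval(parse(text = paste("x", "cat", sep = "$")))

# ... but extraction by name avoids building code as a string,
# which is generally safer and easier to debug
x[["cat"]]
x[, "cat"]
```

Both extraction forms return the same numeric vector as x$cat.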
Re: [R] proportions confidence intervals
FWIW, if the exact intervals are what is desired here, as another poster has already suggested, binom.test() will get you there:

binom.test(1, 10)$conf.int
[1] 0.002528579 0.445016117
attr(,"conf.level")
[1] 0.95

binom.test(10, 100)$conf.int
[1] 0.04900469 0.17622260
attr(,"conf.level")
[1] 0.95

HTH, Marc Schwartz

On Mon, 2004-07-12 at 13:19, Chuck Cleland wrote: Darren also might consider binconf() in library(Hmisc).

library(Hmisc)
binconf(1, 10, method="all")
           PointEst        Lower     Upper
Exact           0.1  0.002528579 0.4450161
Wilson          0.1  0.005129329 0.4041500
Asymptotic      0.1 -0.085938510 0.2859385

binconf(10, 100, method="all")
           PointEst      Lower     Upper
Exact           0.1 0.04900469 0.1762226
Wilson          0.1 0.05522914 0.1743657
Asymptotic      0.1 0.04120108 0.1587989

Spencer Graves wrote: Please see: Brown, Cai and DasGupta (2001) Statistical Science, 16: 101-133 and (2002) Annals of Statistics, 30: 160-201. They show that the actual coverage probability of the standard approximate confidence intervals for a binomial proportion is quite poor, while the standard asymptotic theory applied to logits produces rather better answers. I would expect confint.glm in library(MASS) to give decent results, possibly the best available without a very careful study of this particular question. Consider the following:

library(MASS)   # needed for confint.glm
library(boot)   # needed for inv.logit
DF10 <- data.frame(y=.1, size=10)
DF100 <- data.frame(y=.1, size=100)
fit10 <- glm(y~1, family=binomial, data=DF10, weights=size)
fit100 <- glm(y~1, family=binomial, data=DF100, weights=size)
inv.logit(coef(fit10))
(CI10 <- confint(fit10))
(CI100 <- confint(fit100))
inv.logit(CI10)
inv.logit(CI100)

In R 1.9.1, Windows 2000, I got the following:

inv.logit(coef(fit10))
(Intercept)
        0.1
(CI10 <- confint(fit10))
Waiting for profiling to be done...
     2.5 %     97.5 %
-5.1122123 -0.5258854
(CI100 <- confint(fit100))
Waiting for profiling to be done... 
    2.5 %    97.5 %
-2.915193 -1.594401
inv.logit(CI10)
      2.5 %      97.5 %
0.005986688 0.371477058
inv.logit(CI100)
    2.5 %    97.5 %
0.0514076 0.1687655
(naiveCI10 <- .1 + c(-2, 2)*sqrt(.1*.9/10))
[1] -0.08973666  0.28973666
(naiveCI100 <- .1 + c(-2, 2)*sqrt(.1*.9/100))
[1] 0.04 0.16

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
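As a further aside to the interval comparison above: base R's prop.test() also computes a Wilson-type (score) interval, which can be set against the 'Wilson' rows from binconf(); setting correct = FALSE turns off the continuity correction:

```r
# Wilson (score) intervals from base R for the same 1/10 and 10/100 cases
prop.test(1, 10, correct = FALSE)$conf.int
prop.test(10, 100, correct = FALSE)$conf.int
```

With correct = FALSE these should closely match the binconf(..., method = "wilson") limits shown above.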
Re: [R] paired t-test with bootstrap
On Tue, 2004-07-13 at 07:28, Petr Pikal wrote: Hi On 13 Jul 2004 at 12:28, luciana wrote: Dear Sirs, I am a R beginning user: by mean of R I would like to apply the bootstrap to my data in order to test cost differences between independent or paired samples of people affected by a certain disease. My problem is that even if I am reading the book by Efron (introduction to the bootstrap), looking at the examples in internet and available in R, learning a lot of theoretical things on bootstrap, I can't apply bootstrap with R to my data because of many doubts and difficulties. This is the reason why I have decided to ask the expert for help. I have a sample of diabetic people, matched (by age and sex) with a control sample. The variable I would like to compare is their drug and hospital monthly cost. The variable cost has a very far from gaussian distribution, but I need any way to compare the mean between the two group. So, in the specific case of a paired sample t-test, I aim at testing if the difference of cost is close to 0. What is the better way to follow for that? Another question is that sometimes I have missing data in my dataset (for example I have the cost for a patients but not for a control). If I introduce NA or a dot, R doesn't estimate the statistic I need (for instance the mean). To overcome this problem I have replaced the missing data with the mean computed with the remaining part of data. Anyway, I think R can actually compute the mean even with the presence of missing data. Is it right? What can I do? your.statistic(your.data, na.rm=T) e.g. mean(your.data, na.rm=T) or look at ?na.action e.g mean(na.omit(your.data)) Cheers Petr Pikal A couple of other thoughts here with respect to the use of a paired t-test for the comparison. As Luciana notes above, cost data is typically highly skewed, raising doubt as to the use of a simple parametric test to compare the two groups. 
One of the many reasons such data is skewed is that there are notable differences in the populations that are not accounted for when using simple characteristics for matching as is done here. What makes a patient an outlier with respect to cost and how does the distribution of these patients differ between the two groups and the individual pairs? For example, are all the patients in both groups insulin dependent or are some controlled with oral agents or diet alone? If all are using insulin, are some using self-administered injections while others are using implanted infusion pumps? What is the interval from disease onset? Have any had Pancreas/Islet Cell transplants? Do the matched patients have similar diabetic related sequelae such as diabetic retinopathy, neuropathy, vasculopathy, renal dysfunction and others? If not, the costs to treat these other issues, such as dialysis and wound care alone, can dramatically alter the cost profile for patients even when matched by age and gender. If you are not considering these issues (ie. such as inclusion/exclusion criteria), you risk significant challenges in your conclusions with respect to the comparison of costs for these two groups. I would raise similar concerns when using a sample mean as the imputed value for missing data. If you have not done so already, a Medline search of the literature would be in order to better understand what others have done in this area for diabetic treatment costs and the pros and cons of their respective approaches. I suspect that others here will have additional recommendations. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
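On the bootstrap mechanics themselves (as distinct from the design concerns above), a minimal sketch with the 'boot' package; the cost values here are simulated log-normal draws standing in for real patient/control pairs, so the numbers are purely illustrative:

```r
library(boot)

set.seed(1)
cost_patient <- rlnorm(50, meanlog = 7.0)   # hypothetical skewed costs
cost_control <- rlnorm(50, meanlog = 6.8)
d <- cost_patient - cost_control            # paired differences

# statistic: mean of the resampled paired differences
mean_diff <- function(x, i) mean(x[i])
b <- boot(d, mean_diff, R = 2000)

# Percentile and BCa confidence intervals for the mean difference;
# if 0 lies outside the interval, a zero mean difference is implausible
boot.ci(b, type = c("perc", "bca"))
```

Resampling the differences (rather than the two samples separately) preserves the pairing, which is the bootstrap analogue of the paired t-test.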
Re: [R] Permutations
On Tue, 2004-07-13 at 14:07, Jordi Altirriba Gutirrez wrote: Dear R users, I'm a beginner user of R and I've a problem with permutations that I don't know how to solve. I've 12 elements in blocks of 3 elements and I want only to make permutations inter-blocks (no intra-blocks) (sorry if the terminology is not accurate), something similar to:

1 2 3 | 4 5 6 | 7 8 9 | 10 11 12  --1st permutation
1 3 2 | 4 5 6 | 7 8 9 | 10 11 12  NO
3 2 1 | 4 5 6 | 7 8 9 | 10 11 12  NO
1 2 4 | 3 5 6 | 7 8 9 | 10 11 12  YES-2nd permutation
4 5 6 | 1 2 3 | 7 8 9 | 10 11 12  YES-3rd permutation
4 5 6 | 2 1 3 | 7 8 9 | 10 11 12  NO

You can use the permutations() function in the 'gregmisc' package on CRAN:

# Assuming you installed 'gregmisc' and used library(gregmisc)
# First create 'groups' consisting of the four blocks
groups <- c("1 2 3", "4 5 6", "7 8 9", "10 11 12")

# Now create a 4 column matrix containing the permutations
# The call to permutations() here indicates the number of blocks in
# groups (4), the required length of the output (4) and the vector of
# elements to permute
perms <- matrix(permutations(4, 4, groups), ncol = 4)
perms
      [,1]       [,2]       [,3]       [,4]
 [1,] "1 2 3"    "10 11 12" "4 5 6"    "7 8 9"
 [2,] "1 2 3"    "10 11 12" "7 8 9"    "4 5 6"
 [3,] "1 2 3"    "4 5 6"    "10 11 12" "7 8 9"
 [4,] "1 2 3"    "4 5 6"    "7 8 9"    "10 11 12"
 [5,] "1 2 3"    "7 8 9"    "10 11 12" "4 5 6"
 [6,] "1 2 3"    "7 8 9"    "4 5 6"    "10 11 12"
 [7,] "10 11 12" "1 2 3"    "4 5 6"    "7 8 9"
 [8,] "10 11 12" "1 2 3"    "7 8 9"    "4 5 6"
 [9,] "10 11 12" "4 5 6"    "1 2 3"    "7 8 9"
[10,] "10 11 12" "4 5 6"    "7 8 9"    "1 2 3"
[11,] "10 11 12" "7 8 9"    "1 2 3"    "4 5 6"
[12,] "10 11 12" "7 8 9"    "4 5 6"    "1 2 3"
[13,] "4 5 6"    "1 2 3"    "10 11 12" "7 8 9"
[14,] "4 5 6"    "1 2 3"    "7 8 9"    "10 11 12"
[15,] "4 5 6"    "10 11 12" "1 2 3"    "7 8 9"
[16,] "4 5 6"    "10 11 12" "7 8 9"    "1 2 3"
[17,] "4 5 6"    "7 8 9"    "1 2 3"    "10 11 12"
[18,] "4 5 6"    "7 8 9"    "10 11 12" "1 2 3"
[19,] "7 8 9"    "1 2 3"    "10 11 12" "4 5 6"
[20,] "7 8 9"    "1 2 3"    "4 5 6"    "10 11 12"
[21,] "7 8 9"    "10 11 12" "1 2 3"    "4 5 6"
[22,] "7 8 9"    "10 11 12" "4 5 6"    "1 2 3"
[23,] "7 8 9"    "4 5 6"    "1 2 3"    "10 11 12"
[24,] "7 8 9"    "4 5 6"    "10 11 12" "1 2 3"

HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list 
https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Permutations
On Tue, 2004-07-13 at 14:29, Marc Schwartz wrote: On Tue, 2004-07-13 at 14:07, Jordi Altirriba Gutirrez wrote: Dear R users, I'm a beginner user of R and I've a problem with permutations that I don't know how to solve. I've 12 elements in blocks of 3 elements and I want only to make permutations inter-blocks (no intra-blocks) (sorry if the terminology is not accurate), something similar to:

1 2 3 | 4 5 6 | 7 8 9 | 10 11 12  --1st permutation
1 3 2 | 4 5 6 | 7 8 9 | 10 11 12  NO
3 2 1 | 4 5 6 | 7 8 9 | 10 11 12  NO
1 2 4 | 3 5 6 | 7 8 9 | 10 11 12  YES-2nd permutation
4 5 6 | 1 2 3 | 7 8 9 | 10 11 12  YES-3rd permutation
4 5 6 | 2 1 3 | 7 8 9 | 10 11 12  NO

You can use the permutations() function in the 'gregmisc' package on CRAN:

# Assuming you installed 'gregmisc' and used library(gregmisc)
# First create 'groups' consisting of the four blocks
groups <- c("1 2 3", "4 5 6", "7 8 9", "10 11 12")

# Now create a 4 column matrix containing the permutations
# The call to permutations() here indicates the number of blocks in
# groups (4), the required length of the output (4) and the vector of
# elements to permute
perms <- matrix(permutations(4, 4, groups), ncol = 4)

Ack...one correction. The use of matrix() here was actually redundant. 
You can use:

permutations(4, 4, groups)
      [,1]       [,2]       [,3]       [,4]
 [1,] "1 2 3"    "10 11 12" "4 5 6"    "7 8 9"
 [2,] "1 2 3"    "10 11 12" "7 8 9"    "4 5 6"
 [3,] "1 2 3"    "4 5 6"    "10 11 12" "7 8 9"
 [4,] "1 2 3"    "4 5 6"    "7 8 9"    "10 11 12"
 [5,] "1 2 3"    "7 8 9"    "10 11 12" "4 5 6"
 [6,] "1 2 3"    "7 8 9"    "4 5 6"    "10 11 12"
 [7,] "10 11 12" "1 2 3"    "4 5 6"    "7 8 9"
 [8,] "10 11 12" "1 2 3"    "7 8 9"    "4 5 6"
 [9,] "10 11 12" "4 5 6"    "1 2 3"    "7 8 9"
[10,] "10 11 12" "4 5 6"    "7 8 9"    "1 2 3"
[11,] "10 11 12" "7 8 9"    "1 2 3"    "4 5 6"
[12,] "10 11 12" "7 8 9"    "4 5 6"    "1 2 3"
[13,] "4 5 6"    "1 2 3"    "10 11 12" "7 8 9"
[14,] "4 5 6"    "1 2 3"    "7 8 9"    "10 11 12"
[15,] "4 5 6"    "10 11 12" "1 2 3"    "7 8 9"
[16,] "4 5 6"    "10 11 12" "7 8 9"    "1 2 3"
[17,] "4 5 6"    "7 8 9"    "1 2 3"    "10 11 12"
[18,] "4 5 6"    "7 8 9"    "10 11 12" "1 2 3"
[19,] "7 8 9"    "1 2 3"    "10 11 12" "4 5 6"
[20,] "7 8 9"    "1 2 3"    "4 5 6"    "10 11 12"
[21,] "7 8 9"    "10 11 12" "1 2 3"    "4 5 6"
[22,] "7 8 9"    "10 11 12" "4 5 6"    "1 2 3"
[23,] "7 8 9"    "4 5 6"    "1 2 3"    "10 11 12"
[24,] "7 8 9"    "4 5 6"    "10 11 12" "1 2 3"

Sorry about that. Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Permutations
On Tue, 2004-07-13 at 15:02, Rolf Turner wrote: Marc Schwartz wrote (in response to a question from Jordi Altirriba): snip This does not solve the problem that was posed. It only permutes the blocks, and does not allow for swapping between blocks. For instance it does not produce the ``acceptable'' permutation

1 2 4 | 3 5 6 | 7 8 9 | 10 11 12  YES-2nd permutation

I would guess that a correct solution is likely to be pretty difficult. I mean, one ***could*** just generate all 12! permutations of 1 to 12 and filter out the unacceptable ones. But this is getting unwieldy (12! is close to half a billion) and is inelegant. And the method does not ``generalize'' worth a damn.

Rolf, You are correct. I missed that (not so subtle) change in the line above. I mis-read the inter-blocks (no intra-blocks) requirement as simply permuting the blocks, rather than allowing for the swapping of values between blocks. Time for new bi-focals... As Robert has also pointed out in his reply, this gets quite unwieldy. One of the follow up questions might be: is it only allowable that one value at a time can be swapped between blocks, or can multiple values be swapped between blocks simultaneously? I am not sure that it makes a substantive impact on the problem or its solution, however. The question is what is to be done with the resultant set of permutations? FWIW, on a 3.2 GHz P4 with 2 GB of RAM:

system.time(perms <- permutations(12, 12, 1:12))
Error: cannot allocate vector of size 1403325 Kb
Timing stopped at: 2274.27 54.58 2787.76 0 0

Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
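Rolf's generate-and-filter idea is at least tractable on a smaller instance. A sketch with 6 elements in 2 blocks of 3, using one possible formalization of the "no intra-blocks" rule (reject any permutation in which a block of positions holds exactly its original members, in any internal order); the block sizes and the rule's exact form are assumptions for illustration:

```r
library(gtools)  # permutations(); this function also shipped in 'gregmisc'

perms <- permutations(6, 6, 1:6)     # all 720 orderings of 1:6
blocks <- list(1:3, 4:6)

# keep a permutation only if no block of positions contains exactly
# its original set of members (regardless of internal order)
ok <- apply(perms, 1, function(p)
  !any(sapply(blocks, function(b) setequal(p[b], b))))

restricted <- perms[ok, ]
nrow(restricted)   # acceptable permutations out of 720
```

The same filter applied to all 12! permutations is what becomes unwieldy, as the thread notes; a smarter enumeration would need to generate only the acceptable cases directly.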
Re: [R] Permutations
] 941.52 sd(unlist(lapply(r, nrow))) [1] 6.494079 There are likely to be some efficiencies in the function that can be brought to bear, but it is a start. In either case, the restricted permutations appear to be around 94%, if all of the assumptions are correct. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] MASS package?
On Wed, 2004-07-14 at 17:05, Johanna Hardin wrote: Did the MASS package disappear? Specifically, I'm looking for a function to find the MCD (robust measure of shape and location) for a multi-dimensional data matrix. Anyone know anything about this? Try: library(MASS) ?cov.rob It's there, unless you have a corrupted/incomplete installation. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Evaluating the Yield of Medical Tests
On Mon, 2004-07-19 at 14:37, Lisa Wang wrote: Hello, I'm a biostatistician in Toronto. I would like to know if there is anything in survival analysis developed in R for the method Evaluating the Yield of Medical Test (JAMA. May 14,1982--Vol 247, No.18 Frank E. Harrell, Jr,PhD; Robert M. Califf, MD; David B. Pryor, MD;Kerry L.Lee, PhD; Robert A. Rosait,MD.) Hope to hear from you and thanks I do not have access to the full text of Frank's article, however I read the brief abstract on Medline and cross-referenced the citation of the article with content in Frank's book (Regression Modeling Strategies - http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS). Thus, I am going to take a (hopefully well considered) guess that what you are looking for will be in the combination of the Hmisc and Design packages, which Frank has kindly made available for R. These are available for installation from CRAN. More information on Hmisc and Design is available at: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RS which is the link to Frank's site at Vanderbilt. Looking at the authors' names, Frank was at Duke when the cited article was written. I suspect that Frank will reply (RSN) with the acceptance or rejection of my guess, however... ;-) HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] --max-vsize and --max-nsize linux?
On Tue, 2004-07-20 at 07:55, Christian Schulz wrote: Hi, sometimes i have trivial recodings like this:

dim(tt)
[1] 252382     98
system.time(for(i in 2:length(tt)){
  tt[,i][is.na(tt[,i])] <- 0
})

...and a win2000 (XP2000+, 1GB) machine makes it in several minutes, but my linux notebook (XP2.6GHZ, 512MB) doesn't succeed after some hours. I notice that the cpu load is most of the time relatively small, but the harddisk has a lot of work. Is this a problem of --max-vsize and --max-nsize and should i play with them, because i can't believe that the difference in RAM is the reason? Does anybody have experience with what an optimal setting is with e.g. 512 MB RAM in Linux? Many thanks for help and comments regards, christian

Christian, I am unclear as to the nature of your loop above. Note that:

length(tt)
[1] 24733436

which is 252382 * 98. Your looping approach is both inefficient and incorrect. Note that when trying to run your loop 'as is', I get:

system.time(for(i in 2:length(tt)){
+   tt[,i][is.na(tt[,i])] <- 0
+ })
Error: subscript out of bounds
Timing stopped at: 3.54 1.81 5.5 0 0

This is because 'i' eventually exceeds the number of columns (98) in 'tt', since you have 'i' going from 2 to 24733436. I am presuming that you simply want to set any 'NA' values in 'tt' to 0? Take note of using a vectorized approach:

tt <- matrix(sample(c(1:10, NA), 252382 * 98, replace = TRUE), ncol = 98)
dim(tt)
[1] 252382     98
table(is.na(tt))
   FALSE     TRUE
22484834  2248602

Now use:

system.time(tt[is.na(tt)] <- 0)
[1] 1.56 0.73 2.42 0.00 0.00
table(is.na(tt))
   FALSE
24733436

This is on a 3.2 GHz system with 2 GB of RAM. However, this is not a memory issue; it is an inefficient use of loops. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Precision in R
On Tue, 2004-07-20 at 12:13, Duncan Murdoch wrote: On Tue, 20 Jul 2004 11:55:32 -0400, [EMAIL PROTECTED] wrote : Does anyone know where I can find specifications for R's type double? As far as I know, all platforms use the IEEE-754 standard double precision numbers. Google will give you a description; here's one: http://research.microsoft.com/~hollasch/cgindex/coding/ieeefloat.html This isn't relevant to your question, but I found the history of the development of the standard interesting: http://http.cs.berkeley.edu/~wkahan/ieee754status/754story.html Duncan Murdoch Duncan, The standard is there, but not all applications stick to it faithfully. A good example being how certain cough spreadsheets \cough deal with numbers close to zero. For example, Excel will round numbers close to zero to zero. You may recall this thread from last year covered this topic http://maths.newcastle.edu.au/~rking/R/help/03a/6597.html More information on Excel's varied compliance with the IEEE 754 standard is available here http://support.microsoft.com/default.aspx?scid=kb;en-us;78113 The official IEEE 754 page is at http://grouper.ieee.org/groups/754/ and there are some good reading materials and FAQ's there. This above is beyond the scope of SAS in particular, but I suspect that the difference that Aaron is experiencing, as Andy has noted, is methodologic and not precision related. Aaron, one other source for information on the precision of R on your particular machine is the use of .Machine, which will provide you with a list of specifications. See ?.Machine for additional information here. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
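Following up the .Machine pointer, a few of the fields one typically inspects; the values noted in the comments are the usual IEEE-754 double-precision figures, which could differ on unusual platforms:

```r
# Floating point characteristics of the local machine
.Machine$double.eps      # machine epsilon; 2^-52 under IEEE-754
.Machine$double.digits   # mantissa bits; 53 under IEEE-754
.Machine$double.xmax     # largest finite double; roughly 1.8e308
```

Comparing .Machine output between the two systems being benchmarked is a quick way to rule out (or confirm) a genuine precision difference before looking for methodological ones.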
Re: [R] Error: subscript out of bounds
On Tue, 2004-07-20 at 13:12, Marie-Pierre Sylvestre wrote: Hi I am running a simulation that involves a loop calling three 2 functions that I have written. Everything works fine when the inside of the loop is performed up to 1000 times (for (i in 1:750)). However, I sometimes get : ''Error: subscript out of bounds'' if I try to increase the loop 'size' to 1000. I am thinking it has to to with memory but I am not sure. I have increased my memory size to 512M but it does not solve my problem. It would take to much place to copy and paste my code here. It would be helpful if you could tell me whether my problem may or may not be related to memory size. Beside, what's the difference between Error: subscript out of bounds Error: subscript out of range ? Regards M-P Sylvestre If this was a memory error, you would probably get a cannot allocate ... type of error message. More than likely, the object upon which you are using the loop has dimensions which are smaller than the value(s) that your loops are using for indexing into the object. The use of either dim(object) or str(object) will give you more information here. When you increase the loop size, presumably, you have not increased the size of your underlying object in kind. For example, if your object (say a matrix) has dimensions of 500 rows and 10 columns, your loop is trying to index object[510, 12], which is 'out of bounds' for your object. A search of the R source code using grep suggests that the 'out of bounds' message is generally used when trying to index (subset) an object with a value or values that are not correct as I have above. This could also be a single dimension vector, BTW. For example, trying to index object[100] when your vector is only 50 elements in size. In the case of the 'out of range' message, that appears to be typically used when an argument to a function or other constrained parameter is above or below the valid range that the argument or parameter may have. 
A scan of where and how the messages are used indicates some variability, probably as a result of the multiple authors involved. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
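The indexing distinction above can be reproduced in a couple of lines on toy objects:

```r
m <- matrix(1:50, nrow = 5)      # 5 rows, 10 columns

m[5, 10]                         # within bounds: fine
res <- try(m[6, 11], silent = TRUE)    # indexing past the dimensions errors

v <- 1:50
v[100]                           # single-bracket indexing past the end: NA
res2 <- try(v[[100]], silent = TRUE)   # double-bracket indexing errors
```

Note the asymmetry for vectors: `[` silently returns NA past the end, while `[[` raises the out-of-bounds error, so a loop bug can surface either as an error or as silent NAs depending on the indexing form used.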
Re: [R] dumpClass, hasSlot in R?
On Wed, 2004-07-21 at 15:53, hadley wickham wrote: There are a few notes about difference between the R implementation and the book at http://developer.r-project.org/methodsPackage.html I found the hardest thing to get to grips in R was method calling - using multiple dispatch (totally different to what I'm used to from Java, Python etc.). I found this tutorial (http://www.gwydiondylan.org/gdref/tutorial.html, the sections on generic functions and multiple-dispatch) very useful. However, it is for another programming language, and although the method and class creation process feels very similar to R, the syntax is quite different. There is definitely scope for a similarly structured introduction to S4 classes in R. Hadley I have not done any S4 coding yet, but two references that may be of interest are: Converting Packages to S4 by Doug Bates R News Vol 3, No. 1, June 2003 http://cran.r-project.org/doc/Rnews/Rnews_2003-1.pdf and S4 Classes and Methods by Fritz Leisch useR! 2004 Keynote Lecture Slides available at: http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Leisch.pdf HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] viewing Postscript file
On Thu, 2004-07-22 at 16:50, Bickel, David wrote: Is there any R function that can display a Postscript file that is already in the working directory? For example, if 'graph.ps' is such a file, I'd like to type something like this: plot.postscript.file(file = 'graph.ps') If no such function exists, I'd be interested in a way to use existing R functions to do this under UNIX or Windows, preferably without a system call to GhostView (gv). Thanks, David

I am not entirely sure what your expectations are here. As you probably know, Postscript files (like PDF files) are text files that describe how to draw an image. It requires a Postscript interpreter (typically Ghostscript) to read the contents of the PS file and then something like GSview (or gv or ggv or ...) as a front end to render the image. It is illusory in the sense that R does none of the rendering itself, but you could create an R wrapper function and call it plot.postscript.file():

plot.postscript.file <- function(file = "Rplots.ps")
{
  # define viewer for UNIX/Linux or Windows
  viewer <- ifelse(.Platform$OS.type == "unix", "gv", "GSview")
  system(paste(viewer, file, sep = " "))
}

So:

postscript("graph.ps")
barplot(1:5)
dev.off()
plot.postscript.file("graph.ps")

HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] retrieve rows from frame assuming criterion
On Fri, 2004-07-23 at 08:36, Luis Rideau Cruz wrote: Hi all, I have a data frame in which one column (PUNTAR) is of character type. What I want is to retrieve the frame, but only with those rows matching elements of PUNTAR against a list of characters (e.g. c("IX49","IX48")):

Year  TUR  STODNR   PUNTAR
1994 9412 94020061  IX49
1994 9412 94020062  IX48
1994 9412 94020063  X32
1994 9412 94020065  X23
1994 9412 94020066  X27
1994 9412 94020067  XI19
1994 9412 94020068  XI16
1994 9412 94020069  XI14
1994 9412 94020070  XI8
1994 9412 94020071  X25
1994 9412 94020072  X18
1994 9412 94020073  II23
1994 9412 94020074  XII33
1994 9412 94020075  XII31

my.function(frame) should then be equal to

Year TURNR  STODNR   M_PUNTAR
1994  9412 94020061  IX49
1994  9412 94020062  IX48

Thank you in advance

For a simple subset like this, something like the following, presuming that your data frame is called MyData:

MyData[MyData$PUNTAR %in% c("IX49", "IX48"), ]
  Year  TUR   STODNR PUNTAR
1 1994 9412 94020061   IX49
2 1994 9412 94020062   IX48

This basically says to select only those rows where the value of MyData$PUNTAR is in c("IX49", "IX48"). If you need to engage in more complex boolean comparisons for subsetting, especially on multiple columns, then the function subset() would be better suited. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
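To illustrate the subset() suggestion on a hand-reconstructed fragment of the posted data:

```r
MyData <- data.frame(Year = 1994, TUR = 9412,
                     STODNR = c(94020061, 94020062, 94020063, 94020065),
                     PUNTAR = c("IX49", "IX48", "X32", "X23"))

# the same selection as the %in% indexing shown above
subset(MyData, PUNTAR %in% c("IX49", "IX48"))

# compound conditions and column selection read naturally with subset()
subset(MyData, PUNTAR %in% c("IX49", "IX48") & STODNR > 94020061,
       select = c(STODNR, PUNTAR))
```

subset() evaluates its condition inside the data frame, so the columns can be named without the MyData$ prefix.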
Re: [R] merge, cbind, or....?
On Fri, 2004-07-23 at 10:07, Bruno Cutayar wrote:

Hi, i have two data.frame x and y like:

x <- data.frame(num = c(1:10), value = runif(10))
y <- data.frame(num = c(6:10), value = runif(5))

and i want to obtain something like:

num.x    value.x num.y   value.y
    1 0.38423828    NA 0.2911089
    2 0.17402507    NA 0.8455208
    3 0.54443465    NA 0.8782199
    4 0.04540406    NA 0.3202252
    5 0.46052426    NA 0.7560559
    6 0.61385464     6 0.2911089
    7 0.48274968     7 0.8455208
    8 0.11961778     8 0.8782199
    9 0.64531394     9 0.3202252
   10 0.92052805    10 0.7560559

with NA in case of missing value for y to x. { for this example, i write simply:

data.frame(num.x = c(1:10), value.x = x[[2]],
           num.y = c(rep(NA, 5), 6:10), value.y = y[[2]])

} I didn't find a solution in merge(x, y, by = "num"): missing rows are not kept. Can you help me? thanks, Bruno

The use of merge(), with the argument 'all' set to TRUE, will get you the following (note my values are different due to not using the same 'seed' value for runif()):

merge(x, y, by = "num", all = TRUE)

   num    value.x   value.y
1    1 0.14057955        NA
2    2 0.60850644        NA
3    3 0.63410731        NA
4    4 0.07196253        NA
5    5 0.51869503        NA
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865

The use of 'all = TRUE' will fill in non-matching rows. The default is FALSE. Note here however, that the value.y column is not replicated for the first five rows, as you have above. If that is what you want, you could do something like the following:

cbind(x, y$value)

   num      value   y$value
1    1 0.14057955 0.3340535
2    2 0.60850644 0.9340489
3    3 0.63410731 0.5417780
4    4 0.07196253 0.2214993
5    5 0.51869503 0.4947865
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865

which takes advantage of the recycling of y$value, since it is shorter than the number of rows in 'x'. In this case, y$value is repeated twice.
HTH, Marc Schwartz
Re: [R] installing problems repeated.tgz linux
On Mon, 2004-07-26 at 09:34, Christian Schulz wrote:

Hi, i tried several possibilities and looked in the archive, but didn't have success installing J. Lindsey's useful library 'repeated' on my linux (SuSE 9.0 with kernel 2.6.7, R 1.9.1). P.S. Windows works fine. Many thanks for help, Christian

[EMAIL PROTECTED]:/space/downs> R CMD INSTALL - l /usr/lib/R/library repeated
WARNING: invalid package '-'
WARNING: invalid package 'l'
WARNING: invalid package '/usr/lib/R/library'
* Installing *source* package 'repeated' ...
** libs
/usr/lib/R/share/make/shlib.mk:5: *** Target-Muster enthält kein %. Schluss.
ERROR: compilation failed for package 'repeated'
** Removing '/usr/lib/R/library/repeated'

Christian,

There is a space (' ') between the '-' and the 'l', which will be parsed as two separate arguments. Hence the initial WARNING messages. You need to use:

R CMD INSTALL -l /usr/lib/R/library repeated

Also note that you need to have 'root' privileges in order to install the packages into the /usr/lib/R tree. Thus, you should 'su' to root before running the command. You should also verify that your R tree is in /usr/lib, as the default is /usr/local/lib, for which you would not require the '-l /usr/lib/R/library' argument.

Presumably Windows worked fine because you typically do not require administrator privileges to install the package locally on Windows, or your account has administrative privileges, which is typical (and bad) on Windows NT/XP.

HTH,

Marc Schwartz
RE: [R] installing problems repeated.tgz linux
I echo Andy's experience on FC2. I was able to install the package here and got the same warning messages. Despite trying to use some web sites to translate the German text, I am unsure of the 'true' meaning. I think it is something pertaining to target patterns not being found, which leads me to think that this might be a locale/character encoding issue in the package. Anyone?

Marc

On Mon, 2004-07-26 at 14:16, Liaw, Andy wrote:

I downloaded repeated.tgz and tried it myself on one of our AMD Opterons running SLES8, and it worked (R-1.9.1 compiled as 64-bit). Notice that I do get a couple of warnings from gcc about labels, and from g77 about the use of the `sum' function. Andy

SNIPPED

From: Liaw, Andy
Sorry, Christian. I have no idea what those error messages in German say. Andy

From: [EMAIL PROTECTED]
Hello, thanks for your and Marc's hint, but it seems that is not the problem!? Is there any problem with my make? many thanks and regards, christian

[EMAIL PROTECTED]:/usr/lib/R> R CMD INSTALL -l /usr/lib/R/library /space/downs/repeated.tgz
* Installing *source* package 'repeated' ...
** libs
/usr/lib/R/share/make/shlib.mk:5: *** Target-Muster enthält kein %. Schluss.
ERROR: compilation failed for package 'repeated'
** Removing '/usr/lib/R/library/repeated'

[EMAIL PROTECTED]:/usr/lib/R> R CMD INSTALL -l /usr/lib/R/library /space/downs/repeated
* Installing *source* package 'repeated' ...
** libs
/usr/lib/R/share/make/shlib.mk:5: *** Target-Muster enthält kein %. Schluss.
ERROR: compilation failed for package 'repeated'
** Removing '/usr/lib/R/library/repeated'

SNIPPED
Re: [R] ghyper package
On Tue, 2004-07-27 at 13:54, Román Padilla Lizbeth wrote:

Hello, I am searching for the ghyper package (generalized hypergeometric distributions). Can anyone send it to me? Regards from Mexico, Lizbeth Román

You will find that _function_ in Bob Wheeler's SuppDists package on CRAN:

http://cran.us.r-project.org/src/contrib/Descriptions/SuppDists.html

So use:

install.packages("SuppDists")
library(SuppDists)
?ghyper

HTH,

Marc Schwartz
Re: [R] RE: [S] tree function in R language
Shouldn't the URL be (for R 1.8.1 on Windows):

http://cran.r-project.org/bin/windows/contrib/1.8/PACKAGES

There is no URL as listed below, which is presumably why the error message. Was options()$CRAN changed improperly, or is there some other Windows specific issue that is escaping me at the moment?

BTW, you should upgrade to R 1.9.1, as you are two versions behind at this point.

HTH, Marc Schwartz

On Wed, 2004-07-28 at 23:08, Liaw, Andy wrote:

1. Could it be that your computer is behind a firewall? If so, try reading the R for Windows FAQ.
2. Please ask R-related questions on R-help instead of S-news.

Andy

From: cheng wu
Hi, Andy. Thank you for your answer. Why can't I load CRAN packages? The error message is:

> {a <- CRAN.packages()
+ install.packages(select.list(a[, 1], , TRUE), .libPaths()[1], available = a)}
trying URL `http://cran.r-project.org/bin/windows/contrib/PACKAGES'
unable to connect to 'cran.r-project.org'.
Error in download.file(url = paste(contriburl, "PACKAGES", sep = "/"), :
  cannot open URL `http://cran.r-project.org/bin/windows/contrib/PACKAGES'

From: Chushu Gu [EMAIL PROTECTED] To: cheng wu [EMAIL PROTECTED] Subject: Fw: [S] tree function in R language Date: Wed, 28 Jul 2004 09:14:48 -0400

----- Original Message ----- From: Liaw, Andy [EMAIL PROTECTED] To: 'chushu Gu' [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Tuesday, July 27, 2004 11:22 PM Subject: RE: [S] tree function in R language

Have you read the (latest edition of the) book which the package you are using supports? There are differences in S-PLUS and R (and the 4th edition of MASS supports both, and thus ought to tell you this particular difference between the two). tree() in S-PLUS was written originally by Clark and Pregibon. If you want that functionality in R, you need to load the `tree' package (available on CRAN), which is an independent implementation by one of the co-authors of MASS. Another hint: Look in the `scripts' subdirectory of where the `MASS' package is installed.
Andy

From: chushu Gu
Hi all, I am using R 1.8.1 and I have the following code:

library(MASS)
data(iris)
ir.tr <- tree(Species ~ ., iris)
ir.tr
summary(ir.tr)

I got the following message:

Error: couldn't find function "tree"

I don't know the reason, as I have already loaded the library MASS. Could anyone tell me the possible reasons? Thanks, Chushu Gu
Re: [R] Editing Strings in R
On Thu, 2004-07-29 at 15:56, Bulutoglu Dursun A Civ AFIT/ENC wrote:

I was wondering if there is a way of editing strings in R. I have a set of strings and each set is a row of numbers and parentheses. For example, the first row is:

(0 2)(3 4)(7 9)(5 9)(1 5)

and I have a thousand or so such rows. I was wondering how I could get the corresponding string obtained by adding 1 to all the numbers in the string above. Dursun

I don't know if this is the most efficient approach, but working on a few hours of sleep, here goes:

NewRow <- function(x)
{
  TempRow <- as.numeric(unlist(strsplit(x, "[\\(\\) ]"))) + 1
  TempMat <- matrix(TempRow[!is.na(TempRow)], ncol = 2, byrow = TRUE)
  paste("(", TempMat[, 1], " ", TempMat[, 2], ")", sep = "", collapse = "")
}

Basically, the first line splits the character vector into its components using "(", ")" and " " as regex based delimiters. It coerces the result to a numeric vector and adds 1. The second line takes the adjusted non-NA values and converts them into a two column matrix, to make it easier to do the paste in line 3. Line 3 returns the adjusted character vector reconstructed.

So:

MyRow <- "(0 2)(3 4)(7 9)(5 9)(1 5)"
NewRow(MyRow)
[1] "(1 3)(4 5)(8 10)(6 10)(2 6)"

So, if you have a bunch of these rows, you could use this function with apply:

MyData <- matrix(c("(0 2)(3 4)(7 9)(5 9)(1 5)",
                   "(1 6)(4 5)(3 7)(4 8)(9 0)",
                   "(3 5)(8 1)(4 7)(2 7)(6 1)"))

MyData
     [,1]
[1,] "(0 2)(3 4)(7 9)(5 9)(1 5)"
[2,] "(1 6)(4 5)(3 7)(4 8)(9 0)"
[3,] "(3 5)(8 1)(4 7)(2 7)(6 1)"

matrix(apply(MyData, 1, NewRow))
     [,1]
[1,] "(1 3)(4 5)(8 10)(6 10)(2 6)"
[2,] "(2 7)(5 6)(4 8)(5 9)(10 1)"
[3,] "(4 6)(9 2)(5 8)(3 8)(7 2)"

Somebody may come up with an approach that is more efficient, I suspect. For 1,200 rows:

system.time(apply((matrix(rep(MyData, 400))), 1, NewRow))
[1] 0.29 0.00 0.33 0.00 0.00

(Gabor? ;-)

HTH,

Marc Schwartz
Re: [R] Editing Strings in R
On Thu, 2004-07-29 at 21:08, Gabor Grothendieck wrote:

Bulutoglu Dursun A Civ AFIT/ENC Dursun.Bulutoglu at afit.edu writes: I was wondering if there is a way of editing strings in R. I have a set of strings and each set is a row of numbers and parentheses. For example, the first row is: (0 2)(3 4)(7 9)(5 9)(1 5) and I have a thousand or so such rows. I was wondering how I could get the corresponding string obtained by adding 1 to all the numbers in the string above.

First do the 1 character translations simultaneously using chartr and then use gsub for the remaining one to two character translation:

gsub("0", "10", chartr("0123456789", "1234567890",
     "(0 2)(3 4)(7 9)(5 9)(1 5)"))

Gabor,

One problem: multi-digit numbers in the source string:

gsub("0", "10", chartr("0123456789", "1234567890",
     "(10 99)(3 4)(7 9)(5 9)(1 5)"))
[1] "(21 1010)(4 5)(8 10)(6 10)(2 6)"

Note that the first number 10 gets transformed to 21 and the 99 goes to 1010.

I made a quick update to NewRow, which is not faster, but gets it to two lines instead of three, and is a bit cleaner:

NewRow <- function(x)
{
  TempMat <- matrix(as.numeric(unlist(strsplit(x, "[\\(\\) ]"))),
                    ncol = 3, byrow = TRUE) + 1
  paste("(", TempMat[, 2], " ", TempMat[, 3], ")", sep = "", collapse = "")
}

Note that with multi digit numbers, it gives a correct result:

NewRow("(10 99)(101 4)(7 9)(5 9)(1 5)")
[1] "(11 100)(102 5)(8 10)(6 10)(2 6)"

HTH,

Marc Schwartz
Re: [R] Transparent backgrounds in png files
On Thu, 2004-07-29 at 19:24, Patrick Connolly wrote:

On Thu, 29-Jul-2004 at 08:38AM +0100, Prof Brian Ripley wrote:

| The bitmap() device does not support transparency. The png() device does.

Unfortunately, though png() does a fine job at a transparent background, it's rather lumpy even on a screen.

| On Thu, 29 Jul 2004, Patrick Connolly wrote:
| [...]
| Mine is the reverse (and I'm using standard graphics, not Lattice).
| I'm trying to get a transparent background but it always comes out
| white. Setting bg = "transparent", I've tried using a bitmap device
| to create a png file. I've also tried creating a postscript file and
| converting it to a PNG file using the Gimp. I've always used a
| resolution of 300 dpi in bitmaps since the default is far too low.
|
| Really? You want PNG files of 2000+ pixels in each dimension?

Well, 300 dpi is somewhat excessive for onscreen, but not for printing (more below). For a screen at 1600 by 1200 resolution, a bitmap of over 1000 pixels in either direction is not excessive. Using a screen rated at .25mm dot pitch, 75 dpi is rather a lot less than sufficient. According to my calculations, .25mm dot pitch corresponds to over 100 dpi, and a .27mm screen is over 90 dpi, so I don't get this 72 business. Perhaps there's something I need to know.

Evidently, there's something others know that I don't, since png() generated files always turn out lumpy for me. It's worse than the unsatisfactory result of using PowerPoint's "turning colours to transparent" method I mentioned. People who are used to looking at TV screens might not think it's low resolution, so perhaps I'm too fussy. Maybe I should be more fussy about getting an exact ratio between the number of pixels in the plotting device and the size of the image in PowerPoint. I'm somewhat confused by the fact that PP scales to fit to the slide PNG files that I produce using the Gimp, but not ones made using the png() method directly. What is the essential difference?
| -- and you should not really be using bitmapped files for other
| uses.)

Unfortunate as it may be, many people wish to put graphics in Word files and don't like being unable to see their graphics on their screen, even if they have a postscript printer that could print them perfectly. That's where I use 300 dpi PNGs, which print at least as well as WMFs I've seen. There was a recent discussion on this list about graphics using OSX which covers most of the same thinking. Nothing in that discussion indicated to me a better way to get graphic files from Linux to Word. If there are any, I'd like to know about them.

Patrick,

Are the Windows recipients of the R graphics involved in creating/editing the resultant documents, or do they simply require read-only access to a final document?

If the latter, then let me suggest that you generate EPS based graphics in R (for which you can specify height and width arguments in inches as required). Import those EPS graphics into OO.org's Impress (PP-alike) or Writer (Word-alike). Then print the file to a PS file and then use ps2pdf to create a PDF version of the document that the recipients can view in Acrobat Reader.

If the former, as I believe Frank Harrell noted here some time back, the recent versions of Word and Powerpoint will create bitmapped previews of the EPS files upon import. While they are not a high quality image (and do add to filesize notably), they at least enable the users of the documents to preview the image for placement/sequencing. They can then print them to a PS file or, if they have the purchased Adobe add-ins, could print them to a PDF file on their own for viewing in Acrobat.

The major problem with bitmapped images (as has been mentioned here ad nauseam) is that they do not resize well and what you see on the screen does not always translate into a quality image when enlarged or sent to a printer. This is why vector based graphics (such as WMF/EMF, EPS, PDF and SVG) are preferred.
Bitmapped image files also end up being quite large, whereas EPS files (since they are text files) are relatively small.

It is not a solution today, but as SVG based graphics become more available on multiple platforms, that format will probably emerge as the preferred means of sharing such files. WMF/EMF are limited to Windows as a realistic option. There is the libEMF library available under Linux, but from personal experience, it is not a viable option.

HTH,

Marc Schwartz
Re: [R] How to put multiple plots in the same window? (not par(mfrow=))
On Fri, 2004-07-30 at 10:41, F Duan wrote:

Dear All, I am sorry if this question has been asked before. Below is my question: I want to put several plots in the same window, but I don't want the blank space between plots (like par(mfrow=)) --- that makes the plots too small. Could anyone tell me how to do it? Thanks a lot. Frank

It is not clear if you want a matrix of plots or if you want plots that actually overlap (i.e. inset plots). For example, for a matrix using par(mfrow), the actual figure regions for each plot fill up the full plotting device:

par(mfrow = c(2, 2))
plot(1:5)
box(which = "figure")
plot(1:5)
box(which = "figure")
plot(1:5)
box(which = "figure")
plot(1:5)
box(which = "figure")

Each of the four plots takes up one quarter of the overall device. The outer four boxes represent the figure region for each of the four plots. Within each figure region is the plot region and the axes, labels, etc. for each individual plot.

You can use par("mar") to reduce the amount of space between the plot region and the figure region. As an extreme example:

par(mfrow = c(2, 2))
par(mar = c(0, 0, 0, 0))
plot(1:5)
box(which = "figure")
plot(1:5)
box(which = "figure")
plot(1:5)
box(which = "figure")
plot(1:5)
box(which = "figure")

In this case, you now would need to play around with the axis tick marks, labels, etc.

Can you clarify which space you are referring to?

Marc Schwartz
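A sketch of that tick-mark cleanup — shrinking the margins with par("mar") and then drawing compact axes by hand. The specific margin sizes and the cex.axis value are only illustrative choices, not a recommendation from the thread:

```r
par(mfrow = c(2, 2), mar = c(2, 2, 1, 1))
for (i in 1:4) {
  plot(1:5, axes = FALSE, xlab = "", ylab = "")  # suppress default axes
  axis(1, at = 1:5, cex.axis = 0.8)              # compact x axis labels
  axis(2, cex.axis = 0.8)                        # compact y axis labels
  box()                                          # frame each plot region
}
```

This keeps the four plots nearly touching while still leaving room for readable tick labels inside the reduced margins.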
Re: [R] Transparent backgrounds in png files
Patrick,

Here is one additional option for you. I happened to be doing some searching on the OO.org site today for some printing related issues in their Bugzilla equivalent. There was a reference to an MS Office PDF import filter available from ScanSoft that would enable you to create PDF vector based plot files in R (using pdf()) and then import them into MS Office. There is an RFE in the OO.org issues list for this feature, which won't appear before OO.org V2.0. If and when this becomes available, it would streamline some of the Linux-to-Windows issues that have been discussed in this thread.

More information on PDFConverter is available from the ScanSoft site at:

http://www.scansoft.com/pdfconverter/standard/

There is a standard version available for $49 (U.S.) and a professional version available for $99 (U.S.). Some example PDF-to-Word documents are available at http://www.scansoft.com/pdfconverter/demo/.

HTH,

Marc Schwartz
Re: [R] pairwise difference operator
On Fri, 2004-07-30 at 18:30, Adaikalavan Ramasamy wrote:

There was a BioConductor thread today where the poster wanted to find the pairwise difference between columns of a matrix. I suggested the slow solution below, hoping that someone might suggest a faster and/or more elegant solution, but no other response. I tried unsuccessfully with the apply() family. Searching the mailing list was not very fruitful either. The closest I got was a cryptic chunk of code in pairwise.table(). Since I do use something similar myself occasionally, I am hoping someone from the R-help list can suggest alternatives or past threads. Thank you.

### Code ###
pairwise.difference <- function(m){
  npairs  <- choose(ncol(m), 2)
  results <- matrix(NA, nc = npairs, nr = nrow(m))
  cnames  <- rep(NA, npairs)
  if(is.null(colnames(m))) colnames(m) <- paste("col", 1:ncol(m), sep = "")
  k <- 1
  for(i in 1:ncol(m)){
    for(j in 1:ncol(m)){
      if(j <= i) next;
      results[, k] <- m[, i] - m[, j]
      cnames[k] <- paste(colnames(m)[c(i, j)], collapse = ".vs.")
      k <- k + 1
    }
  }
  colnames(results) <- cnames
  rownames(results) <- rownames(m)
  return(results)
}

### Example using a matrix with 5 genes/rows and 4 columns ###
mat <- matrix(sample(1:20), nc = 4)
colnames(mat) <- LETTERS[1:4]
rownames(mat) <- paste("g", 1:5, sep = "")

mat
    A  B  C  D
g1 10 16  3 15
g2 18  5 12 19
g3  7  4  8 13
g4 14  2  6 11
g5 17  1 20  9

pairwise.difference(mat)
   A.vs.B A.vs.C A.vs.D B.vs.C B.vs.D C.vs.D
g1     -6      7     -5     13      1    -12
g2     13      6     -1     -7    -14     -7
g3      3     -1     -6     -4     -9     -5
g4     12      8      3     -4     -9     -5
g5     16     -3      8    -19     -8     11

How about this: I am taking advantage of the combinations() function in the 'gregmisc' package to define the pairwise column combinations based upon the input matrix colnames. Given that, perhaps Greg might want to add this function to the package if it holds up to scrutiny. Additional error checking would be required, as I note below.
pairwise.diffs <- function(x)
{
  if(is.null(colnames(x))) colnames(x) <- 1:ncol(x)
  col.diffs <- combinations(ncol(x), 2, colnames(x))
  result <- x[, col.diffs[, 1]] - x[, col.diffs[, 2]]
  colnames(result) <- paste(col.diffs[, 1], ".vs.", col.diffs[, 2],
                            sep = "")
  result
}

What I am essentially doing is creating the matrix 'col.diffs' to hold the combinations of the colnames in matrix 'x'. If 'x' does not have colnames, I set them to the column indices. Then in line 2, I do the pairwise subtractions. Line 3 simply sets up the colnames in the result as the combinations.

Note that the subtractions, as you have above, are the first column minus the second column in the pairwise combinations. You would also want to check for an input matrix of < 3 columns, since the 'result' in that case would be a vector, rather than a matrix. In that case, you could add code to coerce 'result' to a matrix, or simply not allow matrices with < 3 columns.

So, using your example matrix above (different seed value):

mat <- matrix(sample(1:20), nc = 4)
colnames(mat) <- LETTERS[1:4]
rownames(mat) <- paste("g", 1:5, sep = "")

mat
    A  B  C  D
g1  1 17 13 10
g2 12  5  7 16
g3  2 19  6 14
g4 20  4 11  8
g5  3 15 18  9

pairwise.diffs(mat)
   A.vs.B A.vs.C A.vs.D B.vs.C B.vs.D C.vs.D
g1    -16    -12     -9      4      7      3
g2      7      5     -4     -2    -11     -9
g3    -17     -4    -12     13      5     -8
g4     16      9     12     -7     -4      3
g5    -12    -15     -6     -3      6      9
Re: [R] pairwise difference operator
On Fri, 2004-07-30 at 20:28, Marc Schwartz wrote: On Fri, 2004-07-30 at 18:30, Adaikalavan Ramasamy wrote: There was a BioConductor thread today where the poster wanted to find the pairwise difference between columns of a matrix. I suggested the slow solution below, hoping that someone might suggest a faster and/or more elegant solution, but no other response. I tried unsuccessfully with the apply() family. Searching the mailing list was not very fruitful either. The closest I got was a cryptic chunk of code in pairwise.table(). Since I do use something similar myself occasionally, I am hoping someone from the R-help list can suggest alternatives or past threads. Thank you.

snip

In follow up to the posts on this last night, I created an updated version of my function (though I will point out that Gabor's is faster, as I will show below).

I realized that using the combinations() function had a potential limitation, which is the limit of R's recursion depth, as Greg mentions in the help for the function. It will require an adjustment when the number of columns is > about 45. Thus, I modified the creation of the column combinations as noted below.

I also added some code to verify the input data type and to ensure that the resultant structures remain matrices in the case of an input matrix with ncol = 2, in which case this function is, of course, overkill.

Thus:

pairwise.diffs <- function(x)
{
  stopifnot(is.matrix(x))

  # create column combination pairs
  prs <- cbind(rep(1:ncol(x), each = ncol(x)), 1:ncol(x))
  col.diffs <- prs[prs[, 1] < prs[, 2], , drop = FALSE]

  # do pairwise differences
  result <- x[, col.diffs[, 1]] - x[, col.diffs[, 2], drop = FALSE]

  # set colnames
  if(is.null(colnames(x))) colnames(x) <- 1:ncol(x)

  colnames(result) <- paste(colnames(x)[col.diffs[, 1]], ".vs.",
                            colnames(x)[col.diffs[, 2]], sep = "")
  result
}

Now to performance.
I created a large 1,000 column matrix:

mat <- matrix(sample(100, 10000, replace = TRUE), ncol = 1000)
colnames(mat) <- 1:1000

str(mat)
 int [1:10, 1:1000] 48 23 26 22 69 64 2 13 13 69 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:1000] "1" "2" "3" "4" ...

Timing:

gc(); system.time(m <- pairwise.diffs(mat))
          used (Mb) gc trigger  (Mb)
Ncells 1541241 41.2    3094291  82.7
Vcells 7139074 54.5   17257300 131.7
[1] 1.14 0.19 1.39 0.00 0.00

gc(); system.time(g <- do.call(cbind, sapply(2:ncol(mat), f, mat)))
          used (Mb) gc trigger  (Mb)
Ncells 1541241 41.2    3094291  82.7
Vcells 7139074 54.5   17257300 131.7
[1] 0.81 0.02 0.92 0.00 0.00

Comparisons:

str(m)
 int [1:10, 1:499500] -47 -43 -35 -29 15 33 -53 -36 -17 57 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:499500] "1.vs.2" "1.vs.3" "1.vs.4" "1.vs.5" ...

str(g)
 int [1:10, 1:499500] -47 -43 -35 -29 15 33 -53 -36 -17 57 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:499500] "1-2" "1-3" "1-4" "1-5" ...

table(m == g)
   TRUE
4995000

HTH,

Marc Schwartz
Re: [R] Is k equivalent to k:k ?
On Mon, 2004-08-02 at 09:46, Georgi Boshnakov wrote:

Hi, I wonder if the following (apparent) inconsistency is a bug or feature. Since scalars are simply vectors of length one, I would think that a and a:a produce the same result. For example,

identical(4.01, 4.01:4.01)
[1] TRUE

However,

identical(4, 4:4)
[1] FALSE

and

identical(4.0, 4.0:4.0)
[1] FALSE

A closer look reveals that the colon operator produces objects of a different class, e.g.

class(4)
[1] "numeric"
class(4.0)
[1] "numeric"

but

class(4:4)
[1] "integer"
class(4.0:4.0)
[1] "integer"

Georgi Boshnakov

The ":" operator is the functional equivalent of seq(from = a, to = b). Note that the help for seq() indicates the following for the return value:

"The result is of mode integer if from is (numerically equal to an) integer and by is not specified."

Thus, when using the ":" operator, you get integers as the returned value(s), which is what is happening in your final pair of examples.

If you look at the final example under ?identical, you will see:

identical(1, as.integer(1)) ## FALSE, stored as different types

This is because the first 1 is a double by default. Thus, in the case of:

identical(4, 4:4)

the first 4 is of type double, while the 4:4 is of type single. Thus the result is FALSE.

Now, on the other hand, try:

typeof(seq(4, 4, by = 1))
[1] "double"

You see that the result of the sequence is of type double. Hence:

identical(4, seq(4, 4, by = 1))
[1] TRUE

So, to the question in your subject, no: k (a double by default) is not the same as k:k (an integer by default).

HTH,

Marc Schwartz
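The type distinction discussed above can be checked directly with typeof(), which reports storage type rather than class (expected results noted in comments):

```r
typeof(4)                  # "double"  - numeric literals default to double
typeof(4:4)                # "integer" - the colon operator returns integer
typeof(seq(4, 4, by = 1))  # "double"  - 'by' was specified, so not integer

identical(4, 4:4)              # FALSE: double vs. integer storage
identical(as.integer(4), 4:4)  # TRUE:  both stored as integer
```

Comparing via as.integer() (rather than by value with ==) makes the storage-type difference, which is what identical() tests, explicit.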
Re: [R] Is k equivalent to k:k ?
On Mon, 2004-08-02 at 10:09, Marc Schwartz wrote:

snip

Thus, in the case of:

identical(4, 4:4)

the first 4 is of type double, while the 4:4 is of type single. Thus the result is FALSE.

snip

Correction. The above sentence should read:

the first 4 is of type double, while the 4:4 is of type **INTEGER**. Thus the result is FALSE.

Sorry about that. Need more coffee...

Marc
Re: [R] R packages install problems linux - X not found (WhiteBox EL 3)
On Sun, 2004-08-08 at 12:32, Douglas Bates wrote:

Dr Mike Waters wrote: I am used to using R under Windows, but have done an install of 1.9.1 under WhiteBox linux 3 (based on RHEL 3). This all went without a hitch, along with most of the additional package installs. However, while trying to install car and rgl I hit a problem regarding the X environment not being found. As I was doing the install from a console *within* the X environment, this is obviously down to a missing environment variable or link. The X11 directories all seem to be in the usual places. I've checked as much as I can through the archives and googled around, but to no avail. Any help appreciated.

Or a missing development package. In many Linux distributions the include files for X11 are in a separate package from the run-time libraries. I have never used WhiteBox Linux but I imagine that will be the case for that distribution too. Check to see if there is a package with a name like xlibs-dev or x-dev.

Just to amplify on Doug's comments, the RPM in question should be something like:

XFree86-devel-...

where the ... is replaced by the version numbering schema. I am presuming that WhiteBox has not yet changed over to the use of X.org in place of XFree86 at this point. If it has, then the RPM would be something like:

xorg-x11-devel-...

An easy way to check for this would be to open a console window and use:

rpm -q XFree86-devel

in the first case, or:

rpm -q xorg-x11-devel

in the second case. If nothing is returned by the command, it would confirm that you are missing the requisite RPM.

In the case of the RGL package, you might want to review this recent thread:

https://www.stat.math.ethz.ch/pipermail/r-help/2004-August/thread.html

which indicates some issues related to the same devel libraries, including the XFree86-Mesa-libGL (or xorg-x11-Mesa-libGL) and XFree86-Mesa-libGLU (or xorg-x11-Mesa-libGLU) RPMs.
HTH, Marc Schwartz
Re: [R] R packages install problems linux - X not found (WhiteBox EL 3)
On Sun, 2004-08-08 at 12:53, Marc Schwartz wrote:

In the case of the RGL package, you might want to review this recent thread: https://www.stat.math.ethz.ch/pipermail/r-help/2004-August/thread.html

Correction on the above URL. I pasted the wrong one here. It should be:

https://www.stat.math.ethz.ch/pipermail/r-help/2004-August/053994.html

Marc
Re: [R] manipulating strings
On Sun, 2004-08-08 at 13:58, Stephen Nyangoma wrote:

Hi, I have a character vector called fil consisting of the following strings:

fil
 [1] " 102.2 639"  " 104.2 224"  " 105.1 1159" " 107.1 1148" " 108.1 1376"
 [6] " 109.2 1092" " 111.2 1238" " 112.2 349"  " 113.1 1204" " 114.1 537"
[11] " 115.0 303"  " 116.1 490"  " 117.2 202"  " 118.1 1864" " 119.0 357"

I want to get a data frame like:

Time  Obs
102.2 639
104.2 224
105.1 1159
107.1 1148
108.1 1376
109.2 1092
111.2 1238
112.2 349
113.1 1204
114.1 537

etc. Can anyone see an efficient way of doing this? Thanks. Stephen

Try this:

# Create strings
MyStrings <- c(" 102.2 639", " 104.2 224", " 105.1 1159", " 107.1 1148",
               " 108.1 1376", " 109.2 1092", " 111.2 1238", " 112.2 349",
               " 113.1 1204", " 114.1 537", " 115.0 303", " 116.1 490",
               " 117.2 202", " 118.1 1864", " 119.0 357")

MyStrings
 [1] " 102.2 639"  " 104.2 224"  " 105.1 1159" " 107.1 1148"
 [5] " 108.1 1376" " 109.2 1092" " 111.2 1238" " 112.2 349"
 [9] " 113.1 1204" " 114.1 537"  " 115.0 303"  " 116.1 490"
[13] " 117.2 202"  " 118.1 1864" " 119.0 357"

# Now convert to a data frame, by first using strsplit() to break up
# each of the vector elements into three components, using " " as a
# split character. This returns a list, which we then convert to a
# vector using unlist(). Then use matrix() to convert the vector into a
# two dimensional object with 3 cols. Use 'byrow = TRUE' so that we fill
# the matrix row by row. Then take only the second and third columns
# from the matrix and convert them into a data frame.

df <- as.data.frame(matrix(unlist(strsplit(MyStrings, split = " ")),
                           ncol = 3, byrow = TRUE)[, 2:3])

# Finally, set the colnames
colnames(df) <- c("Time", "Obs")

df
    Time  Obs
1  102.2  639
2  104.2  224
3  105.1 1159
4  107.1 1148
5  108.1 1376
6  109.2 1092
7  111.2 1238
8  112.2  349
9  113.1 1204
10 114.1  537
11 115.0  303
12 116.1  490
13 117.2  202
14 118.1 1864
15 119.0  357

Note that the above presumes that your strings (character vectors) have a leading " " in them and that the Time and Obs elements are also separated by a " " in each. See ?strsplit for more information.
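As a possible one-step alternative sketch for the same task (assuming the strings are whitespace-separated as above), read.table() on a text connection does the field splitting and the numeric coercion at the same time:

```r
# first three of the example strings; leading/extra whitespace is ignored
MyStrings <- c(" 102.2 639", " 104.2 224", " 105.1 1159")

# read.table() splits on whitespace and coerces each column to numeric
df <- read.table(textConnection(MyStrings),
                 col.names = c("Time", "Obs"))
df
#    Time  Obs
# 1 102.2  639
# 2 104.2  224
# 3 105.1 1159
```

Unlike the strsplit()/matrix() route, the resulting Time and Obs columns are numeric rather than factors/characters, so no further conversion is needed before plotting or computing.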
HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
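One follow-up worth noting (an addition, not part of the original exchange): the columns built from strsplit() start out as text, so if Time and Obs are to be used numerically they still need coercion with as.numeric(). A minimal sketch, using a shortened version of the same strings:

```r
# Shortened stand-in for the MyStrings vector in the post above
MyStrings <- c(" 102.2 639", " 104.2 224", " 105.1 1159")

# Split on " ", reshape, and keep the Time and Obs columns
m <- matrix(unlist(strsplit(MyStrings, split = " ")),
            ncol = 3, byrow = TRUE)[, 2:3]

# Keep the columns as character (not factor), then coerce to numeric
df <- as.data.frame(m, stringsAsFactors = FALSE)
colnames(df) <- c("Time", "Obs")
df$Time <- as.numeric(df$Time)
df$Obs  <- as.numeric(df$Obs)

str(df)
```

Without the as.numeric() step, operations such as sum(df$Obs) would fail or behave unexpectedly.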
RE: [R] R packages install problems linux - X not found (WhiteBoxEL 3)
On Sun, 2004-08-08 at 14:10, Dr Mike Waters wrote: snip Thanks for the responses guys. I used to have RH9 installed on this machine and I found out about the separate developer packages then. I thought that I had got the relevant XFree devel package installed, but although it showed up in the rpm database as being present, the required files were not present. I did a forced rpm upgrade from the WhiteBox updates directory and that problem is now fixed, at least for car. Marc, thanks for the pointer on the rgl problem. However, I have a slightly different problem with the install of this package. It gets through to the point where it tries to make rgl.so from the various .o files and fails then, as follows:

g++ -I/usr/lib/R/include -I/usr/X11R6/include -DHAVE_PNG_H -I/usr/include -I/usr/local/include -Wall -pedantic -fno-exceptions -fno-rtti -fPIC -O2 -g -march=i386 -mcpu=i686 -c glgui.cpp -o glgui.o
g++ -L/usr/local/lib -o rgl.so x11lib.o x11gui.o types.o math.o fps.o pixmap.o gui.o api.o device.o devicemanager.o rglview.o scene.o glgui.o -L/usr/X11R6/lib -L/usr/lib -lstdc++ -lX11 -lXext -lGL -lGLU -lpng
/usr/lib/gcc-lib/i386-redhat-linux/3.2.3/../../../crt1.o(.text+0x18): In function `_start':
: undefined reference to `main'
x11lib.o(.text+0x84): In function `set_R_handler':
/tmp/R.INSTALL.13414/rgl/src/x11gui.h:33: undefined reference to `R_InputHandlers'
x11lib.o(.text+0x92):/tmp/R.INSTALL.13414/rgl/src/x11gui.h:33: undefined reference to `addInputHandler'
x11lib.o(.text+0xfb): In function `unset_R_handler':
/tmp/R.INSTALL.13414/rgl/src/x11lib.cpp:52: undefined reference to `R_InputHandlers'
x11lib.o(.text+0x103):/tmp/R.INSTALL.13414/rgl/src/x11lib.cpp:52: undefined reference to `removeInputHandler'
collect2: ld returned 1 exit status
make: *** [rgl.so] Error 1
ERROR: compilation failed for package 'rgl'
** Removing '/usr/lib/R/library/rgl'

No doubt another failed dependency... DOH!
Regards I am concerned by your indications of previously having had RH9 on the same box and that you had to force an update of the XFree Devel RPM. Forcing the installation of an RPM is almost always a bad thing. When you installed WB on the system, did you do a clean installation or some type of upgrade? If the latter, it is reasonable to consider that there may be some level of mixing and matching of RPMS from the two distributions going on. This could result in a level of marginally or wholly incompatible versions of RPMS being installed. Could you clarify that point? Also, be sure that you have the same versions of the XFree series RPMS installed. Use: rpm -qa | grep XFree in a console and be sure that the RPMS return the same version schema. If not, it is possible that one of your problems is the mixing of versions. Take note of the output of the above and be sure that the XFree86-Mesa-libGL and XFree86-Mesa-libGLU RPMS are installed as well. Some of the messages above would also suggest a problem finding R related headers. How did you install R? This may be a red herring of sorts, given the other problems, but may be helpful. Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] R packages install problems linux - X not found (WhiteBoxEL 3)
On Mon, 2004-08-09 at 08:13, Dr Mike Waters wrote: snip Marc, Sorry for the confusion yesterday - in my defence, it was very hot and humid here in Hampshire (31 Celsius at 15:00hrs and still 25 at 20:00hrs). What had happened was that I had done a clean install of WB Linux, including the XFree86 and other developer packages. However, the on-line updating system updated the XFree86 packages to a newer sub version. It seems that it didn't do this correctly for the XFree86 developer package, which was missing vital files. However it showed up in the rpm database as being installed (i.e. rpm -qa | grep XFree showed it thus). I downloaded another rpm for this manually and I only forced the upgrade because it was the same version as already 'installed' (as far as the rpm database was concerned). I assumed that all dependencies were sorted out through the install in the first place. OK, that helps. I still have a lingering concern that, given the facts above, there may be other integrity issues in the RPM database, if not elsewhere. From reading the WB web site FAQ's (http://www.whiteboxlinux.org/faq.html) , it appears that they are using up2date/yum for system updates. Depending upon the version in use, there have been issues especially with up2date (hangs, incomplete updates, etc.) which could result in other problems. I use yum via the console here (under FC2), though I note that a GUI version of yum has been created, including replacing the RHN/up2date system tray alert icon. A thought relative to this specifically: If there is or may be an integrity problem related to the rpm database, you should review the information here: http://www.rpm.org/hintskinks/repairdb/ which provides instructions on repairing the database. Note the important caveats regarding backups, etc. 
The two key steps there are to remove any residual lock files using (as root): rm -f /var/lib/rpm/__* and then rebuilding the rpm database using (also as root): rpm -vv --rebuilddb I think that there needs to be some level of comfort that this basic foundation for the system is intact and correct. I only mentioned RH9 to show that I had some familiarity with the RedHat policy of separating out the 'includes' etc into a separate developer package. Once all this had been sorted out, I was then left with a compilation error which pointed to a missing dependency or similar, which was not due to missing developer packages, but, as you and Prof Ripley correctly point out, from the R installation itself. Having grown fat and lazy on using R under the MS Windows environment, I was struggling to identify the precise nature of this remaining problem. As regards the R installation, I did this from the RH9 binary for version 1.9.1, as I did not think that the Fedora Core 2 binary would be appropriate here. Perhaps I should now compile from the source instead? I would not use the FC2 RPM, since FC2 has many underlying changes not the least of which includes the use of the 2.6 kernel series and the change from XFree86 to x.org. Both changes resulted in significant havoc during the FC2 testing phases and there was at least one issue here with R due to the change in X. According to the WB FAQs: If you cannot find a package built specifically for RHEL3 or WBEL3 you can try a package for RH9 since many of the packages in RHEL3 are the exact same packages as appeared in RH9. Thus, it would seem reasonable to use the RH9 RPM that Martyn has created. An alternative would certainly be to compile R from the source tarball. In either case, I would remove the current installation of R and after achieving a level of comfort that your RPM database is OK, reinstall R using one of the above methods. 
Pay close attention to any output during the installation process, noting any error or warning messages that may occur. If you go the RPM route, be sure that the MD5SUM of the RPM file matches the value that Martyn has listed on CRAN to ensure that the file has been downloaded in an intact fashion. These are my thoughts at this point. You need to get to a point where the underlying system is stable and intact, then get R to the same state before attempting to install new packages. HTH, Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] R packages install problems linux - X not found (WhiteBoxEL 3)
On Tue, 2004-08-10 at 08:15, Dr Mike Waters wrote: snip From unpacking the tarball and running ./configure in the R source directory, I obtain the fact that crti.o is needed by ld.so and was not found. This file is not present on the system. This file, along with crtn.o, is usually installed by the GNU libc packages, I believe. However, I know that not all *nix distributions include these files among their packages. From a web search, I have not been able to ascertain whether this lack of a crti.o is due to there not being one in the distribution, or to another incomplete package install. So, I did a completely fresh installation of WhiteBox, followed by R built from source, checked that it ran and then installed the R packages. Only then did I run up2date. At least crti.o and crtn.o are still there this time, along with the XFree86 includes. A bit of a cautionary tale, all in all. Thanks for all the help and support. Regards M

Mike, From my FC2 system:

$ rpm -qf /usr/lib/crti.o
glibc-devel-2.3.3-27
$ rpm -qf /usr/lib/crtn.o
glibc-devel-2.3.3-27

So, you are correct relative to the source of these two files. A follow-up question might be: did you include the devel packages during your initial install? If not, that would explain the lack of these files. If you did, then it would add another data point to support the notion that your system was, to some level, compromised and a clean install was probably needed, rather than just trying to re-create the RPM database. Glad that you are up and running at this point. Given Martyn's follow-up messages, it looks like there may be an issue with the RH9 RPM, so for the time being using the source tarball would be appropriate. Best regards, Marc __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and names.arg
On Fri, 2004-08-13 at 09:22, Luis Rideau Cruz wrote: R-help, Is there any option to get the names.arg labels closer to the x-axis in barplot? Thank you

Using mtext() you can do something like the following:

data(VADeaths)

# Now place labels closer to the x axis. Set 'axisnames' to FALSE so
# the default labels are not drawn. Also note that barplot() returns
# the bar midpoints, so set 'mp' to the return values
mp <- barplot(VADeaths, axisnames = FALSE)

# Now use mtext() for the axis labels
mtext(text = colnames(VADeaths), side = 1, at = mp, line = 0)

# clean up
rm(VADeaths)

You can adjust the 'line = 0' argument to move the labels closer to or farther from the axis. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Question from Newbie on PostScript and Multiple Plots
On Fri, 2004-08-13 at 11:28, Johnson, Heather wrote: Hi, As I'm pretty new to R I hope this question isn't too basic. I am currently looping through my dataset and for each iteration am producing three separate plots. When I output these plots to the screen they are nicely grouped as three plots per page; however, when I try to send them to a PostScript file I get one page for each plot. I have adjusted my postscript options so that my plots are the size that I want and the paper is set to portrait, I just can't figure out how to get all three plots on one page in the PostScript file. I've been through the archives on the list (albeit not exhaustively) and the manuals available on the R site and cannot figure out how to solve my problem. Thanks, -Heather

Either one of the following works for me:

# Do 3 plots in a 2 x 2 matrix
postscript(file = "ThreePlots.ps", horizontal = FALSE)
par(mfrow = c(2, 2))
plot(1:5)
barplot(1:5)
boxplot(rnorm(10))
dev.off()

# Do 3 x 1
postscript(file = "ThreePlots.ps", horizontal = FALSE)
par(mfrow = c(3, 1))
plot(1:5)
barplot(1:5)
boxplot(rnorm(10))
dev.off()

Can you provide an example of the code that you are using? Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] numerical accuracy, dumb question
Part of that decision may depend upon how big the dataset is and what is intended to be done with the ID's:

object.size(1011001001001)
[1] 36
object.size("1011001001001")
[1] 52
object.size(factor("1011001001001"))
[1] 244

They will by default, as Andy indicates, be read and stored as doubles. They are too large for integers, at least on my system:

.Machine$integer.max
[1] 2147483647

Converting to character might make sense, with only a minimal memory penalty. However, using a factor results in a notable memory penalty, if the attributes of a factor are not needed. If any mathematical operations are to be performed with the ID's, then leaving them as doubles makes the most sense. Dan, more information on the numerical characteristics of your system can be found by using .Machine. See ?.Machine and ?object.size for more information. HTH, Marc Schwartz

On Fri, 2004-08-13 at 21:02, Liaw, Andy wrote: If I'm not mistaken, numerics are read in as doubles, so that shouldn't be a problem. However, I'd try using factor or character. Andy

From: Dan Bolser I store an id as a big number, could this be a problem? Should I convert to a string when I use read.table(... Example id's:

1001001001001
1001001001002
...
1002001002005

Biggest is probably 1011001001001. Ta, Dan. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
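As a quick check of the points above (a sketch added here, not part of the original thread): IDs of this magnitude overflow R's 32-bit integer type, but are held exactly as doubles since they are well below 2^53, and convert cleanly to character when no arithmetic is needed:

```r
# Dan's style of IDs
ids <- c(1001001001001, 1001001001002, 1011001001001)

# Read in as doubles by default
typeof(ids)                                # "double"

# Too large for a 32-bit integer: coercion yields NA (with a warning)
suppressWarnings(as.integer(1011001001001))

# But exactly representable as doubles, being well below 2^53
all(ids == round(ids))
1011001001001 < 2^53

# Character storage is the alternative when no arithmetic is needed
as.character(ids)
```

So the choice is really between doubles (exact here, and arithmetic-capable) and character/factor storage, as discussed above.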
RE: [R] numerical accuracy, dumb question
On Sat, 2004-08-14 at 08:42, Tony Plate wrote: At Friday 08:41 PM 8/13/2004, Marc Schwartz wrote: Part of that decision may depend upon how big the dataset is and what is intended to be done with the ID's:

object.size(1011001001001)
[1] 36
object.size("1011001001001")
[1] 52
object.size(factor("1011001001001"))
[1] 244

They will by default, as Andy indicates, be read and stored as doubles. They are too large for integers, at least on my system:

.Machine$integer.max
[1] 2147483647

Converting to character might make sense, with only a minimal memory penalty. However, using a factor results in a notable memory penalty, if the attributes of a factor are not needed.

That depends on how long the vectors are. The memory overhead for factors is per vector, with only 4 bytes used for each additional element (if the level already appears). The memory overhead for character data is per element -- there is no amortization for repeated values.

object.size(factor("1011001001001"))
[1] 244
object.size(factor(rep(c("1011001001001", "111001001001", "001001001001", "011001001001"), 1)))
[1] 308

# bytes per element in factor, for length 4:
object.size(factor(rep(c("1011001001001", "111001001001", "001001001001", "011001001001"), 1))) / 4
[1] 77

# bytes per element in factor, for length 1000:
object.size(factor(rep(c("1011001001001", "111001001001", "001001001001", "011001001001"), 250))) / 1000
[1] 4.292

# bytes per element in character data, for length 1000:
object.size(as.character(factor(rep(c("1011001001001", "111001001001", "001001001001", "011001001001"), 250)))) / 1000
[1] 20.028

So, for long vectors with relatively few different values, storage as factors is far more memory efficient (this is because the character data is stored only once per level, and each element is stored as a 4-byte integer). (The above was done on Windows 2000.) -- Tony Plate

Good point, Tony. I was making the perhaps incorrect assumption that the ID's were unique, or relatively so.
However, as it turns out, even that assumption is relevant only to a certain extent with respect to how much memory is required. What is interesting (and presumably I need to do some more reading on how R stores objects internally) is that the incremental amount of memory is not consistent on a per element basis for a given object, though there is a pattern. It is also dependent upon the size of the new elements to be added, as I note at the bottom. This all of course presumes that object.size() is giving a reasonable approximation of the amount of memory actually allocated to an object, for which the notes in ?object.size raise at least some doubt. This is a critical assumption for the data below, which is on FC2 on a P4. For example:

object.size("a")
[1] 44
object.size(letters)
[1] 340

In the second case, as Tony has noted, the size of letters (a character vector) is not 26 * 44. Now note:

object.size(c("a", "b"))
[1] 52
object.size(c("a", "b", "c"))
[1] 68
object.size(c("a", "b", "c", "d"))
[1] 76
object.size(c("a", "b", "c", "d", "e"))
[1] 92

The incremental sizes are a sequence of 8 and 16. Now for a factor:

object.size(factor("a"))
[1] 236
object.size(factor(c("a", "b")))
[1] 244
object.size(factor(c("a", "b", "c")))
[1] 268
object.size(factor(c("a", "b", "c", "d")))
[1] 276
object.size(factor(c("a", "b", "c", "d", "e")))
[1] 300

The incremental sizes are a sequence of 8 and 24. Using elements along the lines of Dan's:

object.size("1001001001001")
[1] 52
object.size(c("1001001001001", "1001001001002"))
[1] 68
object.size(c("1001001001001", "1001001001002", "1001001001003"))
[1] 92
object.size(c("1001001001001", "1001001001002", "1001001001003", "1001001001004"))
[1] 108
object.size(c("1001001001001", "1001001001002", "1001001001003", "1001001001004", "1001001001005"))
[1] 132

The sequence is 16 and 24. For factors:

object.size(factor("1001001001001"))
[1] 244
object.size(factor(c("1001001001001", "1001001001002")))
[1] 260
object.size(factor(c("1001001001001", "1001001001002", "1001001001003")))
[1] 292
object.size(factor(c("1001001001001", "1001001001002", "1001001001003", "1001001001004")))
[1] 308
object.size(factor(c("1001001001001", "1001001001002", "1001001001003", "1001001001004", "1001001001005")))
[1] 340

The sequence is 24 and 32. So, the incremental size seems to alternate as elements are added.
The behavior above would perhaps suggest that memory is allocated to objects to enable pairs of elements to be added. When the second element of the pair is added, only a minimal incremental amount of additional memory (and presumably time) is required. However, when I add a third element, there is additional memory required to store that new element because the object needs to be adjusted in a more fundamental way to handle this new element. There also appears to be some memory allocation adjustment at play here. Note:

object.size(factor("1001001001001"))
[1] 244
object.size(factor("1001001001001", "a"))
[1] 236
RE: [R] numerical accuracy, dumb question
On Sat, 2004-08-14 at 12:01, Marc Schwartz wrote: There also appears to be some memory allocation adjustment at play here. Note:

object.size(factor("1001001001001"))
[1] 244
object.size(factor("1001001001001", "a"))
[1] 236

Arggh. Negate that last comment. I had a typo in the second example. It should be:

object.size(factor(c("1001001001001", "a")))
[1] 252

which of course results in an increase in memory. Geez. Time for lunch. Marc __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] numerical accuracy, dumb question
On Sat, 2004-08-14 at 13:19, Prof Brian Ripley wrote: On Sat, 14 Aug 2004, Marc Schwartz wrote:

object.size("a")
[1] 44
object.size(letters)
[1] 340

In the second case, as Tony has noted, the size of letters (a character vector) is not 26 * 44.

Of course not. Both are character vectors, so have the overhead of any R object plus an allocation for pointers to the elements plus an amount for each element of the vector (see the end). These calculations differ on 32-bit and 64-bit machines. For a 32-bit machine storage is in units of either 28 bytes (Ncells) or 8 bytes (Vcells), so single-letter characters are wasteful, viz

object.size("aaaaaaa")
[1] 44

That is 1 Ncell and 2 Vcells, 1 for the string (7 bytes plus terminator) and 1 for the pointer. Whereas

object.size(letters)
[1] 340

has 1 Ncell and 39 Vcells, 26 for the strings and 13 for the pointers (which fit two to a Vcell). Note that repeated character strings may share storage, so for example

object.size(rep("a", 26))
[1] 340

is wrong (140, I think). And that makes comparisons with factors depend on exactly how they were created; for a character vector there probably is a lot of sharing. I have a feeling that these calculations are off for character vectors, as each element is a CHARSXP and so may have an Ncell not accounted for by object.size. (`May' because of potential sharing.) Would anyone who is sure like to confirm or deny this? It ought to be possible to improve the estimates for character vectors a bit, as we can detect sharing amongst the elements.

Prof. Ripley, Thanks for the clarifications. I'll need to spend some time reading through R-exts.pdf and Rinternals.h. Regards, Marc __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Stacking Vectors/Dataframes
Archived versions of gregmisc (and other packages) are available from: http://cran.r-project.org/src/contrib/Archive/ Download one of the older versions (i.e. 0.8.5) and install it from a console using R CMD INSTALL. If you are restricted from installing packages to the main R tree (i.e. you do not have the requisite permissions), see R FAQ 5.2 regarding installing packages to alternate locations. HTH, Marc Schwartz

On Mon, 2004-08-16 at 08:33, Laura Quinn wrote: As our IT man is currently on holiday I am not able to upgrade to version 1.9.0 (or 1.9.1) at the moment, and I see that the gregmisc library will not work on earlier versions (I am using 1.8.0). Does anyone have any other suggestions how I might be able to achieve this? Thank you Laura Quinn Institute of Atmospheric Science School of the Environment University of Leeds Leeds LS2 9JT tel: +44 113 343 1596 fax: +44 113 343 6716 mail: [EMAIL PROTECTED]

On Sun, 15 Aug 2004, Liaw, Andy wrote: I believe interleave() in the `gregmisc' package can do what you want. Cheers, Andy

From: Laura Quinn Hello, Is there a simple way of stacking/merging two dataframes in R? I want to stack them piece-wise, not simply add one whole dataframe to the bottom of the other. I want to create as follows:

x.frame:
aX1  bX1  cX1  ... zX1
aX2  bX2  cX2  ... zX2
...  ...  ...  ... ...
aX99 bX99 cX99 ... zX99

y.frame:
aY1  bY1  cY1  ... zY1
aY2  bY2  cY2  ... zY2
...  ...  ...  ... ...
aY99 bY99 cY99 ... zY99

new.frame:
aX1  bX1  cX1  ... zX1
aY1  bY1  cY1  ... zY1
aX2  bX2  cX2  ... zX2
aY2  bY2  cY2  ... zY2
...  ...  ...  ... ...
aX99 bX99 cX99 ... zX99
aY99 bY99 cY99 ... zY99

I have tried to use a for loop (simply assigning and also with rbind) to do this but am having difficulty correctly assigning the destination in the new dataframe. Can anyone offer a quick and easy way of doing this (or even a long winded one if it works!!) Thank you in advance, __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
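For completeness, the row-interleaving Laura describes can also be done in base R without the gregmisc package (a sketch added here, with small made-up stand-ins for x.frame and y.frame):

```r
# Two small example data frames (hypothetical stand-ins for the
# 99-row x.frame and y.frame in the post above)
x.frame <- data.frame(a = c("aX1", "aX2"), b = c("bX1", "bX2"))
y.frame <- data.frame(a = c("aY1", "aY2"), b = c("bY1", "bY2"))

# Stack the frames, then reorder the rows so that they alternate:
# X row 1, Y row 1, X row 2, Y row 2, ...
n <- nrow(x.frame)
new.frame <- rbind(x.frame, y.frame)[order(rep(seq_len(n), 2)), ]

new.frame
```

The trick is that order(rep(seq_len(n), 2)) yields the index vector 1, n+1, 2, n+2, ..., which interleaves the stacked rows without any explicit loop.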
Re: [R] Bug in colnames of data.frames?
On Tue, 2004-08-17 at 09:01, Arne Henningsen wrote: Hi, I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0. I have a data.frame, e.g.:

myData <- data.frame( var1 = c( 1:4 ), var2 = c( 5:8 ) )

If I add a new column by

myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]

everything is fine, but if I omit the commas:

myData$var4 <- myData[ "var1" ] + myData[ "var2" ]

the name shown above the 4th column is not var4:

myData
  var1 var2 var3 var1
1    1    5    6    6
2    2    6    8    8
3    3    7   10   10
4    4    8   12   12

but names() and colnames() return the expected name:

names( myData )
[1] "var1" "var2" "var3" "var4"
colnames( myData )
[1] "var1" "var2" "var3" "var4"

And it is even worse: I am not able to change the name shown above the 4th column:

names( myData )[ 4 ] <- "var5"
myData
  var1 var2 var3 var1
1    1    5    6    6
2    2    6    8    8
3    3    7   10   10
4    4    8   12   12

I guess that this is a bug, isn't it? Arne

Here is a hint:

# This returns an integer vector
str(myData[ , "var1" ] + myData[ , "var2" ])
 int [1:4] 6 8 10 12

# This returns a data.frame
str(myData[ "var1" ] + myData[ "var2" ])
`data.frame':   4 obs. of  1 variable:
 $ var1: int  6 8 10 12

str(myData)
`data.frame':   4 obs. of  4 variables:
 $ var1: int  1 2 3 4
 $ var2: int  5 6 7 8
 $ var3: int  6 8 10 12
 $ var4:`data.frame':  4 obs. of  1 variable:
  ..$ var1: int  6 8 10 12

Take a look at the details, value and coercion sections of ?.data.frame HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
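To make the hint concrete (an added sketch, not part of the original reply): single-bracket indexing of a data frame without a comma returns a one-column data frame, so the sum of two such results is itself a data frame, and assigning it with $ embeds a data frame as a column. Double brackets (or the comma form) return plain vectors and avoid the problem:

```r
myData <- data.frame(var1 = 1:4, var2 = 5:8)

# Single brackets without a comma return one-column data frames,
# so their sum is a data frame, which gets embedded as a "column"
myData$var4 <- myData["var1"] + myData["var2"]
embedded <- is.data.frame(myData$var4)   # TRUE: a data frame inside

# Using [[ ]] (or myData[, "var1"]) returns plain vectors instead
myData$var4 <- myData[["var1"]] + myData[["var2"]]
plain <- is.integer(myData$var4)         # TRUE: an ordinary column

myData
```

The embedded data frame carries its own column name (var1), which is what the print method shows above the fourth column in Arne's example.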
Re: [R] Bug in colnames of data.frames?
On Tue, 2004-08-17 at 09:34, Marc Schwartz wrote: Take a look at the details, value and coercion sections of ?.data.frame

This must be my week for typos. That should be:

?[.data.frame (in ESS) or ?"[.data.frame" (otherwise)

Marc __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] levels of factor
On Tue, 2004-08-17 at 09:30, Luis Rideau Cruz wrote: R-help, I have a data frame which I subset like:

a <- subset(df, df$column2 %in% c("factor1", "factor2") & df$column2 == 1)

But when I type levels(a$column2) I still get the same levels as in df (my original data frame). Why is that?

The default for "[.factor" is:

x[i, drop = FALSE]

Hence, unused factor levels are retained.

Is it right?

Yes. If you want to explicitly recode the factor based upon only those levels that are actually in use, you can do something like the following:

a <- factor(a)

However, I am a bit unclear as to the logic of the subset statement that you are using, perhaps b/c I don't know what your data is. You seem to be subsetting 'column2' on both the factor levels and a presumed numeric code. Is that really what you want to do? You might want to review the Warning section in ?factor. BTW, when using subset(), the evaluation takes place within the data frame, so you do not need to use df$column2 in the function call. You can just use column2, for example:

subset(df, column2 %in% c("factor1", "factor2"))

See ?factor and ?"[.factor" for more information. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
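A small self-contained demonstration of the level-retention behaviour described above (added here; not part of the original exchange):

```r
f <- factor(c("a", "b", "c", "c"))

# Subsetting a factor keeps all of the original levels by default,
# because "[.factor" uses drop = FALSE
f2 <- f[f %in% c("a", "b")]
levels(f2)            # still "a" "b" "c"

# Re-applying factor() recodes using only the levels actually in use
f3 <- factor(f2)
levels(f3)            # "a" "b"

# The drop argument achieves the same thing at subset time
f4 <- f[f %in% c("a", "b"), drop = TRUE]
levels(f4)            # "a" "b"
```

Either form is fine; the factor(f2) idiom is the one suggested in the reply above.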
Re: [R] all.equal and names?
It is in the Description now (at least for 1.9.1 patched): "all.equal(x, y) is a utility to compare R objects x and y testing 'near equality'. If they are different, comparison is still made to some extent, and a report of the differences is returned. Don't use all.equal directly in if expressions; either use identical or combine the two, as shown in the documentation for identical." There is also a reference to attr.all.equal(target, current, ...) on the same help page, which returns the following using the example:

attr.all.equal(1, c(a = 1))
[1] "names for current but not for target"

Not quite the same message as S-PLUS, however. HTH, Marc

On Wed, 2004-08-18 at 11:02, Spencer Graves wrote: Hi, Duncan: Thanks much. I think I remember reading about both all.equal and identical in Venables and Ripley (2002) MASS. Unfortunately, I don't have MASS handy now, and I could not find it otherwise, so I asked. What needs to happen to upgrade the all.equal documentation to add identical to the see also? Best Wishes, Spencer

Duncan Murdoch wrote: On Wed, 18 Aug 2004 10:27:49 -0400, Spencer Graves [EMAIL PROTECTED] wrote: How can I compare two objects for structure, names, values, etc.? With R 1.9.1 under Windows 2000, the obvious choice all.equal ignores names and compares only values:

all.equal(1, c(a = 1))
[1] TRUE

Under S-PLUS 6.2, I get the comparison I expected:

all.equal(1, c(a = 1))
[1] "target, current classes differ: integer : named"
[2] "class of target is \"integer\", class of current is \"named\" (coercing current to class of target)"

If you want the explanation you're out of luck, but identical() does the test:

identical(1, c(a = 1))
[1] FALSE

Duncan Murdoch __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
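One quick way to confirm that only the names differ (an added illustration; note that recent R versions also report the name difference from all.equal() itself):

```r
x <- 1
y <- c(a = 1)

# identical() is strict: the names attribute makes these differ
identical(x, y)           # FALSE

# Strip the names and the values compare as identical
identical(x, unname(y))   # TRUE

# attr.all.equal() isolates the attribute difference
attr.all.equal(x, y)
```

This is the combination the help page suggests: identical() for strict tests in control flow, with all.equal()/attr.all.equal() to describe what differs.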
Re: [R] header line generated write.table
On Wed, 2004-08-18 at 16:42, Y C Tao wrote: I want to write the following data frame into a CSV file:

     Col1 Col2 Col3
Row1    1    1    1
Row2    2    2    2

where Row1, Row2 are the row names and Col1, Col2, Col3 are the column names. The correct CSV file should be:

,Col1,Col2,Col3
Row1,1,1,1
Row2,2,2,2

However, the one generated by R using write.table(x, file = "xyz.csv", sep = ",") has a header line that reads:

Col1,Col2,Col3

without the comma at the very beginning. As a result, if you open the file in Excel, the column names are not correct (shifted to the left by one column). Is there a way to get around this? Thanks!

The solution is on the help page for ?write.table. Details: "Normally there is no column name for a column of row names. If col.names = NA a blank column name is added. This can be used to write CSV files for input to spreadsheets." Also, the first example on that page gives you:

## To write a CSV file for input to Excel one might use
write.table(x, file = "foo.csv", sep = ",", col.names = NA)

Thus:

write.table(x, col.names = NA, sep = ",")
,Col1,Col2,Col3
Row1,1,1,1
Row2,2,2,2

HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
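As a round-trip check of the col.names = NA behaviour (an added sketch using a temporary file; quote = FALSE is added here so the header shows the bare leading comma):

```r
x <- data.frame(Col1 = c(1, 2), Col2 = c(1, 2), Col3 = c(1, 2),
                row.names = c("Row1", "Row2"))

f <- tempfile(fileext = ".csv")
write.table(x, file = f, sep = ",", col.names = NA, quote = FALSE)

# The header line now starts with a comma: an empty cell sits above
# the row-name column, so spreadsheet columns line up correctly
hdr <- readLines(f)[1]

# Reading back with row.names = 1 restores the original frame shape
y <- read.csv(f, row.names = 1)
unlink(f)

hdr
y
```

The same file opens in Excel with Row1/Row2 in the leftmost column and the data under the correct headers.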
Re: [R] Is R good for not-professional-statistician, un-mathematical clinical researchers?
On Thu, 2004-08-19 at 01:45, Jacob Wegelin wrote: Alternate title: How can I persuade my students that R is for them? Alternate title: Can R replace SAS, SPSS or Stata for clinicians? I am teaching introductory statistics to twelve physicians and two veterinarians who have enrolled in a Mentored Clinical Research Training Program. My course is the first in a sequence of three. We (the instructors of this sequence) chose to teach R rather than some other computing environment. My (highly motivated) students have never encountered anything like R. One frankly asked: Do you feel (honestly) that a group of physicians (with two vets) clinicians will be able to effectively use and actually understand R? If so, I will happily call this bookstore and order this book [Venables and Ripley] tomorrow. I am heavily biased toward R/S because I have used it since the first applied statistics course I took. But I would love to give these students some kind of objective information about the usability of R by non-statisticians--not just my own bias. Could anyone suggest any such information? Or does anyone on this list use R who is a clinician and not really mathematically savvy? For instance, someone who doesn't remember any math beyond algebra and doesn't think in terms of P(A|B)? Or have we done a disservice to our students by choosing to make them learn R, rather than making ourselves learn SAS, Stata or SPSS? Thank you for any ideas Jake Wegelin A couple of questions: 1. What is the intended goal of the series of classes? 2. What are the expectations of the clinicians for themselves and what is their likely career path? Possible answers to the questions: 1. Provide the clinicians a reasonable (and perhaps broad) foundation of statistical knowledge. 2. 
To be able to have a reasonable comprehension of statistical concepts and methods so that in the future, as they are busy with patients (animals for the vets) in a clinical practice, they can intelligently interact with formally trained statisticians when engaged in clinical research in a multi-disciplinary team environment. If the above is close to reality, then let me suggest that you consider Peter's book Introductory Statistics with R rather than MASS, at least for the first class in the series. I cannot think of a more gentle, broad and competent way to introduce clinicians to both statistics and R at the same time. If these clinicians are likely to move on to busy clinical practices, in my experience having come out of the clinical environment, they will not have the time to sit at a computer and grind out analyses, much less maintain their proficiency with a programming language (R, Stata or SAS) or the broad range of statistical methodologies that they would likely encounter over their careers. They will, however, need to be able to sit and interact with statisticians, bringing the significant value of their clinical training and knowledge to the process of designing clinical research projects, and effectively comprehend the multitude of issues in that endeavor. They will need to have an understanding of the complex processes by which data are collected, managed, manipulated and analyzed in the course of obtaining the resultant analyses. In other words, it is important that they realize that it is more than just a point and click process where, voilà, you have a logistic regression model. They need to appreciate both the subtleties and complexities of dealing with real world research, incomplete data, etc. Many clinicians do not, and this results in mismatched expectations in the future as they deal with real world situations.
There are certainly physicians who have made the decision to focus their careers on the statistical part of the research process, forsaking any significant clinical patient care role. They are few and far between, in my experience, though two or three immediately come to mind. They have also generally made the commitment to formal graduate-level education in math/statistics, securing advanced degrees. Short of that, there is typically a future dependence upon trained statisticians, either within an academic medical environment or via contracted services. The above is based upon my own experience, which is largely in sub-specialty clinical areas. Others may and perhaps will differ, based upon their own bias. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] paired t-test vs pairwise t-test
On Thu, 2004-08-19 at 14:42, Liaw, Andy wrote: From: Duncan Murdoch On Thu, 19 Aug 2004 13:42:21 -0300 (ADT), Rolf Turner [EMAIL PROTECTED] wrote : You wrote: What's the difference between t.test(x, y) and pairwise.t.test()? Is it just that the former takes two vectors, whereas the latter takes a vector and a factor? No. The pairwise.t.test() function (according to the help file) does a multiplicity of t-tests, on more than two samples, adjusting the p-value to compensate for the multiplicity by various methods. IMHO the name of this function is bad, because to me it suggests doing ***paired*** t-tests, which would trip up the naive user, who probably wouldn't notice or would ignore the t tests with pooled SD message in the output. As one of the Ripley fortunes says ``It really is hard to anticipate just how silly users can be.'' But why go out of the way to give them a chance to be silly? And Jack wrote: But the documentation, which I valiantly tried to make sense of BEFORE asking my stupid question, is not clear enough for this particular idiot. Might I suggest that the documentation be altered? It could use an example (as in, real-life applied statistical problem) of when pairwise.t.test() ought to be used, and why t.test(paired=TRUE) would be inappropriate in that context; it could also use a reference to some published paper, website or some such that explains the rationale and correct procedure for using this test. I think it's unlikely that we would rename the function; it's been around a while with its current name so that's a bad idea. On the other hand, clearer documentation is always a plus: why not submit some? I guess this is sort of related to the thread on whether R is good for non-statisticians... The help pages in R are sort of like *nix man pages. They give the technical information about the topic, but not necessarily the background. 
E.g., the man page for `chmod' does not explain file permissions in detail: the user is expected to learn that elsewhere. Perhaps other stat packages do it differently? Do the SPSS manuals detail what its t-test procedure does, including which t-test(s) it does and when it's appropriate? That might make it easier on users, but I still think the users should learn the appropriate use of statistical procedures elsewhere... Best, Andy Andy, I don't know about SPSS, but SAS' documentation is available online at: http://support.sas.com/91doc/docMainpage.jsp The documentation specifically for PROC TTEST is at: http://support.sas.com/91doc/getDoc/statug.hlp/ttest_index.htm and the documentation for PROC MULTTEST is at: http://support.sas.com/91doc/getDoc/statug.hlp/multtest_index.htm Of course, to go along with the standard SAS documentation, there is the line of Books by Users, which parallels, in a fashion, the increasing number of books on R authored by members of this community. Best regards, Marc __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
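A small simulated sketch (invented data, not from this thread) makes the distinction concrete: t.test(..., paired = TRUE) performs a single test on the within-pair differences, while pairwise.t.test() performs all unpaired two-group comparisons among the levels of a factor, adjusting the p-values for multiplicity (Holm's method by default):

```r
set.seed(1)

## Paired design: one test on the 10 within-subject differences
before <- rnorm(10, mean = 100, sd = 10)
after  <- before + rnorm(10, mean = 5, sd = 2)
t.test(before, after, paired = TRUE)$p.value

## pairwise.t.test(): all two-group (unpaired) comparisons among
## more than two groups, p-values adjusted for multiplicity
values <- c(before, after, after + 5)
groups <- factor(rep(c("A", "B", "C"), each = 10))
pairwise.t.test(values, groups)$p.value  # matrix of adjusted p-values
```

Note the two functions answer different questions: the first tests the mean of paired differences; the second screens several group contrasts at once.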
Re: [R] How generate A01, A02, ..., A99?
On Fri, 2004-08-20 at 15:15, Peter Dalgaard wrote: Sundar Dorai-Raj [EMAIL PROTECTED] writes: Yao, Minghua wrote: Hi, Anyone can tell me how to generate A01, A02, ..., A99? paste("A", 1:99, sep="") generates "A1", "A2", ..., "A99". This is not what I want. Thanks for the help. -MY [[alternative HTML version deleted]] How about? sapply(1:99, function(i) sprintf("A%02d", i)) or just sapply(1:99, sprintf, fmt="A%02d") or yet another variation: paste("A", formatC(1:99, width = 2, format = "d", flag = "0"), sep = "") HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
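An aside worth noting: sprintf() is itself vectorized over its arguments, so the sapply() wrapper in the variants above can be dropped entirely, and the result matches the formatC() approach:

```r
## sprintf() recycles its arguments, so no explicit loop is needed
ids <- sprintf("A%02d", 1:99)
head(ids)  # "A01" "A02" "A03" "A04" "A05" "A06"

## identical to the formatC() variant
identical(ids, paste("A", formatC(1:99, width = 2, format = "d", flag = "0"), sep = ""))
```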
Re: [R] where is internal function of sample()?
On Mon, 2005-04-11 at 23:04 -0400, Weijie Cai wrote: Hi there, I am trying to write a c++ shared library for R. I need a function which has the same functionality as sample() in R, i.e., does permutation, sample with/without replacement. Does R have internal sample routine so that I can call it directly? I did not find it in R.h, Rinternal.h. Thanks A quick grep of the source code tree tells you that the function is in .../src/main/random.c A general pattern for C .Internal functions is to use a prefix of do_ in conjunction with the R function name. So in this case, the C function is called do_sample and begins at line 391 (for 2.0.1 patched) in the aforementioned C source file. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] removing characters from a string
On Tue, 2005-04-12 at 05:54 -0700, Vivek Rao wrote: Is there a simple way in R to remove all characters from a string other than those in a specified set? For example, I want to keep only the digits 0-9 in a string. In general, I have found the string handling abilities of R a bit limited. (Of course it's great for stats in general). Is there a good reference on this? Or should R programmers dump their output to a text file and use something like Perl or Python for sophisticated text processing? I am familiar with the basic functions such as nchar, substring, as.integer, print, cat, sprintf etc. Something like the following should work: x <- paste(sample(c(letters, LETTERS, 0:9), 50, replace = TRUE), collapse = "") x [1] "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV" gsub("[^0-9]", "", x) [1] "8677" The use of gsub() here replaces any characters NOT in 0:9 with "", therefore leaving only the digits. See ?gsub for more information. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
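The same negated-character-class idea generalizes to any "keep only these characters" task; a small illustration with a made-up string:

```r
## Anything NOT in the bracketed set is deleted; everything else stays
s <- "Order #42-17, total $3.50"
gsub("[^0-9]", "", s)        # keep digits only:      "4217350"
gsub("[^0-9.]", "", s)       # keep digits and dots:  "42173.50"
gsub("[^[:alpha:]]", "", s)  # keep letters only:     "Ordertotal"
```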
Re: [R] Cumulative Points and Confidence Interval Manipulation in barplot2
On Tue, 2005-04-12 at 10:14 -0500, Bret Collier wrote: R-Users, I am working with gplots (in gregmisc bundle) plotting some posterior probabilities (using barplot2) of harvest bag limits for discrete data (x-axis from 0 to 12, data is counts) and I ran into a couple of questions whose solutions have evaded me. 1) When I create and include the confidence intervals, the lower bound of the confidence intervals for several of the posterior probabilities is below zero, and in those specific cases I only want to show the upper limit for those CI's so they do not extend below the x-axis (as harvest can not be < 0). Also, comments on a better technique for CI construction when the data is bounded to be >= 0 would be appreciated. 2) I would also like to show the cumulative probability (as say a point or line) across the range of the x-axis on the same figure at the top, but I have been unable to figure out how to overlay a set of cumulative points over the barplot across the same range as the x-axis. 
Below is some example code showing the test data I am working on (xzero): xzero <- table(factor(WWNEW[HUNTTYPE=="DOVEONLY"], levels=0:12)) xzero 0 1 2 3 4 5 6 7 8 9 10 11 12 179 20 9 2 2 0 1 0 0 0 0 0 0 n <- sum(xzero) k <- sum(table(xzero)) meantheta1 <- ((2*xzero + 1)/(2*n + k)) vartheta1 <- ((2*(((2*n)+k)-((2*xzero)+1)))*((2*xzero)+1))/((((2*n)+k)^2)*(((2*n)+k)+2)) stderr <- sqrt(vartheta1) cl.l <- meantheta1-(stderr*2) #Fake CI: Test cl.u <- meantheta1+(stderr*2) #Fake CI: Test barplot2(meantheta1, xlab="WWD HARVEST DOVE ONLY 2001", ylab="Probability", ylim=c(0, 1), xpd=F, col="blue", border="black", axis.lty=1, plot.ci=TRUE, ci.u = cl.u, ci.l = cl.l) title(main="WHITE WING DOVE HARVEST PROBABILITIES: DOVE HUNT ONLY") I would greatly appreciate any direction or assistance, Thanks, Bret Bret, If you replace the lower bound of your confidence intervals as follows, you can get just the upper bound plotted: cl.l.new <- ifelse(cl.l >= 0, cl.l, meantheta1) This will set the lower bound to meantheta1 in those cases, thus plotting the upper portion, and you can remove the 'xpd=F' argument. Use 'ci.l = cl.l.new' here: barplot2(meantheta1, xlab="WWD HARVEST DOVE ONLY 2001", ylab="Probability", ylim=c(0, 1), col="blue", border="black", axis.lty=1, plot.ci=TRUE, ci.u = cl.u, ci.l = cl.l.new) I would defer to others with more Bayesian experience on alternatives for calculating bounded CI's for the PP's. With respect to the cumulative probabilities, if I am picturing the same thing you are, you can use the cumsum() function and then add points and/or a line as follows: points(cumsum(meantheta1), pch = 19) lines(cumsum(meantheta1), lty = "solid") See ?cumsum, ?points and ?lines for more information. BTW, some strategically placed spaces would help make your code a bit more readable for folks. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
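One refinement on the overlay: barplot() (and barplot2() in gplots) invisibly returns the x-coordinates of the bar midpoints, which are not simply 1:n; capturing that return value keeps the overlaid points and line aligned with the bars. A minimal base-graphics sketch with made-up probabilities (not the poster's data):

```r
## Hypothetical probabilities for illustration only
p <- c(0.70, 0.15, 0.08, 0.05, 0.02)

## barplot() returns the bar midpoints; use them as the x-coordinates
mids <- barplot(p, names.arg = 0:4, ylim = c(0, 1))
points(mids, cumsum(p), pch = 19)
lines(mids, cumsum(p), lty = "solid")
```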
Re: [R] R in Windows
On Wed, 2005-04-13 at 10:51 -0400, George Kelley wrote: Has anyone tried to create dialog boxes for Windows in R so that one doesn't have to type in so much information but rather enter it in a menu-based format. If not, does anyone plan on doing this in the future if it's possible? Thanks. George (Kelley) There are a variety of GUI's being actively developed for R. More information is here: http://www.sciviews.org/_rgui/ I don't use it actively, but I might specifically suggest that you review John Fox' R Commander: http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/ It is written in tcl/tk, which makes it cross-platform compatible if that is an issue for you. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] terminate R program when trying to access out-of-bounds arrayelement?
On Wed, 2005-04-13 at 15:03 -0700, Berton Gunter wrote: WHOA! Do not redefine R functions (especially [ !) in this way! That's what R classes and methods (either S3 or S4) are for. Same applies to print methods. See the appropriate sections of the R language definition and the book S PROGRAMMING by VR. Please do not offer advice of this sort if you are not knowledgeable about R/S Programming, as it might be taken seriously. I think that we have another entry for the fortunes package... :-) Best regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html