Re: [R] Adding a legend to a (multi-facet) plot produced by ggplot().

2019-12-01 Thread Antony Unwin
How about defining your dataset differently, making the colouring property a 
variable?

xxx <- data.frame(x=rep(x, 4), y=c(y2, y3),
                  grp=factor(rep(c("a","b"), each=20, times=2)),
                  type=factor(rep(c("clyde", "irving"), each=40)))
ggplot(xxx, aes(x, y, colour=type, shape=type)) +
  geom_point() +
  geom_abline(intercept=3, slope=2) +
  facet_wrap(vars(grp)) +
  scale_colour_manual(values=c("blue", "red")) +
  scale_shape_manual(values=c(20, 3))

Then you could also plot the four groups separately if you wanted to:

ggplot(xxx, aes(x, y, colour=type, shape=type)) +
  geom_point() +
  geom_abline(intercept=3, slope=2) +
  facet_grid(rows=vars(type), cols=vars(grp)) +
  scale_colour_manual(values=c("blue", "red")) +
  scale_shape_manual(values=c(20, 3))
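
The snippets in this message rely on x, y2 and y3 from the original poster's attachment, which is not reproduced here. A minimal self-contained sketch, assuming simulated stand-ins for those vectors (the seed, the noise levels and the straight-line trend are made up for illustration), might look like this:

```r
# Hypothetical stand-ins for the vectors in the original attachment.
set.seed(42)
x  <- seq(0, 1, length.out = 20)
y2 <- 3 + 2 * rep(x, 2) + rnorm(40, sd = 0.3)   # 40 values for type "clyde"
y3 <- 3 + 2 * rep(x, 2) + rnorm(40, sd = 0.6)   # 40 values for type "irving"

# Long format: one row per point, with the colouring property as a variable.
xxx <- data.frame(
  x    = rep(x, 4),
  y    = c(y2, y3),
  grp  = factor(rep(c("a", "b"), each = 20, times = 2)),
  type = factor(rep(c("clyde", "irving"), each = 40))
)

# Build the plot only if ggplot2 is installed; the legend comes from the
# colour and shape mappings, so nothing extra is needed to add it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(xxx, aes(x, y, colour = type, shape = type)) +
    geom_point() +
    geom_abline(intercept = 3, slope = 2) +
    facet_wrap(vars(grp)) +
    scale_colour_manual(values = c("blue", "red")) +
    scale_shape_manual(values = c(20, 3))
  if (interactive()) print(p)
}
```

Because colour and shape are both mapped to the same variable, ggplot2 merges them into a single legend.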

Antony Unwin
University of Augsburg, 
Germany




> From: Rolf Turner 
> Subject: [R] Adding a legend to a (multi-facet) plot produced by ggplot().
> Date: 1 December 2019 at 01:04:46 CET
> To: R help 
> 
> 
> 
> I have been struggling to add a legend as indicated in the subject line,
> with no success at all.  I find the help to be completely bewildering.
> 
> I have attached the code of what I have tried in the context of a simple
> reproducible example.
> 
> I have also attached a pdf file of a plot produced with base graphics to 
> illustrate roughly what I am after.
> 
> I would be grateful if someone could point me in the right direction.
> 
> cheers,
> 
> Rolf Turner
> 
> -- 
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] OutliersO3 version 0.5.3 released

2018-02-07 Thread Antony Unwin
Dear all,

A revised version of OutliersO3 is available on CRAN:
<https://cran.r-project.org/web/packages/OutliersO3/index.html>.

The package has been restructured.  The default is now that the tolerance level 
is set individually for each of the (six) outlier methods included.  Plots have 
been added, as have outlier tables and scores for further analysis.  It is also 
possible to draw an O3 plot using your own outlier identification method; see 
the vignette for more details.

There are four vignettes to illustrate the use of the package.

Queries, comments, suggestions are welcome.  Thanks to Michael Friendly, 
Tae-Rae Kim, Nina Wu, and, in particular, Bill Venables for their comments on 
the old version.

Regards

Antony

Professor Antony Unwin
Mathematics Institute,
University of Augsburg, 
86135 Augsburg, Germany

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] [R-pkgs] The OutliersO3 package is now on CRAN

2017-09-28 Thread Antony Unwin
Dear all,

The new package OutliersO3 is now available on CRAN:
<https://cran.r-project.org/web/packages/OutliersO3/index.html>.

The aim is to graphically compare results of outlier analyses for all possible 
combinations of variables in a dataset.

Various kinds of O3 (Overview of Outliers) plots can be drawn to show which 
cases are classified as outliers for which combinations of variables.
Up to five different methods can be used to identify the potential outliers in 
a dataset.

There is a vignette:
https://cran.r-project.org/web/packages/OutliersO3/vignettes/O3-vignette.html

and a video of a talk on O3 plots from useR!:

https://channel9.msdn.com/events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/When-is-an-Outlier-an-Outlier-The-O3-plot?term=unwin

Queries, comments, suggestions are welcome.

Regards

Antony

Professor Antony Unwin
Mathematics Institute,
University of Augsburg, 
86135 Augsburg, Germany






[R] R Course in Dublin (May 24th-May 26th, 2017) Introductory -> Modern

2017-04-19 Thread Antony Unwin
An R course from introductory to modern will be given by

Louis Aslett (Durham University, author of the packages PhaseType and 
ReliabilityTheory)
and
Antony Unwin (author of the book “Graphical Data Analysis with R” CRC Press 
2015, http://www.gradaanwr.net).

The course will be held in Dublin at the IPA on Lansdowne Road (next to the 
Rugby ground) from May 24th to May 26th, 2017.

Details at

http://insightsc.ie/training/r-statistical-software/

or send an email to train...@insightsc.ie for further information.

Antony Unwin
Insight Statistical Consulting, Dublin, Ireland
University of Augsburg, Germany

[R] R Course in Dublin (January 30th-February 1st, 2017) Introductory -> Modern

2016-11-27 Thread Antony Unwin
An R course from introductory to modern will be given by

Louis Aslett (Oxford University, author of the packages PhaseType and 
ReliabilityTheory)
and
Antony Unwin (author of the book “Graphical Data Analysis with R” CRC Press 
2015, http://www.gradaanwr.net).

The course will be held in Dublin from January 30th to February 1st, 2017.

Details at

http://insightsc.ie/training/r-statistical-software/


Antony Unwin
Insight Statistical Consulting, Dublin, Ireland
University of Augsburg, Germany

[R] R Course in Dublin (July 20th-22nd, 2016) Introductory -> Modern

2016-05-24 Thread Antony Unwin
An R course from introductory to modern will be given by

Louis Aslett (Oxford University, author of the packages PhaseType and 
ReliabilityTheory)
and
Antony Unwin (author of the book “Graphical Data Analysis with R” CRC Press 
2015  http://www.gradaanwr.net).

The course will be offered again on September 7th-9th, 2016 in Dublin.

Details at

http://insightsc.ie/training/r-statistical-software/


Antony Unwin
Insight Statistical Consulting, Dublin, Ireland
University of Augsburg, Germany





[R] R Course in Dublin (February 3rd-5th)

2016-01-05 Thread Antony Unwin
The course will be given by Louis Aslett (Oxford University, author of the 
packages PhaseType and ReliabilityTheory) and Antony Unwin (author of the book 
“Graphical Data Analysis with R” CRC Press 2015).

Details at

http://insightsc.ie/training/r-statistical-software/

Antony Unwin
University of Augsburg, Germany and Insight Statistical Consulting, Dublin, 
Ireland

[R] R Course in Dublin (September 14-16)

2015-07-21 Thread Antony Unwin
Details at

http://insightsc.ie/training/r-statistical-software/

Antony Unwin
University of Augsburg, Germany and Insight Statistical Consulting, Dublin, 
Ireland


[R] R Course in Dublin (April 15-17)

2015-03-19 Thread Antony Unwin
Details at  

http://insightsc.ie/training/r-statistical-software/

Antony Unwin
University of Augsburg, Germany and Insight Statistical Consulting, Dublin, 
Ireland


[R] Bill Venables Workshop

2012-06-07 Thread Antony Unwin
Bill Venables talks R :: Augsburg University, Germany :: 2-3 July 2012

Bill Venables will give a two-day R Workshop in Augsburg on the 2nd and 3rd 
July 2012, an expanded version of the course that he has been invited to give 
at this year's useR! meeting in Nashville.

Details: www.math.uni-augsburg.de/termin/R-workshop.html

Organised by the
Department of Computer-Oriented Statistics and Data Analysis,
University of Augsburg

Antony Unwin
un...@math.uni-augsburg.de





[R] Re. When is *interactive* data visualization useful to use?

2011-02-11 Thread Antony Unwin
Hello Tal,

You asked *When is it helpful to use interactive plots? Either for data 
exploration (for ourselves) and data presentation (for a client)?*

My answer: It's helpful for checking data quality, for exploration with and 
without clients, for checking results, and for presenting data.

Notes:
(1) It's difficult to explain interactive data visualization in print, 
demonstrations are so much more effective.
(2) Interactive data visualization is fun, both for the analyst, and more 
important, for the dataset owners.  You not only get better interaction with 
the data, you get better interaction with the scientists you cooperate with.  
They are prepared to contribute, because they can understand what is going on.  
That is not always the case with statistical models.
(3) The key is not animation but direct manipulation.  The aim is to be 
able to directly interact with all statistical objects in a graphic: querying, 
linking, reordering, reformatting, zooming, whatever.
(4) You write of point-based graphics, what about area-based graphics like 
histograms, barcharts and mosaicplots?  For categorical data the ability to 
select groups and look at spineplots of other variables to compare proportions 
is very effective. (And don't forget linking to maps for spatial data.)
(5) You mention outliers.  How do you decide what is an outlier?  Interactive 
parallel coordinate plots are extremely useful, either for identifying outliers 
or for checking ones found with an analytic approach.
(6) Interactive data visualization is not in competition with other approaches, 
it complements them.  Results found with models should be checked graphically 
and results found graphically should be checked analytically.  Your comment 
about data dredging is important, though why people think this only happens 
with graphics and not with modelling approaches always puzzles me!
(7) There are often interesting features of a dataset (not just errors and 
outlier groups) that can be found graphically that would be difficult or 
impossible to find analytically.

Have a look at Interactive Graphics for Data Analysis: Principles and Examples 
by Martin Theus and Simon Urbanek (Chapman & Hall).  There are some excellent 
explanations and case studies there.

I could go on (and on), but what you really need is a good demo.

Best regards

Antony

PS Have you reported the bugs in GGobi and Mondrian you have found to the 
software authors?

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg, 
86135 Augsburg, Germany



Re: [R] Where has the stats-rosuda-devel mailing list gone?

2010-05-18 Thread Antony Unwin
Oliver,

Apologies for the confusion, there was a server upgrade in the computer centre 
here which gave us some grief.  The list should be fine now.

Best regards

Antony

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg, 
86135 Augsburg, Germany

> From: o.mann...@auckland.ac.nz
> Date: 14 May 2010 12:51:03 AM CEST
> To: r-help@r-project.org
> Subject: [R] Where has the stats-rosuda-devel mailing list gone?
>
> I require some assistance with JGR, but following the mailing list link from
> http://jgr.markushelbig.org/FAQ.html leads me to
> http://mailman.rz.uni-augsburg.de/mailman/listinfo/stats-rosuda-devel which
> responds with
>
> No such list stats-rosuda-devel
>
> I was previously subscribed to this mailing list and want to resubscribe, but
> where has it gone?
>
> Many thanks,
>
> Oliver Mannion
> Programmer
> COMPASS - Centre of Methods and Policy Application in the Social Sciences
> www.compass.auckland.ac.nz
> The University of Auckland, New Zealand
>
> Phone +(649) 373 7999 ext 89760







Re: [R] Visualizing binary response data?

2010-05-05 Thread Antony Unwin
You could also try using interactive graphics in iplots.  Linking from a 
barchart of your binary response variable to your eight continuous predictors 
in a parallel coordinate plot and to your four categorical predictors in some 
form of mosaicplot could be very informative.

Graphics are not necessarily the method of choice to select your predictor 
variables, as Frank Harrell has pointed out.  It is also sensible not to rely 
on modelling alone.  Graphic displays can help you better understand your data 
and models.  The two approaches are complementary.
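
For readers without iplots, a static approximation of the same idea can be sketched in base R graphics. This is not the linked, interactive display described above; it only shows how a binary response relates to one continuous and one categorical predictor. The built-in mtcars data stand in for the poster's data, with am playing the role of the binary response (an assumption made purely for illustration).

```r
# Stand-in data: am is the binary response, mpg a continuous predictor,
# cyl a categorical predictor.
d <- transform(mtcars,
               am  = factor(am, labels = c("automatic", "manual")),
               cyl = factor(cyl))

# Proportion of "manual" within each level of the categorical predictor;
# this is what the mosaic plot displays as relative heights.
prop_manual <- tapply(d$am == "manual", d$cyl, mean)

if (interactive()) {
  op <- par(mfrow = c(1, 2))
  spineplot(am ~ mpg, data = d)                    # response vs continuous predictor
  mosaicplot(~ cyl + am, data = d, main = "", color = TRUE)
  par(op)
}
```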

Antony Unwin
University of Augsburg
Germany


On Tue, May 4, 2010 at 9:04 PM, Kim Jung Hwa kimhwamaill...@gmail.com wrote:

> Hi All,
>
> I'm dealing with binary response data for the first time, and I'm confused
> about what kind of graphics I could explore in order to pick relevant
> predictors and their relation with response variable.
>
> I have 8-10 continuous predictors and 4-5 categorical predictors. Can anyone
> suggest what kind of graphics I can explore to see how predictors behave
> w.r.t. response variable...
>
> Any help would be greatly appreciated, thanks,
> Kim




Re: [R] pairs plots in R

2008-10-20 Thread Antony Unwin
If you want to do efficient exploratory data analysis on this kind of  
dataset, then interactive graphics with parallel coordinate plots  
(ipcp in iplots) should help.  Of course, it depends what you mean by  
large.  It might be worth looking at the book Graphics of Large  
Datasets for some ideas.
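
A static parallel coordinate plot gives a rough idea of what ipcp shows, without the interaction. The sketch below uses parcoord from the MASS package rather than iplots, and again uses mtcars as a stand-in dataset with am as the binary variable; both choices are assumptions for illustration only.

```r
# Continuous variables to display, one axis each.
num_cols <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# Colour each case by the binary variable so the two groups can be compared.
grp_col <- ifelse(mtcars$am == 1, "red", "grey50")

if (interactive() && requireNamespace("MASS", quietly = TRUE)) {
  MASS::parcoord(mtcars[, num_cols], col = grp_col, var.label = TRUE)
}
```

With 100+ columns a single static display becomes crowded, which is where selecting and reordering axes interactively helps.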

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
86135 Augsburg, Germany
Tel: + 49 821 5982218



> From: Sharma, Dhruv [EMAIL PROTECTED]
> Date: 19 October 2008 10:58:53 pm GMT+02:00
> To: r-help@r-project.org
> Subject: [R] pairs plots in R
>
> Hi,
> is there a way to take a data frame with 100+ columns and large
> data set to do efficient exploratory analysis in R with pairs?
>
> I find using pairs on the whole matrix is slow and the resulting
> matrix is tiny.
>
> Also the variable of interest for me is a binary var Y or N .
>
> Is there an efficient way to graphically view many variable
> relationships that does not look teeny ?
>
> I could do pairs 10 at a time but this seems too brute force.
>
> thanks
> Dhruv



Re: [R] Using interactive plots to get information about data points

2008-08-27 Thread Antony Unwin
> I have been experimenting with interactive packages such as iplots and
> playwith. Consider the following sample dataset:
> A B C D
> 1 5 5 9
> 3 2 8 4
> 1 7 3 0
> 7 2 2 6
> Let's say I make a plot of variable A. I would like to be able to
> click on a data point (e.g. 3) and have a pop-up window tell me the
> corresponding value for variable D (e.g. 4).


You're right that iplots can't do that (it's on the wishlist), but it  
offers alternatives.  As a multiwindowing package, it is natural to  
have graphics displays open for all variables of current interest.   
This means that selecting a point highlights it in all displays and  
you can see or query the corresponding values.
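
As a rough non-iplots alternative, base graphics can label a clicked point with the value of another variable via identify(). This is a sketch of a different technique from the linked displays described above; it uses the four-row sample dataset from the question.

```r
# The sample dataset from the question.
dat <- data.frame(A = c(1, 3, 1, 7),
                  B = c(5, 2, 7, 2),
                  C = c(5, 8, 3, 2),
                  D = c(9, 4, 0, 6))

if (interactive()) {
  case <- seq_len(nrow(dat))
  plot(case, dat$A, xlab = "case", ylab = "A")
  # Click near a point to label it with its D value; finish with Esc or
  # a right click. identify() returns the row indices of the chosen points.
  picked <- identify(case, dat$A, labels = paste("D =", dat$D))
  print(dat[picked, ])
}
```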


Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
86135 Augsburg, Germany
Tel: + 49 821 5982218

[EMAIL PROTECTED]

http://stats.math.uni-augsburg.de/






Re: [R] History pruning

2008-08-01 Thread Antony Unwin
JGR's Copy Commands command works well for me (even if it is both  
fascinating and embarrassing how little is sometimes left over).  It  
retains only commands that worked, so it is still not the minimum  
possible.

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
86135 Augsburg, Germany
Tel: + 49 821 5982218

[EMAIL PROTECTED]

http://stats.math.uni-augsburg.de/






Re: [R] Datasets in R

2008-05-30 Thread Antony Unwin
Carlos,

There are many sources of real datasets (in R itself, on the web), you  
just need to look a little.  For teaching purposes, I think it is  
always better to use real datasets than to use simulated ones.

One thing bothers me, though.  You imply that in all the examples you  
have the data are well fit with linear models, the residuals are  
normal and there is no sign of heteroscedasticity.  That sounds a very  
unusual set of examples!

Best

Antony


> From: Roland Rau [EMAIL PROTECTED]
> Date: 30 May 2008 12:23:17 AM GMT+02:00
> To: Carlos López [EMAIL PROTECTED]
> Cc: r-help@r-project.org
> Subject: Re: [R] Datasets in R
>
> Carlos López wrote:
>> I'm trying to find datasets that will give me residuals, after
>> applying the lm function, with no normality, non linearity, and
>> heteroscedasticity so I can try to exemplify
>> those cases in the linear regression model. Can you give any advice
>> on what datasets would be appropriate? I can't use the ones in the
>> alr3 package because those have
>> already been seen in class.
>> Thank you very much :-)
>> natorro
>
> if you know what you are looking for (or not looking for), wouldn't
> it be the easiest and fastest thing to do to simulate such a dataset
> yourself?
>
> Best,
> Roland



Re: [R] Response to R across the university

2008-04-18 Thread Antony Unwin

On 18 Apr 2008, at 6:42 pm, Peter Dalgaard wrote:

> Antony Unwin wrote:
>> ...
>>
>> The course itself went very well.  We encouraged people to bring
>> their laptops and work in groups.  Using JGR as the interface to R
>> helped a lot, as it was easier for people to load their own data
>> and use the help.  Of course, JGR is compulsory in Augsburg.
>
> Speaking of JGR... What are the appropriate channels to complain and/or
> contribute?

This will do fine, though [EMAIL PROTECTED]  
would be the official route and Markus Helbig ([EMAIL PROTECTED])  
is the key person.

> I had looked into it at an earlier point (on Fedora Linux) and got
> stuck on some fairly simple usability issues, like font choice and
> color scheme. Things like
>
> - if you select a bigger font, the window size remains the same.
>   Changes to window size do not survive to subsequent invocations.
>
> - output is quite unreadable in proportional fonts, so why make them
>   available?
>
> - some fonts have poor contrast, but there seems to be no way to
>   select boldface versions.
>
> - the latest version has turned to a blue-on-gray scheme, which
>   doesn't help with the contrast either
>
> This is all pretty trivial stuff, but the bottom line is that all
> the really exciting stuff isn't really of much use if students
> cannot read it in the back rows.

Your points should certainly be looked into.  Having the font big  
enough for students to read in the back row has not been a problem for  
me.

> A couple other maybe not all that trivial things to do is to improve
> the data import (it is losing out on most of the things that I tried)

Now what would Brian say to a comment like that?  Please insert your  
favourite put-down here:

 

And then perhaps you would be kind enough to let us know in a little  
more detail what hasn't worked for you.

> and to get the wires connected between the DataTable and the edit()
> command.

Thanks for your comments.

Antony






[R] Response to R across the university

2008-04-12 Thread Antony Unwin
This email isn't asking for assistance, but I thought R-help readers  
would find it interesting.  This week we offered a half-day  
introduction to R for researchers at Augsburg University.  The  
response was astonishing.  Although Augsburg has no medical faculty  
and no engineers, there was far too much demand, with interest from  
every faculty (barring theology, for one small village of indomitable  
Gauls still holds out against the R invaders --- perhaps that should  
be obdurate rather than indomitable) and we had participants from  
computer science, geography, physics, law, linguistics, education,  
sociology, marketing, psychology, finance, ...

The course itself went very well.  We encouraged people to bring their  
laptops and work in groups.  Using JGR as the interface to R helped a  
lot, as it was easier for people to load their own data and use the  
help.  Of course, JGR is compulsory in Augsburg.  Giving everyone a  
Butterbreze (a local delicacy) halfway through may have contributed to  
the good humour of the course as well!

Statistics doesn't always have a positive image.  I can recommend  
running an R course as one way of making a good impression.


Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
86135 Augsburg, Germany
Tel: + 49 821 5982218
http://stats.math.uni-augsburg.de/






Re: [R] Spidergram

2008-03-05 Thread Antony Unwin
A parallel coordinate plot would do fine.  Load the package iplots  
and then use the command ipcp(x1, x2,...)

Antony Unwin



[R] Raw histogram plots

2008-03-01 Thread Antony Unwin
Why not use the interactive histogram in iplots?  ihist(x)  Then you  
can vary the binwidth interactively and get a very quick idea of the  
structure of your data by looking at a range of plots with different  
binwidths.  Relying on a single plot to reveal everything about a  
variable's distribution is not a good idea.

A couple of people suggested estimating the density.  That may miss  
roundings, discretisation or other odd structures.  We should never  
underestimate what Peter Huber called the rawness of raw data.
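
The effect of binwidth can also be sketched statically in base R, without the interactive control that ihist offers. The example below is an illustration under made-up assumptions: half of a simulated sample is rounded to the nearest 5, and the helper hist_at() is a hypothetical name introduced here, not part of any package.

```r
# Simulated data in which half the values are heaped on multiples of 5.
set.seed(1)
raw    <- rnorm(2000, mean = 50, sd = 10)
heaped <- c(raw[1:1000], round(raw[1001:2000] / 5) * 5)

# Hypothetical helper: histogram with a fixed binwidth, breaks aligned to
# multiples of the binwidth so that different widths are comparable.
hist_at <- function(x, bw) {
  breaks <- seq(floor(min(x) / bw) * bw, ceiling(max(x) / bw) * bw, by = bw)
  hist(x, breaks = breaks, plot = FALSE)
}

binwidths <- c(10, 2, 0.5)
hists <- lapply(binwidths, function(bw) hist_at(heaped, bw))

if (interactive()) {
  op <- par(mfrow = c(1, 3))
  for (i in seq_along(hists)) {
    plot(hists[[i]], main = paste("binwidth", binwidths[i]), xlab = "x")
  }
  par(op)
}
```

At the widest binwidth the heaping is invisible; at narrow binwidths it shows up as regular spikes, which a smooth density estimate would tend to hide.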

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
Germany






Re: [R] Scatterplot Showing All Points

2007-12-18 Thread Antony Unwin

On 18 Dec 2007, at 2:42 pm, Duncan Murdoch wrote:

>> (I must admit to being very surprised that jittering and sunflower
>> plots have been suggested for a dataset of 5000 points.  Do those who
>> mentioned these methods have examples on that scale where they are
>> effective?)
>
> Sure.  The original post said there were about 50-60 unique
> locations. This plot:
>
> x <- rbinom(5000, 20, 0.15)
> y <- rbinom(5000, 20, 0.15)
> plot(x,y)
>
> has a few more unique locations; tune those probabilities if you
> want it closer.  Due to the overlap, the distribution is very
> unclear.  But this plot
>
> plot(jitter(x), jitter(y))
>
> makes the distribution quite clear.

No it doesn't!  It makes it moderately clearer than the plot without  
jittering.  One good alternative here is the fluctuation diagram  
variant of a mosaic plot:

xx <- as.factor(x)
yy <- as.factor(y)
imosaic(xx, yy, type="f")

Using jittering for categorical data is really not to be recommended  
and will certainly degrade in performance as the dataset gets bigger.
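
For readers without iplots, the idea behind a fluctuation diagram can be approximated in base graphics: one square per occupied grid cell, with area proportional to the count. The sketch below uses symbols() rather than imosaic, and the seed and colours are arbitrary choices.

```r
set.seed(1)
x <- rbinom(5000, 20, 0.15)
y <- rbinom(5000, 20, 0.15)

# Count the cases at each (x, y) location and keep the occupied cells.
counts   <- as.data.frame(table(x = x, y = y), stringsAsFactors = FALSE)
counts   <- counts[counts$Freq > 0, ]
counts$x <- as.integer(counts$x)
counts$y <- as.integer(counts$y)

# Side length proportional to sqrt(count), so that AREA encodes the count;
# the most frequent cell gets a square of side 1 (one grid unit).
side <- sqrt(counts$Freq / max(counts$Freq))

if (interactive()) {
  symbols(counts$x, counts$y, squares = side, inches = FALSE,
          bg = "grey40", fg = NA, xlab = "x", ylab = "y")
}
```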


Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
University of Augsburg,
Germany


Re: [R] Scatterplot Showing All Points

2007-12-18 Thread Antony Unwin

On 18 Dec 2007, at 4:49 pm, Duncan Murdoch wrote:

>> One good alternative here is the fluctuation diagram variant of a
>> mosaic plot:
>> xx <- as.factor(x)
>> yy <- as.factor(y)
>> imosaic(xx, yy, type="f")
>
> That plot is better than jittering, but there's the problem in the
> mosaic plot of understanding the scale of the rectangles:  is it
> area or diameter that encodes the count?

Area is used.

> With a jittered plot, you lose resolution when the number of points
> gets too high because you just see a mess of ink, but at least you
> only require the viewer to count in order to get a close numerical
> reading from the plot.

If someone needs a count, they should be given a table.   Graphics  
are for qualitative conclusions not details.  Anyway, counting will  
only work for really small datasets.

> I could also claim that while imperfect, at least jittering is
> widely applicable.  For example, if the data were not on a regular
> grid, perhaps because they had been generated like this:
>
> xloc <- rnorm(50)
> yloc <- rnorm(50)
> index <- sample(1:50, 5000, rep=TRUE, prob = abs(xloc))
> x <- xloc[index]
> y <- yloc[index]
>
> then jittering still works as well (or as poorly), but the imosaic
> would not work at all.

That's right and that's (almost) the sort of example I was thinking  
of.  For a limited number of locations like this a bubble plot would  
be best (which has already been suggested in this thread, I think).   
For many locations and few replications I would still go for varying  
pointsize and transparency.
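
A bubble plot for irregular locations of this kind can be sketched in base graphics. The data generation follows the quoted code; the seed, the use of symbols() and the bubble scaling are assumptions made for illustration.

```r
set.seed(1)
xloc  <- rnorm(50)
yloc  <- rnorm(50)
index <- sample(1:50, 5000, replace = TRUE, prob = abs(xloc))

# Number of replications at each of the 50 locations.
n_at <- tabulate(index, nbins = 50)

# Radius proportional to sqrt(count), so bubble area encodes the count.
radius <- sqrt(n_at)
keep   <- n_at > 0

if (interactive()) {
  symbols(xloc[keep], yloc[keep], circles = radius[keep], inches = 0.25,
          bg = rgb(0, 0, 1, 0.3), fg = "blue", xlab = "x", ylab = "y")
}
```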

Incidentally, to check your suggestion I ran your code and discovered  
that the transparency in iplot does not seem to like replications.   
Very strange, we'll have to check why.  I then looked closely at the  
numbers of replications generated and discovered that case 25 was  
picked 325 times and case 40 only once.  Rather too extreme for my  
liking!  Running it again gave very similar results, though not  
exactly the same: this time it was 325 times for case 25 and case 40  
was not picked at all.  Other numbers varied slightly.  This is not  
what I expected, any ideas?
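
One possible explanation, offered here as an assumption and not as the thread's conclusion: sample() with a prob argument draws each case with probability proportional to its weight, so with weights abs(xloc) the expected counts are very uneven, and they stay the same from run to run as long as xloc is not regenerated. A small sketch (the seed is arbitrary, so the specific cases and counts will differ from those reported above):

```r
set.seed(1)
xloc <- rnorm(50)

# Expected number of times each case is drawn in 5000 weighted draws.
p        <- abs(xloc) / sum(abs(xloc))
expected <- 5000 * p

# Two independent runs with the SAME weights.
run1 <- tabulate(sample(1:50, 5000, replace = TRUE, prob = abs(xloc)), nbins = 50)
run2 <- tabulate(sample(1:50, 5000, replace = TRUE, prob = abs(xloc)), nbins = 50)

# Compare the extremes of the expected counts with what the runs gave.
extremes <- c(which.max(expected), which.min(expected))
print(cbind(case = extremes, expected = round(expected[extremes], 1),
            run1 = run1[extremes], run2 = run2[extremes]))
```

Cases whose xloc happens to lie near zero are hardly ever picked, while those in the tails are picked often, in every run.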

> P.S. iplots 1.1-1 may have an init problem in Windows: in my first
> attempt, the plot made the boxes too large to fit in their cells,
> but it fixed itself when I resized the window, and the bug doesn't
> seem to be repeatable.

Thanks.  This happens occasionally on the Mac too.  Refreshing solves  
it in practice, but we need to find out why it can happen (and stop  
it happening!).

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
University of Augsburg,
Germany


Re: [R] Packages - a great resource, but hard to find the right one

2007-11-23 Thread Antony Unwin
Johannes Hüsing wrote

>> Above all there are lots of packages.  As the software editor of the
>> Journal of Statistical Software I suggested we should review R
>> packages.
>
> You mean: prior to submission?

No.

>> No one has shown any enthusiasm for this suggestion, but I
>> think it would help.  Any volunteers?
>
> Thing is, I may like to volunteer, but not in the "here's a
> package for you to review by week 32" way. Rather in the way that
> I search a package which fits my problem.

That's what I was hoping for.

> One package lets me down
> and I'd like to know other users and the maintainer about it.
> The other one works black magic and I'd like to drop a raving
> review about it. This needs an infrastructure with a low barrier
> to entry. A wiki is not the worst idea if the initial infrastructure
> is geared at addressing problems rather than packages.

We should differentiate between rave reviews of features that just  
happened to be very useful to someone and reviews of a package as a  
whole.  Both have their place and at the moment we don't have either.

If you are willing to review an R package or aspects of R for JSS  
please let me know.

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
Germany


Re: [R] Packages - a great resource, but hard to find the right one

2007-11-23 Thread Antony Unwin

On 23 Nov 2007, at 4:51 pm, hadley wickham wrote:

 There are two common types of review.  When reviewing a paper, you are
 helping the author to make a better paper (and it's initiated by the
 author). When reviewing a book, you are providing advice on whether
 someone should make an expensive purchase (and it's initiated by a
 third party).  Reviewing an R package seems somewhat in between.  How
 would you deal with a new version of an R package?  It seems like there
 is the potential for reviews to become stale very quickly.

This is a strange argument.  A good package will get a good review,  
which may help it to become better.  A review of a weak package can  
point out how it can be fixed.  Reviews will not become stale just  
because packages are frequently updated by their authors (like some  
that could be mentioned).  Such updates are generally smaller changes.  A  
constructive review will not just be concerned with details, but more  
with the overall aims of the package and how they are achieved (or  
not achieved).

 Another model to look at would be that of an encyclopedia, something
 like the existing task views.  To me, it would be of more benefit if
 JSS provided support, peer review, and regular review, for these.

Why should JSS, one of the few journals for statistical software,  
review texts?  Task views are a good idea, but are general.  They  
give only a brief and subjective overview (and can hardly be expected  
to do more).

 Entries would be more of a survey, and could provide links to the
 literature, much like a chapter of MASS.

If you were not an enthusiastic author of many R packages I would  
start to think that you are afraid of being reviewed, Hadley!  What  
have you against someone studying a package, a group of packages or  
some other aspect of R in detail?  Maybe I had better start reviewers  
on your packages first...

Thanks to the several people who have contacted me independently and  
offered to review packages.  I'll keep the list informed about how  
that goes.  Apologies for JSS's webpage being down today; Jan de  
Leeuw tells me it's something to do with the Thanksgiving weekend.

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
Germany




Re: [R] Packages - a great resource, but hard to find the right one

2007-11-22 Thread Antony Unwin
There have been several constructive responses to John Sorkin's  
comment, but none of them are fully satisfactory.  Of course, if you  
know the name of the function you are looking for, there are lots of  
ways to search, provided that everyone calls the function by a name  
that matches your search.  If you think there might be a function,  
but you don't know the name, then you have to be lucky in how you  
search.  R is a language and the suggestions so far seem to me like  
dictionary suggestions, whereas maybe what John is looking for is  
something more like a thesaurus.

R packages are a strange collection, as befits a growing language.   
There are large packages, small packages, good packages (and not so  
good packages), personal mixtures of tools in packages, packages to  
accompany books, superseded packages, unusual packages, everything.   
Above all there are lots of packages.  As the software editor of the  
Journal of Statistical Software I suggested we should review R  
packages.  No one has shown any enthusiasm for this suggestion, but I  
think it would help.  Any volunteers?

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
Germany


Re: [R] Tart charts

2007-10-08 Thread Antony Unwin
Michael,

 Try this alternative:

 # from http://research.microsoft.com/users/lamport/pubs/hair.pdf
 hairsex <- matrix(
   c(46, 45, 13, 12,
      1, 101, 0, 20), 2, 4, byrow=TRUE)
 dimnames(hairsex) <- list(Gender=c("Female", "Male"),
   "Hair color"=c("Blond", "Brown", "Red", "Other") )

 library(vcd)
 mosaic(hairsex, shade=TRUE)

 There are uses for pie charts, but this isn't one of the better ones.

There are many kinds of mosaic plots, but this isn't one of the  
better ones.  A multiple barchart looks good here.  I did like your  
idea of using colours: it emphasised the number of women with dark  
blue hair.
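
A multiple barchart of the same table could be drawn roughly as follows. This is only a sketch using base graphics; the axis labels and the choice of barplot() are my own assumptions, not something taken from the earlier messages.

```r
# Hair colour by gender, same counts as in the quoted code above.
hairsex <- matrix(c(46,  45, 13, 12,
                     1, 101,  0, 20),
                  nrow = 2, ncol = 4, byrow = TRUE,
                  dimnames = list(Gender = c("Female", "Male"),
                                  "Hair color" = c("Blond", "Brown", "Red", "Other")))

# One group of bars per hair colour, one bar per gender within each group.
# legend.text = TRUE labels the bars with the row names (Female, Male).
mids <- barplot(hairsex, beside = TRUE, legend.text = TRUE,
                xlab = "Hair color", ylab = "Count")
```

Because the bars share a common baseline, the counts for the two genders can be compared directly within each hair colour.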

Antony


Re: [R] sprucing up the R homepage

2007-09-27 Thread Antony Unwin
It's a good idea to spruce up the graphics on R's webpage, but before  
we get too excited about improving how they are drawn, shouldn't we  
think about improving what has been drawn?

The original graphic showed off a wide variety of graphics which can  
be drawn with R, all applied to the swiss fertility dataset.  Are  
these the kinds of graphics we would want to draw in a real  
analysis?  I think a single parallel coordinate plot is more  
informative than this collection and would be easier to explain.  If  
you want to try it for yourself, load the package iplots, run  
data(swiss) and then ipcp(swiss).
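
For readers without a working iplots installation, a static parallel coordinate plot gives a rough idea of the display. The sketch below swaps in parcoord() from the MASS package in place of ipcp(); that substitution is an assumption made for convenience and it loses the interactive selection and linking that iplots provides.

```r
library(MASS)   # provides parcoord(), a static parallel coordinate plot

data(swiss)     # Swiss fertility and socioeconomic indicators, 47 provinces

# One polyline per province, one vertical axis per variable.
# var.label = TRUE prints each variable's minimum and maximum on its axis.
parcoord(swiss, var.label = TRUE)
```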

So maybe someone should suggest graphics from another dataset to  
adorn the webpage and demonstrate R's graphics capabilities.

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,

