Re: [R] log y 'axis' of histogram

2010-08-30 Thread David Scott

On 30/08/2010 1:58 p.m., Derek M Jones wrote:

> All,
>
> I have been trying to get calls to hist(...) to be plotted
> with the y-axis having a log scale.
>
> I have tried: par(ylog=TRUE)
>
> I have also looked at the histogram package.
>
> Suggestions welcome.



You appear to be looking for a log-histogram function.

There is one (logHist) in my package DistributionUtils on CRAN. You 
don't need the rest of the package to use it. You could just extract 
that particular function.


David Scott

--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
It's not just that counts might be zero, but also that the base of
each bar starts at zero. I really don't see how logging the y/axis of
a histogram makes sense.

Hadley

On Sunday, August 29, 2010, Joshua Wiley jwiley.ps...@gmail.com wrote:
> Hi Derek,
>
> Here is an option using the package ggplot2:
>
> library(ggplot2)
> x <- sample(x = 10:50, size = 50, replace = TRUE)
> qplot(x = x, geom = "histogram") + scale_y_log()
>
> However, the log scale is often inappropriate for histograms, because
> the y-axis represents counts, which could potentially be 0, and
> therefore undefined (R outputs -Inf).  Another option using base
> graphics would be something along the lines (no pun intended) of:
>
> temp <- hist(x, plot = FALSE) #get histogram data
> plot(x = temp$mids, y = log(temp$counts), type = "h")



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Derek M Jones

Hadley,


> It's not just that counts might be zero, but also that the base of
> each bar starts at zero. I really don't see how logging the y/axis of
> a histogram makes sense.


I have counts ranging over 4-6 orders of magnitude with peaks
occurring at various 'magic' values.  Using a log scale for the
y-axis enables the smaller peaks, which would otherwise
be almost invisible bumps along the x-axis, to be seen.

The references given for logHist in David Scott's DistributionUtils
package are:

Barndorff-Nielsen, O. (1977) Exponentially decreasing distributions for 
the logarithm of particle size, Proc. Roy. Soc. Lond., A353, 401–419.


Barndorff-Nielsen, O. and Blæsild, P. (1983) Hyperbolic distributions.
In Encyclopedia of Statistical Sciences, eds., Johnson, N. L., Kotz, S. 
and Read, C. B., Vol. 3, pp. 700–707. New York: Wiley.


Fieller, N. J., Flenley, E. C. and Olbricht, W. (1992) Statistics of 
particle size data. Appl. Statist., 41, 127–146.




--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
> I have counts ranging over 4-6 orders of magnitude with peaks
> occurring at various 'magic' values.  Using a log scale for the
> y-axis enables the smaller peaks, which would otherwise
> be almost invisible bumps along the x-axis, to be seen.

That doesn't justify the use of a _histogram_  - and regardless of
what distributional display you use, logging the counts imposes some
pretty heavy restrictions on the shape of the distribution (e.g. that
it must not drop to zero).

It may be useful for your purposes, but that doesn't necessarily make
it a meaningful graphic.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Derek M Jones

Hadley,


>> I have counts ranging over 4-6 orders of magnitude with peaks
>> occurring at various 'magic' values.  Using a log scale for the
>> y-axis enables the smaller peaks, which would otherwise
>> be almost invisible bumps along the x-axis, to be seen.
>
> That doesn't justify the use of a _histogram_  - and regardless of


The usage highlights meaningful characteristics of the data.
What better justification for any method of analysis and display is
there?


> what distributional display you use, logging the counts imposes some
> pretty heavy restrictions on the shape of the distribution (e.g. that
> it must not drop to zero).


Does there have to be a recognized statistical distribution to use R?
In my case I am using R for all of the analysis and graphics in a
new book.  This means that sometimes I have to deal with data sets
that are more or less a jumble of numbers with patterns in a few
places.  For instance, the numeric value of integer constants
appearing as one operand of the binary bitwise-AND operator (see
figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data
at: www.knosof.co.uk/cbook/bandcons.hist.gz)

qplot(band, binwidth=8, geom="histogram") + scale_y_log()
does a good job of highlighting the peaks.


> It may be useful for your purposes, but that doesn't necessarily make
> it a meaningful graphic.


Doesn't being useful for my purpose make it meaningful, at least for me
and I hope my readers?

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
>> That doesn't justify the use of a _histogram_  - and regardless of
>
> The usage highlights meaningful characteristics of the data.
> What better justification for any method of analysis and display is
> there?

That you're displaying something that is mathematically well founded
and meaningful - but my emphasis there was on histogram.  I don't
think a histogram makes sense, but there are other ways of displaying
the same data that would (e.g. a frequency polygon, or maybe a density
plot)

>> what distributional display you use, logging the counts imposes some
>> pretty heavy restrictions on the shape of the distribution (e.g. that
>> it must not drop to zero).
>
> Does there have to be a recognized statistical distribution to use R?

My point is about the display - if your binned counts look like 1,
100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log
counts?

> In my case I am using R for all of the analysis and graphics in a
> new book.  This means that sometimes I have to deal with data sets
> that are more or less a jumble of numbers with patterns in a few
> places.  For instance, the numeric value of integer constants
> appearing as one operand of the binary bitwise-AND operator (see
> figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data
> at: www.knosof.co.uk/cbook/bandcons.hist.gz)
>
> qplot(band, binwidth=8, geom="histogram") + scale_y_log()
> does a good job of highlighting the peaks.

I couldn't find that figure, but I'd think geom = freqpoly would be
more appropriate.  (I'd also suggest adding a bit more space between
the data and the margins in your figures - they overlap in many
plots).

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Derek M Jones

Hadley,


> That you're displaying something that is mathematically well founded
> and meaningful - but my emphasis there was on histogram.  I don't
> think a histogram makes sense, but there are other ways of displaying
> the same data that would (e.g. a frequency polygon, or maybe a density
> plot)


The problem I have with geom = freqpoly is that it is not immediately
obvious to the casual reader of the figure that binned data has been
plotted.  The horizontal line at the top of each bar does make that
obvious.  Lots of solid black is an eyesore, and using something
like fill="white" helps to solve this problem (although this
currently appears red for me, probably some configuration issue to
sort out).

I'm not sure that a histogram using variable-width bins and one log
scale has any meaningful interpretation; having both axes use a log
scale might make sense with variable-width bins.


>>> what distributional display you use, logging the counts imposes some
>>> pretty heavy restrictions on the shape of the distribution (e.g. that
>>> it must not drop to zero).
>>
>> Does there have to be a recognized statistical distribution to use R?
>
> My point is about the display - if your binned counts look like 1,
> 100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log
> counts?


Many functions cannot handle log(0) so the safest thing to do is
remove 0s.  What about 1 and other values more than X orders of
magnitude less than the maximum?  This is an issue on any log scaled
plot and invariably they don't appear (and neither do the log(0)
cases).

Having a scale that gets closer to zero without ever getting there
is something that has to be accepted when displaying a log scale.
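
To make the question concrete, here is a small base-R sketch using the example counts quoted above. The cut-off value (floor_value) and the choice to leave empty bins as gaps are assumptions of this sketch, one possible convention rather than an agreed answer.

```r
# Binned counts taken from the example in the thread.
counts <- c(1, 100, 1000, 100, 0, 0, 10, 1000, 1000)

# Zeros become -Inf, and a count of 1 becomes 0 (a bar of zero height).
log10(counts)   # 0 2 3 2 -Inf -Inf 1 3 3

nonzero <- counts > 0
bin <- seq_along(counts)

# Arbitrary floor below 1 so that a count of 1 still gets a visible bar.
floor_value <- 0.5

# Set up an empty plot with a log y-axis, then draw one rectangle per
# non-empty bin; empty bins are simply left as gaps.
plot(range(bin) + c(-0.5, 0.5), c(floor_value, max(counts)),
     type = "n", log = "y", xlab = "bin", ylab = "count (log scale)")
rect(bin[nonzero] - 0.5, floor_value, bin[nonzero] + 0.5, counts[nonzero],
     col = "white")
```

Moving floor_value changes the apparent height of every bar, which is the arbitrariness being objected to.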

Logarithms are familiar to a technical readership and using them for
data spanning several orders of magnitude can highlight meaningful
relationships.  A non-technical readership is likely to completely
misunderstand a log scale and I have no idea how to display this
kind of data to such people.


> I couldn't find that figure, but I'd think geom = freqpoly would be
> more appropriate.  (I'd also suggest adding a bit more space between
> the data and the margins in your figures - they overlap in many
> plots).


My mistake, I was looking at a very old printed copy.  See figure 1234.1.
These figures are from a previous book
www.knosof.co.uk/cbook
which used grap to draw all the graphs
www.lunabase.org/~faber/Vault/software/grap/
with the numbers being extracted and processed by various C programs and
awk scripts.

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk



Re: [R] log y 'axis' of histogram

2010-08-30 Thread David Scott


Hadley is correct about the problem of where to end the bars when trying 
to draw a log-histogram: basically you have to decide to cut them off 
somewhere. He is also right that a log-histogram is perhaps not a great 
graphic to use. However, they are used and indeed there is one in the 
Fieller, Flenley, Olbricht paper (published in Applied Statistics, now 
JRSS C) for example. I haven't searched for others, but certainly when I 
wrote a log-histogram routine it wasn't because I thought of doing such 
a plot all on my own.


A number of authors, including Barndorff-Nielsen in at least some of his 
papers (I haven't gone back and checked all his older work) just plot 
the midpoints of the tops of the log-histogram. (That is an option in 
logHist). Another approach is to fit an empirical density to the data 
and plot the log-density. That matches the advice often seen in this 
forum that plotting empirical density functions is preferable to drawing 
histograms. My feeling is that either of these two approaches is 
probably preferable to using log-histograms for the reasons Hadley 
enunciated. When plotting data plus a fitted curve, the midpoints 
approach does have the advantage of distinguishing data and theoretical 
curve more clearly.
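
The two alternatives just described can be sketched in base R without any extra package. This is only an illustration on simulated data (the sample, seed and number of breaks are made up), and it is not the logHist implementation.

```r
set.seed(1)
x <- rexp(2000, rate = 1)            # hypothetical sample with a long right tail

h <- hist(x, breaks = 30, plot = FALSE)
ok <- h$counts > 0                   # empty bins would give a log-density of -Inf

# Approach 1: midpoints of the bar tops, on the log-density scale.
plot(h$mids[ok], log(h$density[ok]), pch = 16,
     xlab = "x", ylab = "log density")

# Approach 2: kernel density estimate, logged, drawn over the same axes.
d <- density(x)
pos <- d$y > 0                       # guard against log(0) in the far tails
lines(d$x[pos], log(d$y[pos]))
```

Points for the binned data and a line for the smooth estimate keep the two visually distinct, which is the advantage noted above for the midpoints approach.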


Overall the idea of a plot with a logged y-axis is definitely a good one 
and its use is endemic in literature concerned with heavy-tailed 
distributions, particularly finance. The advantage is the clarity 
offered regarding tail behaviour, where for example exponential tails in 
the density correspond to straight lines in the logged y-axis plot.
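
As a quick check of the straight-line claim: for an exponential density f(x) = rate * exp(-rate * x), the log density is log(rate) - rate * x, which is linear in x. The rate value below is an arbitrary choice for illustration.

```r
rate <- 2                                   # hypothetical rate parameter
x <- seq(0, 5, by = 0.5)
log_f <- dexp(x, rate = rate, log = TRUE)   # equals log(rate) - rate * x

# A straight-line fit recovers intercept log(rate) and slope -rate.
fit <- lm(log_f ~ x)
coef(fit)

plot(x, log_f, type = "b", xlab = "x", ylab = "log density")
```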


Hope this helps.

David Scott


--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics



[R] log y 'axis' of histogram

2010-08-29 Thread Derek M Jones

All,

I have been trying to get calls to hist(...) to be plotted
with the y-axis having a log scale.

I have tried: par(ylog=TRUE)

I have also looked at the histogram package.

Suggestions welcome.

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk



Re: [R] log y 'axis' of histogram

2010-08-29 Thread John Sorkin
How about computing the log of your variable and calling hist() on the logged data?
logy <- log(y)
hist(logy)
John
John Sorkin
Chief Biostatistics and Informatics
Univ. of Maryland School of Medicine
Division of Gerontology and Geriatric Medicine
jsor...@grecc.umaryland.edu 


Re: [R] log y 'axis' of histogram

2010-08-29 Thread Joshua Wiley
Hi Derek,

Here is an option using the package ggplot2:

library(ggplot2)
x <- sample(x = 10:50, size = 50, replace = TRUE)
qplot(x = x, geom = "histogram") + scale_y_log()

However, the log scale is often inappropriate for histograms, because
the y-axis represents counts, which could potentially be 0, and
therefore undefined (R outputs -Inf).  Another option using base
graphics would be something along the lines (no pun intended) of:

temp <- hist(x, plot = FALSE) #get histogram data
plot(x = temp$mids, y = log(temp$counts), type = "h")
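
A variation on the base-graphics idea, as a minimal sketch: rather than logging the counts by hand, pass log = "y" so the axis does the transformation and the tick labels stay as counts. The data and bin edges here are invented for illustration, and dropping empty bins is an assumption of the sketch.

```r
# Hypothetical data: two very common values plus a sparse spread of others.
set.seed(42)
x <- c(rep(16, 5000), rep(32, 400), sample(1:64, 60, replace = TRUE))

h <- hist(x, breaks = seq(0, 64, by = 4), plot = FALSE)  # bin without drawing

# log(0) is -Inf, so empty bins cannot be placed on a log axis; drop them.
keep <- h$counts > 0

# type = "h" draws a vertical line per bin; log = "y" makes the y-axis logarithmic.
plot(h$mids[keep], h$counts[keep], type = "h", log = "y", lwd = 4,
     xlab = "x", ylab = "count (log scale)")
```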

HTH,

Josh





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
