Re: [R] log y 'axis' of histogram
On 30/08/2010 1:58 p.m., Derek M Jones wrote:
> All,
>
> I have been trying to get calls to hist(...) to be plotted with the y-axis having a log scale.
>
> I have tried: par(ylog=TRUE)
>
> I have also looked at the histogram package.
>
> Suggestions welcome.

You appear to be looking for a log-histogram function. There is one (logHist) in my package DistributionUtils on CRAN. You don't need the rest of the package to use it; you could just extract that particular function.

David Scott

--
David Scott
Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018
Director of Consulting, Department of Statistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] log y 'axis' of histogram
It's not just that counts might be zero, but also that the base of each bar starts at zero. I really don't see how logging the y-axis of a histogram makes sense.

Hadley

On Sunday, August 29, 2010, Joshua Wiley jwiley.ps...@gmail.com wrote:
> Hi Derek,
>
> Here is an option using the package ggplot2:
>
> library(ggplot2)
> x <- sample(x = 10:50, size = 50, replace = TRUE)
> qplot(x = x, geom = "histogram") + scale_y_log()
>
> However, the log scale is often inappropriate for histograms, because the y-axis represents counts, which could potentially be 0, in which case the log is undefined (R outputs -Inf).
>
> Another option using base graphics would be something along the lines (no pun intended) of:
>
> temp <- hist(x, plot = FALSE) # get histogram data
> plot(x = temp$mids, y = log(temp$counts), type = "h")
>
> HTH,
>
> Josh

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Re: [R] log y 'axis' of histogram
Hadley,

> It's not just that counts might be zero, but also that the base of each bar starts at zero. I really don't see how logging the y-axis of a histogram makes sense.

I have counts ranging over 4-6 orders of magnitude, with peaks occurring at various 'magic' values. Using a log scale for the y-axis enables the smaller peaks, which would otherwise be almost invisible bumps along the x-axis, to be seen.

The references given for logHist in David Scott's DistributionUtils package are:

Barndorff-Nielsen, O. (1977) Exponentially decreasing distributions for the logarithm of particle size. Proc. Roy. Soc. Lond., A353, 401–419.

Barndorff-Nielsen, O. and Blæsild, P. (1983) Hyperbolic distributions. In Encyclopedia of Statistical Sciences, eds. Johnson, N. L., Kotz, S. and Read, C. B., Vol. 3, pp. 700–707. New York: Wiley.

Fieller, N. J., Flenley, E. C. and Olbricht, W. (1992) Statistics of particle size data. Appl. Statist., 41, 127–146.

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis http://www.knosof.co.uk
Re: [R] log y 'axis' of histogram
> I have counts ranging over 4-6 orders of magnitude with peaks occurring at various 'magic' values. Using a log scale for the y-axis enables the smaller peaks, which would otherwise be almost invisible bumps along the x-axis, to be seen

That doesn't justify the use of a _histogram_ - and regardless of what distributional display you use, logging the counts imposes some pretty heavy restrictions on the shape of the distribution (e.g. that it must not drop to zero).

It may be useful for your purposes, but that doesn't necessarily make it a meaningful graphic.

Hadley
Re: [R] log y 'axis' of histogram
Hadley,

>> I have counts ranging over 4-6 orders of magnitude with peaks occurring at various 'magic' values. Using a log scale for the y-axis enables the smaller peaks, which would otherwise be almost invisible bumps along the x-axis, to be seen
>
> That doesn't justify the use of a _histogram_ - and regardless of

The usage highlights meaningful characteristics of the data. What better justification for any method of analysis and display is there?

> what distributional display you use, logging the counts imposes some pretty heavy restrictions on the shape of the distribution (e.g. that it must not drop to zero).

Does there have to be a recognized statistical distribution to use R?

In my case I am using R for all of the analysis and graphics in a new book. This means that sometimes I have to deal with data sets that are more or less a jumble of numbers with patterns in a few places. For instance, consider the numeric value of integer constants appearing as one operand of the binary bitwise-AND operator (see figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data at: www.knosof.co.uk/cbook/bandcons.hist.gz). Here

    qplot(band, binwidth=8, geom="histogram") + scale_y_log()

does a good job of highlighting the peaks.

> It may be useful for your purposes, but that doesn't necessarily make it a meaningful graphic.

Doesn't being useful for my purpose make it meaningful, at least for me and, I hope, my readers?

--
Derek M. Jones
Re: [R] log y 'axis' of histogram
>> That doesn't justify the use of a _histogram_ - and regardless of
>
> The usage highlights meaningful characteristics of the data. What better justification for any method of analysis and display is there?

That you're displaying something that is mathematically well founded and meaningful - but my emphasis there was on histogram. I don't think a histogram makes sense, but there are other ways of displaying the same data that would (e.g. a frequency polygon, or maybe a density plot).

>> what distributional display you use, logging the counts imposes some pretty heavy restrictions on the shape of the distribution (e.g. that it must not drop to zero).
>
> Does there have to be a recognized statistical distribution to use R?

My point is about the display - if your binned counts look like 1, 100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log counts?

> In my case I am using R for all of the analysis and graphics in a new book. This means that sometimes I have to deal with data sets that are more or less a jumble of numbers with patterns in a few places. For instance, the numeric value of integer constants appearing as one operand of the binary bitwise-AND operator (see figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data at: www.knosof.co.uk/cbook/bandcons.hist.gz)
>
> qplot(band, binwidth=8, geom="histogram") + scale_y_log()
>
> does a good job of highlighting the peaks.

I couldn't find that figure, but I'd think geom = "freqpoly" would be more appropriate. (I'd also suggest adding a bit more space between the data and the margins in your figures - they overlap in many plots.)

Hadley
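The question about the example counts can be tried directly in base R. The following is only a sketch: the counts are the ones quoted in the message, while the bin midpoints 1 to 9 and the choice to mask empty bins as NA are assumptions made for illustration, not something proposed in the thread.

```r
# Binned counts from the example in the message; the midpoints are assumed.
counts <- c(1, 100, 1000, 100, 0, 0, 10, 1000, 1000)
mids   <- seq_along(counts)

# Logging the counts directly turns every empty bin into -Inf.
log_counts <- log10(counts)
print(log_counts)

# One possible display: mask empty bins as NA so that the frequency
# polygon is broken there instead of being drawn towards -Inf.
y <- ifelse(counts > 0, counts, NA)

plot(mids, y, type = "b", log = "y",
     xlab = "bin", ylab = "count (log scale)")
```

The gap in the line marks the empty bins explicitly, which is one way of answering "how do you display the log counts?" without pretending the zeros are small positive values.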
Re: [R] log y 'axis' of histogram
Hadley,

> That you're displaying something that is mathematically well founded and meaningful - but my emphasis there was on histogram. I don't think a histogram makes sense, but there are other ways of displaying the same data that would (e.g. a frequency polygon, or maybe a density plot)

The problem I have with geom = "freqpoly" is that it is not immediately obvious to the casual reader of the figure that binned data has been plotted. The horizontal line at the top of each bar does make that obvious.

Lots of solid black is an eyesore, and using something like fill="white" helps to solve this problem (although this currently appears red for me, probably some configuration issue to sort out).

I'm not sure that a histogram using variable-width bins and one log scale has any meaningful interpretation; having both axes use a log scale might make sense with variable-width bins.

>>> what distributional display you use, logging the counts imposes some pretty heavy restrictions on the shape of the distribution (e.g. that it must not drop to zero).
>>
>> Does there have to be a recognized statistical distribution to use R?
>
> My point is about the display - if your binned counts look like 1, 100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log counts?

Many functions cannot handle log(0), so the safest thing to do is remove 0s.

What about 1 and other values more than X orders of magnitude less than the maximum? This is an issue on any log-scaled plot, and invariably they don't appear (and neither do the log(0) cases).

Having a scale that gets closer to zero without ever getting there is something that has to be accepted when displaying a log scale.

Logarithms are familiar to a technical readership, and using them for data spanning several orders of magnitude can highlight meaningful relationships. A non-technical readership is likely to completely misunderstand a log scale, and I have no idea how to display this kind of data to such people.

> I couldn't find that figure, but I'd think geom = "freqpoly" would be more appropriate. (I'd also suggest adding a bit more space between the data and the margins in your figures - they overlap in many plots).

My mistake, I was looking at a very old printed copy. See figure 1234.1.

These figures are from a previous book, www.knosof.co.uk/cbook, which used grap to draw all the graphs (www.lunabase.org/~faber/Vault/software/grap/), with the numbers being extracted and processed by various C programs and awk scripts.

--
Derek M. Jones
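The display described above (bars with a visible top edge, a white fill, empty bins removed, and a base that necessarily sits somewhere above zero) can be sketched in base graphics. Everything specific here is an assumption for illustration: the simulated data, the number of breaks, and the floor value of 0.5 are hypothetical choices, not taken from the thread.

```r
# Hypothetical data: a large bulk plus two small peaks at 'magic' values.
set.seed(7)
x <- c(rpois(5000, 20), rep(64, 40), rep(128, 5))

h <- hist(x, breaks = 30, plot = FALSE)   # bin without plotting

floor_y <- 0.5                 # assumed cut-off for the base of each bar
keep    <- h$counts > 0        # empty bins are dropped, as suggested above
left    <- h$breaks[-length(h$breaks)][keep]
right   <- h$breaks[-1][keep]

# Set up an empty plot with a log-scaled y-axis, then draw the bars.
plot(range(h$breaks), c(floor_y, max(h$counts)), type = "n", log = "y",
     xlab = "x", ylab = "count (log scale)")
rect(left, floor_y, right, h$counts[keep], col = "white", border = "black")
```

The height of each bar above the floor has no direct meaning, since moving the floor changes every bar; only the position of the top edge carries information.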
Re: [R] log y 'axis' of histogram
On 31/08/10 03:37, Derek M Jones wrote:
>> It may be useful for your purposes, but that doesn't necessarily make it a meaningful graphic.
>
> Doesn't being useful for my purpose make it meaningful, at least for me and I hope my readers?

Hadley is correct about the problem of where to end the bars when trying to draw a log-histogram: basically you have to decide to cut them off somewhere. He is also right that a log-histogram is perhaps not a great graphic to use. However, they are used, and indeed there is one in the Fieller, Flenley, Olbricht paper (published in Applied Statistics, now JRSS C), for example. I haven't searched for others, but certainly when I wrote a log-histogram routine it wasn't because I thought of doing such a plot all on my own.

A number of authors, including Barndorff-Nielsen in at least some of his papers (I haven't gone back and checked all his older work), just plot the midpoints of the tops of the log-histogram. (That is an option in logHist.)

Another approach is to fit an empirical density to the data and plot the log-density. That matches the advice often seen in this forum that plotting empirical density functions is preferable to drawing histograms.

My feeling is that either of these two approaches is probably preferable to using log-histograms, for the reasons Hadley enunciated. When plotting data plus a fitted curve, the midpoints approach does have the advantage of distinguishing data and theoretical curve more clearly.

Overall, the idea of a plot with a logged y-axis is definitely a good one, and its use is endemic in literature concerned with heavy-tailed distributions, particularly in finance. The advantage is the clarity offered regarding tail behaviour, where, for example, exponential tails in the density correspond to straight lines in the logged y-axis plot.

Hope this helps.

David Scott
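The remark about exponential tails can be checked numerically. For an exponential density f(x) = rate * exp(-rate * x), the log-density is log(rate) - rate * x, a straight line with slope -rate. The sketch below uses an arbitrary rate of 2 and a simulated sample; both are assumptions for illustration only.

```r
rate <- 2                                  # arbitrary rate for illustration
x <- seq(0.1, 5, by = 0.1)
log_density <- log(dexp(x, rate = rate))   # equals log(rate) - rate * x

# A straight-line fit to the exact log-density recovers the slope -rate.
fit   <- lm(log_density ~ x)
slope <- unname(coef(fit)["x"])

# Empirical version: log of a kernel density estimate from simulated data,
# with the theoretical line overlaid as a dashed reference.
set.seed(1)
sample_x <- rexp(1000, rate = rate)
d <- density(sample_x, from = 0)
plot(d$x, log(d$y), type = "l", xlab = "x", ylab = "log density")
abline(a = log(rate), b = -rate, lty = 2)
```

The estimated log-density follows the dashed line where data are plentiful and becomes erratic in the far tail, where few observations remain.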
[R] log y 'axis' of histogram
All,

I have been trying to get calls to hist(...) to be plotted with the y-axis having a log scale.

I have tried: par(ylog=TRUE)

I have also looked at the histogram package.

Suggestions welcome.

--
Derek M. Jones
Re: [R] log y 'axis' of histogram
How about computing the log of your variable and calling hist() on the logged data?

    logy <- log(y)
    hist(logy)

John

John Sorkin
Chief Biostatistics and Informatics
Univ. of Maryland School of Medicine
Division of Gerontology and Geriatric Medicine
jsor...@grecc.umaryland.edu

-----Original Message-----
From: Derek M Jones de...@knosof.co.uk
To: r-help@r-project.org
Sent: 8/29/2010 9:58:35 PM
Subject: [R] log y 'axis' of histogram

> All,
>
> I have been trying to get calls to hist(...) to be plotted with the y-axis having a log scale.
>
> I have tried: par(ylog=TRUE)
>
> I have also looked at the histogram package.
>
> Suggestions welcome.
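It may help to see what this suggestion does and does not change. Taking the log of the variable rescales the x-axis (the bins become equally spaced in log(y)), while the heights remain ordinary, unlogged counts. The sketch below uses simulated log-normal data, which is purely an assumption for illustration.

```r
# Hypothetical positive, strongly skewed data.
set.seed(42)
y <- rlnorm(1000, meanlog = 0, sdlog = 2)

h_raw <- hist(y, plot = FALSE)        # bins equally spaced in y
h_log <- hist(log(y), plot = FALSE)   # bins equally spaced in log(y)

# Either way the bar heights are plain counts that sum to the sample size.
total_raw <- sum(h_raw$counts)
total_log <- sum(h_log$counts)

hist(log(y), main = "Histogram of log(y)", xlab = "log(y)")
```

So this addresses data that span orders of magnitude in their values, rather than counts that span orders of magnitude, which is what the original question asks about.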
Re: [R] log y 'axis' of histogram
Hi Derek,

Here is an option using the package ggplot2:

    library(ggplot2)
    x <- sample(x = 10:50, size = 50, replace = TRUE)
    qplot(x = x, geom = "histogram") + scale_y_log()

However, the log scale is often inappropriate for histograms, because the y-axis represents counts, which could potentially be 0, in which case the log is undefined (R outputs -Inf).

Another option using base graphics would be something along the lines (no pun intended) of:

    temp <- hist(x, plot = FALSE) # get histogram data
    plot(x = temp$mids, y = log(temp$counts), type = "h")

HTH,

Josh

On Sun, Aug 29, 2010 at 6:58 PM, Derek M Jones de...@knosof.co.uk wrote:
> All,
>
> I have been trying to get calls to hist(...) to be plotted with the y-axis having a log scale.
>
> I have tried: par(ylog=TRUE)
>
> I have also looked at the histogram package.
>
> Suggestions welcome.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
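A variation on the base-graphics option above is to keep the counts unlogged and let the axis do the logging via log = "y", so the tick labels show actual counts. This is a sketch rather than part of the original reply: the seed and the explicit removal of empty bins (which cannot be placed on a log axis) are additions made for illustration.

```r
set.seed(123)
x <- sample(x = 10:50, size = 50, replace = TRUE)   # same toy data as above

temp <- hist(x, plot = FALSE)   # get histogram data without plotting
keep <- temp$counts > 0         # empty bins cannot be shown on a log axis

# Vertical lines at the bin midpoints on a log-scaled y-axis.
plot(x = temp$mids[keep], y = temp$counts[keep], type = "h", log = "y",
     xlab = "x", ylab = "count (log scale)")
```

Bins with a single observation sit at the bottom of the axis here, so they are easy to overlook; adding points at the tops of the lines is one way to make them visible.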