Re: [R] PDF too large, PNG bad quality
On Fri, Oct 23, 2009 at 1:54 AM, Jim Lemon j...@bitwrit.com.au wrote:

> On 10/23/2009 06:07 AM, Lasse Kliemann wrote:
>> I wish to save a scatter plot comprising approx. 2 million points in order to include it in a LaTeX document. Using 'pdf(...)' produces a file of about 20 MB, which is useless. Using 'cairo_pdf(...)' produces a smaller file, around 3 MB. This is still too large. Not only will the document be too large, but PDF viewers also choke on it. Moreover, Cairo has problems with text: by default, text looks ugly, like scaled bitmaps. After hours of trying different settings, I discovered that choosing a different font family can help, e.g. 'par(family="mono")'. This gives good-looking text, yet the problem with the file size remains. There is the hint to produce EPS instead and then convert to PDF using 'epstopdf'. The resulting PDF files are slightly smaller, but still too large, and PDF viewers still don't like them.
>>
>> So I gave PNG a try. PNG files are much smaller, and PDF viewers have no trouble with them. However, fonts look ugly. The trick that worked for Cairo PDF has no effect for PNG. When I view the PNGs with a dedicated viewer like 'qiv', even the fonts look good, but not when included in LaTeX; I simply use '\includegraphics{...}' and run the document through 'pdflatex'. I tried both creating the PNG with 'png(...)' and converting from PDF to PNG using 'convert' from ImageMagick.
>>
>> So my questions are:
>> - Is there a way to produce sufficiently lean PDFs directly in R, even when the plot comprises several million points?
>> - How can I produce a PNG that still looks nice when included in a LaTeX PDF document?
>> Any hints will be greatly appreciated.
>
> Hi Lasse,
> I may be right off the track, but I can't make much sense of 2 million points in a scatterplot.
> If you are interested in the density of points within the plot, you could compute this using something like the bkde2D function in the KernSmooth package and then plot that using something like image.

Which (and a bit more) is done by ?smoothScatter (originally in a Bioconductor package, but now available in a default installation of R).

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
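The bkde2D-plus-image route that Jim and Deepayan describe can be sketched as follows. This is a minimal example on synthetic data; the sample size, bandwidths, and grid size are illustrative choices, not values from the thread:

```r
# Density display instead of millions of raw points:
# bkde2D (KernSmooth, shipped with R) estimates a 2-D density,
# and image() draws it as a small raster.
library(KernSmooth)

set.seed(1)
x <- rnorm(2e5)
y <- rnorm(2e5, x, 2)

est <- bkde2D(cbind(x, y), bandwidth = c(0.3, 0.6),
              gridsize = c(200L, 200L))

# est$fhat is just a 200 x 200 matrix, so the resulting PDF is
# small no matter how many points went into the estimate.
pdf("density.pdf")
image(est$x1, est$x2, est$fhat, xlab = "x", ylab = "y")
dev.off()
```

The file size now depends on the grid size, not on the number of points.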
Re: [R] PDF too large, PNG bad quality
Hexbin used to use base graphics but has switched to grid. So you can either learn how to augment the grid graph, or use lattice and learn how to augment there. Or you can get a fairly good base-graphics approximation using the my.symbols function in the TeachingDemos package, e.g.:

library(hexbin)
library(TeachingDemos)
x <- rnorm(1e6)        # sample size was garbled in the archive; any large n works
y <- rnorm(1e6, x, 2)
(bin <- hexbin(x, y))
my.symbols(hcell2xy(bin), symb = ms.filled.polygon, n = 6, add = FALSE,
           asp = bin@shape, xlim = bin@xbnds, ylim = bin@ybnds,
           bg = grey((6:0)/7)[cut(bin@count, 7)],
           fg = "grey",   # the original colour value was garbled in the archive
           inches = par('pin')[1] / bin@dimen[1] * 1.25)

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lasse Kliemann
Sent: Thursday, October 22, 2009 6:35 PM
To: r-help@r-project.org
Subject: Re: [R] PDF too large, PNG bad quality

[Lasse's reply quoted in full; it appears as its own message later in this thread.]
Re: [R] PDF too large, PNG bad quality
On 10/23/2009 06:07 AM, Lasse Kliemann wrote:

[Lasse's original question quoted in full; see the first message above.]

Hi Lasse,
I may be right off the track, but I can't make much sense of 2 million points in a scatterplot. If you are interested in the density of points within the plot, you could compute this using something like the bkde2D function in the KernSmooth package and then plot that using something like image.
Jim
Re: [R] PDF too large, PNG bad quality
Hello Lasse,

Why not try this?
(1) Create the 20 MB PDF from R.
(2) Use the convert command on Linux, for example:

convert -resize 50% 20mbfile.pdf smallerfile.pdf
convert -resize 75% 20mbfile.pdf image.png

Ghostscript can help you with the conversion as well. Vector formats (pdf, ps, eps) are good for this purpose, as opposed to raster formats (png, jpg, etc.).

Best,
Parthiban.

2009/10/23 Jim Lemon j...@bitwrit.com.au:

[Jim's reply, quoting Lasse's original question; see the messages above.]
Re: [R] PDF too large, PNG bad quality
Dear Lasse,

This won't answer your specific questions, and I apologize for that. AFAIK, pdf() produces uncompressed PDFs only, but you could use tools like pdftk to compress your PDFs. As for the PNGs, you can always raise the 'res' argument to improve resolution, but it won't beat the PDFs.

I'm not sure the reader (of your LaTeX document) would be interested in seeing each of the 2M points of your scatter plot, and even if he were, I doubt he could. So, instead of plot(x, y), have you considered using smoothScatter(x, y)?

Cheers,
b

On Oct 22, 2009, at 5:07 PM, Lasse Kliemann wrote:

[Lasse's original question quoted in full; see the first message above.]
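The smoothScatter() route suggested here takes only a couple of lines. A minimal sketch on synthetic data (the point count and the nrpoints value are illustrative):

```r
# smoothScatter() bins the points, smooths the counts (via KernSmooth),
# and draws the density as a raster, overplotting only the most
# outlying points -- so the output PDF stays small.
set.seed(1)
x <- rnorm(1e6)
y <- rnorm(1e6, x, 2)

pdf("smooth.pdf")
smoothScatter(x, y, nrpoints = 100)  # also draw the 100 most outlying points
dev.off()
```

Because the density is stored as one raster image, the file size no longer grows with the number of points.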
Re: [R] PDF too large, PNG bad quality
The problem with the pdf files is that they store the information for every one of your points, even the ones that are overplotted by other points. The png file is smaller because it only stores which color each pixel should be, not how many points contributed to a particular pixel being a given color. But png files convert the text to pixel information as well, which doesn't look good if there is scaling afterwards.

If you want to go the pdf route, then you need to find some way to reduce redundant information while still getting the main points of the plot. With so many points, I would suggest looking at the hexbin package (Bioconductor, I think) as one approach; it will not be an identical scatterplot, but it will convey the information (possibly better) with much smaller graphics file sizes. There are other tools like sunflower plots, but hexbin has worked well for me.

If you want to go the png route, the problem usually comes from scaling the plot after producing it. So the solution is to create the plot at the exact size and the exact resolution at which you want to use it in your document, so that no scaling needs to be done. Use the png function, but don't accept the defaults; choose the size and resolution. If you later decide on a different size of graph, recreate the file; don't let LaTeX rescale the first one.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lasse Kliemann
Sent: Thursday, October 22, 2009 1:07 PM
To: r-help@r-project.org
Subject: [R] PDF too large, PNG bad quality

[Lasse's original question quoted in full; see the first message above.]
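The point about redundant overplotted information can be illustrated without any extra packages. This is a hedged sketch of the general idea, not the method described above: collapse the cloud to one representative point per small grid cell before plotting, so the PDF stores thousands of primitives instead of millions.

```r
set.seed(1)
x <- rnorm(2e6)
y <- rnorm(2e6, x, 2)

# One point per 0.05-unit cell is visually indistinguishable from the
# full cloud at typical print sizes, but needs far fewer primitives.
cell <- paste(round(x / 0.05), round(y / 0.05))
keep <- !duplicated(cell)

pdf("thinned.pdf")
plot(x[keep], y[keep], pch = ".")
dev.off()
```

The cell width trades fidelity against file size; hexbin does the same kind of reduction, but keeps the counts so density can be shown too.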
Re: [R] PDF too large, PNG bad quality
On Thu, Oct 22, 2009 at 8:28 PM, Greg Snow greg.s...@imail.org wrote:

> If you want to go the pdf route, then you need to find some way to reduce redundant information while still getting the main points of the plot. [...]

I've seen this kind of thing happen after waiting an hour for one of my printouts when it was queued after something submitted by one of our extreme-value stats people. I've seen them make plots containing maybe a million points, most of which are in a big black blob, but they want to be able to show the important sixty or so points at the extremes.

I'm not sure what the best way to print this kind of thing is. If they know where the big blob is going to be, then they could apply some cutoff to the plot, only show points outside the cutoff, and fill the region inside the cutoff with a black polygon... Another idea may be to do a high-resolution plot as a PNG (think 300 pixels per inch of your desired final output), but do it without text and add that later in a graphics package.

Barry
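The cutoff-plus-polygon idea might look like this. A sketch under an assumed bivariate-normal cloud; the radius 3.5 and the circular cutoff region are illustrative choices:

```r
set.seed(1)
x <- rnorm(1e6)
y <- rnorm(1e6)
r <- sqrt(x^2 + y^2)

out <- r > 3.5   # keep only the extreme points

pdf("extremes.pdf")
plot(x[out], y[out], xlim = range(x), ylim = range(y))
# fill the region holding the big black blob with a single polygon
theta <- seq(0, 2 * pi, length.out = 200)
polygon(3.5 * cos(theta), 3.5 * sin(theta), col = "black")
dev.off()
```

Only a couple of thousand points plus one polygon end up in the PDF, instead of a million overplotted circles.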
Re: [R] PDF too large, PNG bad quality
For getting the details in the outer points, here is what I do:

1. Use hexbin to create the big central blob (but with additional info).
2. Use the chull function to find the outer points and save their indices in a vector.
3. Use chull on the rest of the points (excluding those found previously) and append their indices to the ones already found.
4. Repeat step 3 until you have about 100-250 outer points (a while loop works nicely).
5. Use the points function to add just the outer points found above to the plot.

This gives a plot with the color/shade representing the density where most of the points are, while also showing the individual points out on the edges. The only thing missed is possibly interesting points lying between peaks.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-Original Message-
From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com] On Behalf Of Barry Rowlingson
Sent: Thursday, October 22, 2009 2:43 PM
To: Greg Snow
Cc: Lasse Kliemann; r-help@r-project.org
Subject: Re: [R] PDF too large, PNG bad quality

[Barry's reply, quoting Greg; see the previous messages above.]
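Steps 2-4 amount to iterated convex-hull peeling with chull() (base R). A minimal sketch on synthetic data; the sample size and the 150-point target are illustrative:

```r
set.seed(1)
x <- rnorm(5e4)
y <- rnorm(5e4)

remaining <- seq_along(x)   # indices not yet peeled off
outer <- integer(0)         # indices of outer points found so far

while (length(outer) < 150) {
  h <- chull(x[remaining], y[remaining])  # hull of what is left
  outer <- c(outer, remaining[h])         # step 3: append hull indices
  remaining <- remaining[-h]              # exclude them from the next pass
}

# 'outer' now holds roughly 150-200 of the outermost points;
# points(x[outer], y[outer]) would add them on top of a hexbin plot.
```

Each pass peels off one hull layer, so the loop terminates quickly even for large samples.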
Re: [R] PDF too large, PNG bad quality
* Message by -Greg Snow- from Thu 2009-10-22:

> If you want to go the pdf route, then you need to find some way to reduce redundant information while still getting the main points of the plot. With so many points, I would suggest looking at the hexbin package (Bioconductor, I think) as one approach; it will not be an identical scatterplot, but will convey the information (possibly better) with much smaller graphics file sizes. There are other tools like sunflower plots, but hexbin has worked well for me.

I took a look at the 'hexbin' package, and it is really interesting. You were right that it also helps to better display the data. Finally, this forced me to learn to use the 'grid' package :-) I think I will use a pretty high number of bins, so the plot looks similar to the scatter plots I am used to -- with the addition of colors giving the different densities.

> If you want to go the png route, the problem usually comes from scaling the plot after producing it. So the solution is to create the plot at the exact size and resolution at which you want to use it in your document, so that no scaling needs to be done. Use the png function, but don't accept the defaults; choose the size and resolution. If you later decide on a different size of graph, recreate the file; don't let LaTeX rescale the first one.

This was my strategy so far. For instance, for a figure that is to span the whole text block from left to right:

two_third_a4 <- 8.3 * 2/3
png("new.png", width = two_third_a4, height = two_third_a4,
    units = "in", res = 300)
plot(...)

Earlier I wrote that the PNG looks good when displayed separately, but looks inferior when embedded in the LaTeX PDF document. However, I now believe that this depends more on the viewer application. It looks good displayed separately with 'qiv', but not with 'feh'. The PDF document looks inferior when displayed with 'evince' or 'epdfview', but it looks okay when displayed with 'xpdf'.
I presume now that this phenomenon is not directly R-related.

I thank you and everyone who responded so quickly.

Lasse