Re: [R] PDF too large, PNG bad quality

2009-10-30 Thread Deepayan Sarkar
On Fri, Oct 23, 2009 at 1:54 AM, Jim Lemon j...@bitwrit.com.au wrote:
 On 10/23/2009 06:07 AM, Lasse Kliemann wrote:

 I wish to save a scatter plot comprising approx. 2 million points
 in order to include it in a LaTeX document.

 Using 'pdf(...)' produces a file of size about 20 MB, which is
 useless.

 Using 'cairo_pdf(...)' produces a smaller file, around 3 MB. This
 is still too large. Not only will the document be too large, but
 PDF viewers also choke on it. Moreover, Cairo has problems
 with text: by default text looks ugly, like scaled bitmaps. After
 hours of trying different settings, I discovered that choosing a
 different font family can help, e.g.: 'par(family="Mono")'. This
 gives good-looking text. Yet, the problem with the file size
 remains.

 There is also the hint to produce EPS instead and then convert to
 PDF using 'epstopdf'. The resulting PDF files are slightly
 smaller, but still too large, and PDF viewers still don't like
 them.

 So I gave PNG a try. PNG files are much smaller and PDF viewers
 have no trouble with them. However, fonts look ugly. The same
 trick that worked for Cairo PDF has no effect for PNG. When I
 view the PNGs with a dedicated viewer like 'qiv', even the fonts
 look good. But not when included in LaTeX; I simply use
 '\includegraphics{...}' and run the document through 'pdflatex'.

 I tried both, creating PNG with 'png(...)' and converting from
 PDF to PNG using 'convert' from ImageMagick.

 So my questions are:

 - Is there a way to produce sufficiently lean PDFs directly in R,
   even when the plot comprises several million points?

 - How to produce a PNG that still looks nice when included in a
   LaTeX PDF document?

 Any hints will be greatly appreciated.


 Hi Lasse,
 I may be right off the track, but I can't make much sense of 2 million
 points in a scatterplot. If you are interested in the density of points
 within the plot, you could compute this using something like the bkde2D
 function in the KernSmooth package and then plot that using something like
 image.

Which (and a bit more) are done by ?smoothScatter (originally in a
Bioconductor package, but now available in a default installation of
R).
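
For reference, a minimal sketch of the smoothScatter approach (the sample
data and sizes here are arbitrary, not Lasse's):

```r
# smoothScatter replaces millions of points with a smoothed density
# image, so the output file stays small regardless of n.
set.seed(1)
n <- 2e6                     # stands in for the ~2 million points
x <- rnorm(n)
y <- x + rnorm(n)
f <- tempfile(fileext = ".pdf")
pdf(f, width = 6, height = 6)
smoothScatter(x, y)          # 2-D kernel density drawn as a colour gradient
dev.off()
file.size(f)                 # small, and independent of n
```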

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PDF too large, PNG bad quality

2009-10-26 Thread Greg Snow
Hexbin used to use base graphics, but it has switched to grid.  So you can
either learn how to augment the grid graph, or use lattice and learn how to
augment there.  Or you can get a fairly good base-graphics approximation using
the my.symbols function in the TeachingDemos package, e.g.:

library(hexbin)
library(TeachingDemos)

x <- rnorm(10000)   # the sample size was mangled in the list archive; any large n works
y <- rnorm(10000, x, 2)
(bin <- hexbin(x, y))

my.symbols( hcell2xy(bin), symb=ms.filled.polygon, n=6, add=FALSE,
    asp=bin@shape, xlim=bin@xbnds, ylim=bin@ybnds,
    bg=grey( (6:0)/7 )[ cut(bin@count, 7) ], fg=NA,  # the fg colour was truncated in the archive
    inches=par('pin')[1]/bin@dimen[1]*1.25 )




-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Lasse Kliemann
 Sent: Thursday, October 22, 2009 6:35 PM
 To: r-help@r-project.org
 Subject: Re: [R] PDF too large, PNG bad quality
 
 * Message by -Greg Snow- from Thu 2009-10-22:

  [Greg's hexbin and png suggestions and Lasse's full reply snipped;
  Lasse's message appears in full later in this thread.]



Re: [R] PDF too large, PNG bad quality

2009-10-23 Thread Jim Lemon

On 10/23/2009 06:07 AM, Lasse Kliemann wrote:

[Lasse's original question snipped; it is quoted in full at the top of this thread.]

Hi Lasse,
I may be right off the track, but I can't make much sense of 2 million
points in a scatterplot. If you are interested in the density of points
within the plot, you could compute it using something like the bkde2D
function in the KernSmooth package and then plot the result using
something like image.
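
A minimal sketch of that approach (KernSmooth ships with R as a recommended
package; the sample data and bandwidths here are arbitrary):

```r
library(KernSmooth)

# Estimate a 2-D kernel density on a grid, then draw it as an image;
# the file size depends on the grid, not on the number of points.
set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5)
est <- bkde2D(cbind(x, y), bandwidth = c(0.2, 0.2), gridsize = c(200, 200))
image(est$x1, est$x2, est$fhat, xlab = "x", ylab = "y")
```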


Jim



Re: [R] PDF too large, PNG bad quality

2009-10-23 Thread Vijaya Parthiban
Hello Lasse,

Why not try this?

(1) Create 20MB PDF from R
(2) use the 'convert' command (ImageMagick) on Linux; examples below:

convert -resize 50% 20mbfile.pdf smallerfile.pdf
convert -resize 75% 20mbfile.pdf image.png

Ghostscript can help with the conversion as well! Vector formats (pdf,
ps, eps) are good for this purpose, as opposed to raster formats (png,
jpg, etc.).

Best,
Parthiban.


2009/10/23 Jim Lemon j...@bitwrit.com.au

 [Jim's reply and Lasse's original question snipped; see earlier in this thread.]





Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Benilton Carvalho

Dear Lasse,

This won't answer your specific questions and I apologize for that.  
AFAIK, pdf() produces uncompressed PDFs only. But you could use tools  
like pdftk to compress your PDFs. About the PNGs, you can always set  
the 'res' argument to improve resolution, but it won't beat the PDFs.


I'm not sure the reader (of your LaTeX document) would be interested  
in seeing each of the 2M points of your scatter plot and, even if he  
was, I doubt he could. So, instead of plot(x, y), have you considered  
using:


smoothScatter(x, y)

?

Cheers,

b

On Oct 22, 2009, at 5:07 PM, Lasse Kliemann wrote:


[Lasse's original question snipped; it is quoted in full at the top of this thread.]




Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Greg Snow
The problem with the pdf files is that they are storing the information for
every one of your points, even the ones that are overplotted by other points.
The png file is smaller because it only stores which color each pixel should
be, not how many points contributed to a particular pixel being a given
color.  But png files convert the text to pixel information as well, which
doesn't look good if there is post-scaling.

If you want to go the pdf route, then you need to find some way to reduce
redundant information while still getting the main points of the plot.  With
so many points, I would suggest looking at the hexbin package (Bioconductor, I
think) as one approach; it will not be an identical scatterplot, but it will
convey the information (possibly better) with much smaller graphics file
sizes.  There are other tools, like sunflower plots, but hexbin has worked
well for me.

If you want to go the png route, the problem usually comes from scaling the
plot after producing it.  So the solution is to create the plot at the exact
size and resolution at which you want to use it in your document, so that no
scaling needs to be done.  Use the png function, but don't accept the
defaults; choose the size and resolution.  If you later decide on a different
size of graph, recreate the file; don't let LaTeX rescale the first one.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Lasse Kliemann
 Sent: Thursday, October 22, 2009 1:07 PM
 To: r-help@r-project.org
 Subject: [R] PDF too large, PNG bad quality
 
 [Lasse's original question snipped; it is quoted in full at the top of this thread.]



Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Barry Rowlingson
On Thu, Oct 22, 2009 at 8:28 PM, Greg Snow greg.s...@imail.org wrote:
 [Greg's reply snipped; it appears in full elsewhere in this thread.]


 I've seen this kind of thing happen after waiting an hour for one of
my printouts, queued behind something submitted by one of our
extreme-value stats people. I've seen them make plots containing maybe
a million points, most of which are in a big black blob, but they want
to be able to show the important sixty or so points at the extremes.

 I'm not sure what the best way to print this kind of thing is - if
they know where the big blob is going to be then they could apply some
cutoff to the plot and only show points outside the cutoff, and fill
the region inside the cutoff with a black polygon...

 Another idea may be to do a high resolution plot as a PNG (think 300
pixels per inch of your desired final output) but do it without text
and add that on later in a graphics package.
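
A rough sketch of the cutoff idea (the sample data, the 99% quantile, and
the assumption that the blob is centred at the origin are all arbitrary):

```r
# Draw only the points outside a cutoff radius, then fill the central
# blob with a single filled circle instead of thousands of points.
set.seed(1)
x <- rnorm(1e5)
y <- rnorm(1e5)
r <- sqrt(x^2 + y^2)
cutoff <- quantile(r, 0.99)              # arbitrary: keep the outer 1%
plot(x[r > cutoff], y[r > cutoff], pch = ".",
     xlim = range(x), ylim = range(y))
symbols(0, 0, circles = cutoff, inches = FALSE,
        add = TRUE, bg = "black", fg = NA)
```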

Barry



Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Greg Snow
For getting the details in the outer points, here is what I do.

1. use hexbin to create the big central blob (but with additional info).  
2. use the chull function to find the outer points and save their indices in 
another vector
3. use chull on the rest of the points (excluding those found previously) and 
append their indices to the previous ones found.
4. repeat step 3 until you have about 100-250 outer points (a while loop works nicely)
5. use the points function to add just the outer points found above to the plot.

This gives a plot with the color/shade representing the density where most of
the points are, but it also shows the individual points out on the edges.  The
only thing missed is possibly interesting points lying between peaks.
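
In code, the peeling loop (steps 2-4) might look like this; the sample data
and the 150-point target are arbitrary:

```r
# Convex-hull peeling: collect the outermost points of a cloud by
# repeatedly taking the convex hull of whatever remains (grDevices::chull).
set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)
remaining <- seq_along(x)
outer_idx <- integer(0)
while (length(outer_idx) < 150 && length(remaining) > 2) {
  h <- chull(x[remaining], y[remaining])   # hull indices into 'remaining'
  outer_idx <- c(outer_idx, remaining[h])
  remaining <- remaining[-h]               # peel the hull off and repeat
}
# overlay on the hexbin plot with: points(x[outer_idx], y[outer_idx])
```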

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com]
 On Behalf Of Barry Rowlingson
 Sent: Thursday, October 22, 2009 2:43 PM
 To: Greg Snow
 Cc: Lasse Kliemann; r-help@r-project.org
 Subject: Re: [R] PDF too large, PNG bad quality
 
 On Thu, Oct 22, 2009 at 8:28 PM, Greg Snow greg.s...@imail.org wrote:
   [quoted text snipped; Barry's message appears in full earlier in this thread.]



Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Lasse Kliemann
* Message by -Greg Snow- from Thu 2009-10-22:
 
 [Greg's hexbin suggestion snipped; see his message earlier in this thread.]
 
I took a look at the 'hexbin' package, and it is really
interesting. You were right that it also helps to display the
data better. Finally, this forced me to learn to use the 'grid'
package :-) I think I will use a pretty high number of bins, so
the plot looks similar to the scatter plots I am used to -- with
the addition of colors indicating different densities.

 [Greg's png sizing advice snipped; see his message earlier in this thread.]

This was my strategy so far. For instance, for a figure that is 
to span the whole text block from left to right:

two_third_a4 <- 8.3 * 2/3   # two thirds of the 8.3 in A4 page width
png("new.png",
    width=two_third_a4,
    height=two_third_a4,
    units="in",
    res=300)
plot(...)

Earlier I wrote that the PNG looks good when displayed 
separately, but looks inferior when embedded in the LaTeX PDF 
document. However, I now believe that the dependence is more on 
the viewer application. It looks good displayed separately with 
'qiv', but not with 'feh'. The PDF document looks inferior when 
displayed with 'evince' or 'epdfview', but it looks okay when 
displayed with 'xpdf'. I presume now that this phenomenon is not
directly R-related.

I thank you and everyone who responded so quickly.

Lasse

