Re: [cgiapp] RFC: CGI::Application::Plugin::Output::PDF

Jeff MacDonald Thu, 22 Sep 2005 23:48:44 -0700

Hi,

I have a question regarding PDF generation, independent of CGI::App,
so forgive me for temporarily hijacking the conversation.


Once in a while we're asked to output a PDF file, and sometimes it's
important to know where the page breaks are. CSS lets you specify
where page breaks fall in the printed version of an HTML document,
which is handy for files that you are generating statically.

What I'm wondering is, has anyone found any good solutions for placing
page breaks on a dynamic basis? For example, say a client has a list of
members and wants to print those member addresses out with a bounding
table around each so they look like little business cards. Of course I
would not want one of these little business cards to be cut in half by
a page boundary, and the cards can be of variable size depending upon
how much address info is provided... now I'm babbling.
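
To make the question concrete, here is a rough sketch of the kind of
thing I have in mind: estimate each card's height in text lines, keep a
running total, and emit a break marker before any card that would not
fit. The page capacity, the padding, and the sample members are made-up
numbers and data, and the address lines are assumed to be html-escaped
already. The marker is passed in because it depends on the converter; I
believe HTMLDOC treats a "PAGE BREAK" comment as a forced break, but
please check that against its documentation.

```perl
use strict;
use warnings;

# Hypothetical layout numbers: tune these for the real page and font size.
my $PAGE_LINES   = 50;   # usable text lines per page (assumption)
my $CARD_PADDING = 2;    # extra lines for the table border and spacing (assumption)

# Takes a break marker and a list of cards (each an array ref of address
# lines) and returns html with the marker inserted before any card that
# would otherwise straddle a page boundary.
sub paginate_cards {
    my ($break, @cards) = @_;
    my ($html, $used) = ('', 0);

    for my $card (@cards) {
        my $height = @$card + $CARD_PADDING;

        # Start a new page if this card would not fit on the current one.
        if ($used > 0 && $used + $height > $PAGE_LINES) {
            $html .= "$break\n";
            $used = 0;
        }

        $html .= '<table border="1"><tr><td>'
               . join('<br>', @$card)
               . "</td></tr></table>\n";
        $used += $height;
    }
    return $html;
}

# Illustrative data only.
my @members = (
    [ 'Alice Example', '1 Main St', 'Sometown' ],
    [ 'Bob Example', '22 Long Road', 'Suite 4', 'Othertown', 'Canada' ],
);
print paginate_cards('<!-- PAGE BREAK -->', @members);
```

Counting lines is only an estimate, since long lines can wrap, so a real
version would probably need a safety margin.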

For discussion.

Jeff.

On 9/22/05, Evan A. Zacks <[EMAIL PROTECTED]> wrote:
> Hello folks,
>
> I'm working on an output plugin that will convert html content to
> pdf. It handles setting the content-type and content-disposition
> headers before returning.
>
> You can see preliminary work at:
>
>   http://zacks.org/cgiapp/pdf/
>
> The pod is inline at the end of this message.
>
> Right now the actual conversion is done in a helper module. For
> the moment, only HTMLDoc (via HTML::HTMLDoc) is supported. I am
> planning support for PDF::FromHTML and html2ps/ps2pdf.
>
> Apologies for the long discussion below, but I'm looking for
> advice on how to best proceed with development.
>
> I am not sure of the best way to handle calling to the helper
> module to handle the conversion. For now, I have this code in
> place:
>
>   # $opts{converter} is HTMLDoc, for example
>
>   my $pack= "CGI::Application::Plugin::Output::PDF::$opts{converter}";
>   eval "require $pack; 1"
>     or croak "Can't load converter [$pack]: $@";
>
>   return $pack->convert( ... );
>
> Is this filthy? One of the things I don't like about it is calling
> convert() as a class method -- it seems unnatural since (as of
> now) none of the modules are object-oriented. On the other hand,
> I believe 'no strict "refs"' would be necessary to call it as a
> function (with the package name as a variable).
>
> Another approach is similar but doesn't invoke the helper routine
> via the converter class:
>
>   # set $pack as above
>   my $convert= $pack->can('convert')
>     or croak "converter [$pack] doesn't know how to 'convert'";
>
>   return $convert->( \$html, $args->{converter_args} );
>
> Is one way better than the other? I feel like I'm missing
> something obvious and both approaches are poor.
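
For comparison, the second approach can be sketched end to end as
follows. This is only an illustration under assumptions: the
My::PDF::Converter namespace, the Fake converter, and find_converter()
are hypothetical names, not part of the plugin. It validates the
converter name before it reaches require, loads the module by file name
rather than string eval, and then fetches convert() as a plain code
reference with can(), so no 'no strict "refs"' is needed.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Stand-in converter defined inline so the sketch runs on its own;
# a real one would live in its own .pm file and be loaded with require.
{
    package My::PDF::Converter::Fake;

    sub convert {
        my ($html_ref, $args) = @_;
        return '%PDF-fake ' . length($$html_ref) . ' bytes';
    }
}

# Returns a code reference to the named converter's convert() function.
sub find_converter {
    my ($name) = @_;

    # Only allow plain identifiers, since the name ends up in a require.
    croak "Bad converter name [$name]" unless $name =~ /\A\w+\z/;

    my $pack = "My::PDF::Converter::$name";

    # Skip the require when the package already provides convert().
    unless ($pack->can('convert')) {
        (my $file = "$pack.pm") =~ s{::}{/}g;
        eval { require $file; 1 }
            or croak "Can't load converter [$pack]: $@";
    }

    my $convert = $pack->can('convert')
        or croak "Converter [$pack] doesn't know how to 'convert'";
    return $convert;
}

my $html    = '<p>hello</p>';
my $convert = find_converter('Fake');
print $convert->( \$html, {} ), "\n";
```

Calling through the code reference keeps the converters as plain
functions, while can() still gives a clear error if a module loads but
lacks convert().
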
>
>
> Another issue is how to handle configuration. Right now the user
> has the option to import the pdf_output() method, which should be
> called at the end of a runmode. This method will set the
> content-type header, convert html content to pdf, and return the
> pdf content.
>
> This method takes some optional named parameters which can select
> the converter to use and specify options specific to that
> converter. So a user importing the method can pass parameters for
> configuration purposes.
>
> If the user has CGI::Application version 4 or newer, and he does
> not request any symbols for import, pdf_output() is automatically
> installed as a postrun callback. This allows for transparent
> conversion from html to pdf:
>
>   use CGI::Application::Plugin::Output::PDF;
>
>   # ...
>
>   return $template->output; # sent to browser as pdf
>
>
> This is convenient, but it limits the user's ability to configure
> the behavior of the plugin. For example, the user doesn't have a
> way to specify which converter to use.
>
> One option would be to use arguments to import to configure the
> plugin. For example:
>
>   use CGI::Application::Plugin::Output::PDF converter => 'HTMLDoc';
>
> This can get a bit messy, however. It would also be nice for the
> user to be able to specify the output filename, or specify some
> parameters specific to the selected converter. I don't know how
> many options are too many to handle in the import() method.
>
> When calling pdf_output() directly, this is not a problem, as it
> takes an optional hash reference of named parameters to handle
> these configuration options, among others.
>
> What is the best practice for those who are using the transparent
> postrun callback?
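
One possible shape for this, sketched with hypothetical names
(My::Plugin::PDF, My::App, and config_for() are illustrations, not the
plugin's API): import() records its arguments per calling class, merged
over defaults, and the postrun callback looks the settings up by the
class of the application object. The sketch leaves out the callback
registration itself so that it runs without CGI::Application installed.

```perl
use strict;
use warnings;

{
    package My::Plugin::PDF;   # hypothetical stand-in for the plugin

    my %config_for;            # per-class settings captured at import time

    sub import {
        my ($class, %args) = @_;
        my $caller = caller;

        # Defaults first, so import arguments override them.
        $config_for{$caller} = {
            converter => 'HTMLDoc',
            filename  => 'download.pdf',
            %args,
        };
        # The real plugin would register its postrun callback on $caller here.
    }

    # What a postrun callback could consult to find its settings.
    sub config_for {
        my ($class, $app) = @_;
        return $config_for{ ref($app) || $app };
    }
}

{
    package My::App;           # hypothetical application class

    # Stands in for: use My::Plugin::PDF converter => 'HTML2PS';
    My::Plugin::PDF->import( converter => 'HTML2PS' );
}

my $app = bless {}, 'My::App';
my $cfg = My::Plugin::PDF->config_for($app);
print "$cfg->{converter} $cfg->{filename}\n";
```

A lookup keyed on the exact class does not follow inheritance, so
subclasses of the importing class would need extra handling.
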
>
> Thanks for reading and for any advice you may have.
>
> -E
>
>
> NAME
>     CGI::Application::Plugin::Output::PDF - Generate PDF output from a
>     CGI::Application runmode
>
> SYNOPSIS
>     For CGI::Application >= 4.0:
>
>       use CGI::Application::Plugin::Output::PDF;
>
>       # in some runmode...
>
>       # html content will be automatically converted to pdf
>       return $template->output;
>
>     For CGI::Application < 4.0:
>
>       use CGI::Application::Plugin::Output::PDF qw(pdf_output);
>
>       # in some runmode...
>
>       return $self->pdf_output( \$template->output );
>
> DESCRIPTION
>     "CGI::Application::Plugin::Output::PDF" provides a method, "pdf_output",
>     and a function, "html_to_pdf", to convert html content to pdf.
>
>     The "pdf_output" method may be called directly, or, for
>     CGI::Application(3) version 4 and above, a postrun callback will be
>     added automatically, unless the user requests any symbols for import.
>
>     XXX should this be the case? or always add the callback?
>
> EXPORT
>     This module does not export any symbols by default. You may import the
>     "pdf_output" method and/or the "html_to_pdf" function on request:
>
>       use CGI::Application::Plugin::Output::PDF qw(pdf_output);
>
>     You may export both routines using the export tag ":all":
>
>       use CGI::Application::Plugin::Output::PDF qw(:all);
>
>     NOTE: For CGI::Application(3) version 4 and above, a postrun callback
>     will be added to automatically convert html content to pdf, unless the
>     user requests that any symbols be exported.
>
>     Subclasses of previous versions of CGI::Application(3) will need to
>     export the "pdf_output" method and call it directly:
>
>       return $self->pdf_output( \$template->output );
>
> METHODS
>     pdf_output
>           # in a runmode
>
>           # $template is an HTML::Template object, for example
>           my $html_output= $template->output;
>
>           return $self->pdf_output( \$html_output,
>             { filename  => 'download.pdf',
>               converter => 'HTMLDoc', }
>           );
>
>         This method generates a pdf file from html content and sends it
>         directly to the user's browser. It sets the content-type header to
>         'application/pdf' and sets the content-disposition header to
>         'attachment'.
>
>         It should be invoked through a CGI::Application(3) subclass object.
>
>         It takes two parameters. The first, which is required, is a
>         reference to a scalar containing the html content for conversion.
>         The second is a reference to a hash of named parameters, all of
>         which are optional:
>
>         converter
>                 The module to be used for converting html content to pdf.
>                 The current options are "HTMLDoc" (default), "HTML2PS", and
>                 "PDFFromHTML".
>
>                 See CONVERTERS below for further discussion of the merits of
>                 each.
>
>         filename
>                 The name of the file which will be sent in the HTTP
>                 content-disposition header. The default is "download.pdf".
>
> FUNCTIONS
>     html_to_pdf
>           my $pdf= html_to_pdf( \$html_content,
>             { filename  => 'download.pdf',
>               converter => 'HTMLDoc', }
>           );
>
>           # do something with $pdf
>
>         This function converts html content to pdf content and returns it.
>         It takes the same parameters as "pdf_output" (above), except that it
>         is a function, so it should not be invoked through an object.
>
>         In addition, the named parameter "filename" is ignored, as it is not
>         applicable to this function.
>
> CONVERTERS
>     NOTE: This section is incomplete.
>
>     In general, css is not well-supported.
>
>     In addition, it may be necessary to use full paths for images and links
>     in your html to get a close representation of your web page marked up as
>     pdf.
>
>     HTMLDoc
>         This converter uses the HTML::HTMLDoc(3) module.
>
>         From "http://www.htmldoc.org":
>
>           HTMLDOC supports most HTML 3.2 elements, some HTML 4.0 elements,
>           and can generate title and table of contents pages. The 1.8.x
>           releases do not support stylesheets.
>
>         css/stylesheets
>                 Unsupported
>
>         paths   Under a web environment, I had success passing
>                 "$ENV{DOCUMENT_ROOT}" to the HTML::HTMLDoc(3) object to fix
>                 relative image paths.
>
>     PDFFromHTML
>         This converter uses the PDF::FromHTML(3) module.
>
>         css/stylesheets
>                 PDF::FromHTML does not support css.
>
>         paths   XXX Unknown.
>
>     HTML2PS
>         This converter passes the html content to html2ps(1) and then to
>         ps2pdf(1).
>
>         Be aware that large table cells may not render as expected. From
>         "http://user.it.uu.se/~jan/html2psug.html":
>
>           Rendering HTML tables well is a non-trivial task. For
>           "real" tables, that is representation of tabular data,
>           html2ps usually generates reasonably good output. When
>           tables are used for layout purposes, the result varies
>           from good to useless. This is because a table cell is
>           never broken across pages. So if a table contains a cell
>           with a lot of content, the entire table may have to be
>           scaled down in size in order to make this cell fit on a
>           single page. Sometimes this may even result in unreadable
>           output.
>
>         css/stylesheets
>                 html2ps supports css to a limited extent, but the styles
>                 must be specified on the command line or in a configuration
>                 file.
>
>         paths   html2ps allows the user to specify either a root file path
>                 or a base URL to be used for relative paths in the html
>                 content.
>
> AUTHOR
>     Evan A. Zacks "<[EMAIL PROTECTED]>"
>
> SEE ALSO
>     PDF::FromHTML(3), HTML::HTMLDoc, html2ps(1), CGI::Application(3)
>
> COPYRIGHT & LICENSE
>     Copyright 2005 Evan A. Zacks, All rights reserved.
>
>     This program is free software; you can redistribute it and/or modify it
>     under the same terms as Perl itself.
>
> REVISION
>     $Id: PDF.pm 2 2005-09-22 06:57:17Z zackse $
>
>
> ---------------------------------------------------------------------
> Web Archive:  http://www.mail-archive.com/[email protected]/
>               http://marc.theaimsgroup.com/?l=cgiapp&r=1&w=2
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


--
Jeff MacDonald
http://www.halifaxbudolife.ca
http://www.nintai.ca
