[cgiapp] RFC: CGI::Application::Plugin::Output::PDF

Evan A. Zacks Thu, 22 Sep 2005 12:16:11 -0700

Hello folks,

I'm working on an output plugin that will convert html content to
pdf. It handles setting the content-type and content-disposition
headers before returning.


You can see preliminary work at:

  http://zacks.org/cgiapp/pdf/

The pod is inline at the end of this message.

Right now the actual conversion is done in a helper module. For
the moment, only HTMLDoc (via HTML::HTMLDoc) is supported. I am
planning support for PDF::FromHTML and html2ps/ps2pdf.

Apologies for the long discussion below, but I'm looking for
advice on how to best proceed with development.

I am not sure of the best way to call the helper module that
performs the conversion. For now, I have this code in place:

  # $opts{converter} is HTMLDoc, for example

  my $pack= "CGI::Application::Plugin::Output::PDF::$opts{converter}";
  eval "require $pack";
  croak "Can't load converter [$pack]" if $@;

  return $pack->convert( ... );

Is this filthy? One of the things I don't like about it is calling
convert() as a class method -- it seems unnatural since (as of
now) none of the modules are object-oriented. On the other hand,
I believe 'no strict "refs"' would be necessary to call it as a
function (with the package name as a variable).

Another approach is similar but doesn't invoke the helper routine
via the converter class:

  # set $pack as above
  my $convert= $pack->can('convert')
    or croak "converter [$pack] doesn't know how to 'convert'";

  return $convert->( \$html, $args->{converter_args} );

Is one way better than the other? I feel like I'm missing
something obvious and both approaches are poor.
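
A minimal sketch of how the two calling styles above differ in practice. The package Demo::Converter::Stub and the load_converter() helper are hypothetical stand-ins, not part of the plugin, and the name check before the string eval is an added assumption rather than something from the original code:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for a converter helper module, defined inline
# so that the example is self-contained.
package Demo::Converter::Stub;

sub convert {
    my @args = @_;
    return scalar @args;    # report how many arguments arrived
}

# Mark the package as loaded so that require() does not look for a file.
BEGIN { $INC{'Demo/Converter/Stub.pm'} = __FILE__ }

package main;

# Hypothetical helper: build the package name and load it.
sub load_converter {
    my ($name) = @_;

    # Assumption: restrict the name before interpolating it into a string eval.
    croak "Invalid converter name [$name]" unless $name =~ /\A\w+\z/;

    my $pack = "Demo::Converter::$name";
    eval "require $pack; 1" or croak "Can't load converter [$pack]: $@";
    return $pack;
}

my $pack = load_converter('Stub');
my $html = '<p>hello</p>';

# Class-method call: the package name is passed as an extra first argument.
my $as_method = $pack->convert( \$html, {} );

# Code-reference call via can(): only the listed arguments are passed,
# and no symbolic reference (no strict "refs") is needed.
my $convert = $pack->can('convert')
    or croak "converter [$pack] doesn't know how to 'convert'";
my $as_function = $convert->( \$html, {} );

print "method call saw $as_method args, function call saw $as_function args\n";
```

The method call sees three arguments and the function call sees two, so a helper written as a plain function would have to discard the leading class name if it were invoked as a class method.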


Another issue is how to handle configuration. Right now the user
has the option to import the pdf_output() method, which should be
called at the end of a runmode. This method will set the
content-type header, convert html content to pdf, and return the
pdf content.

This method takes some optional named parameters which can select
the converter to use and specify options specific to that
converter. So a user importing the method can pass parameters for
configuration purposes.
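
A rough sketch of the flow described above: merge the optional named parameters with defaults, set the headers, and return the converted content. Demo::App is a hypothetical stand-in for a CGI::Application subclass whose header_add() only records what it is given, and the conversion step is replaced by a placeholder string, so this is an illustration under those assumptions rather than the plugin's actual code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CGI::Application subclass object.
package Demo::App;

sub new { return bless { headers => {} }, shift }

# Records header properties; the real header_add() does more than this.
sub header_add {
    my ( $self, %props ) = @_;
    $self->{headers} = { %{ $self->{headers} }, %props };
    return;
}

# Simplified pdf_output(): defaults, headers, then "conversion".
sub pdf_output {
    my ( $self, $html_ref, $args ) = @_;

    # Caller-supplied options override the defaults.
    my %opts = (
        filename  => 'download.pdf',
        converter => 'HTMLDoc',
        %{ $args || {} },
    );

    # Assumption: CGI.pm-style header names for type and attachment.
    $self->header_add(
        -type       => 'application/pdf',
        -attachment => $opts{filename},
    );

    # Placeholder for the real html-to-pdf conversion step.
    return "pdf placeholder via $opts{converter} for "
        . length($$html_ref) . " bytes of html";
}

package main;

my $app  = Demo::App->new;
my $html = '<h1>Report</h1>';
my $body = $app->pdf_output( \$html, { filename => 'report.pdf' } );

print "$body\n";
```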

If the user has CGI::Application version 4 or newer, and he does
not request any symbols for import, pdf_output() is automatically
installed as a postrun callback. This allows for transparent
conversion from html to pdf:

  use CGI::Application::Plugin::Output::PDF;

  # ...

  return $template->output; # sent to browser as pdf


This is convenient, but it limits the user's ability to configure
the behavior of the plugin. For example, the user doesn't have a
way to specify which converter to use.

One option would be to use arguments to import to configure the
plugin. For example:

  use CGI::Application::Plugin::Output::PDF converter => 'HTMLDoc';

This can get a bit messy, however. It would also be nice for the
user to be able to specify the output filename, or specify some
parameters specific to the selected converter. I don't know how
many options are too many to handle in the import() method.
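
One way the import-time configuration could look, as a sketch only: import() takes key/value pairs and stores them per calling class, so that a postrun callback could look them up later. The package Demo::Plugin::PDF, the config_for() accessor, and the default values are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical plugin package; not the real module.
package Demo::Plugin::PDF;

# Configuration keyed by the name of the class that imported the plugin.
my %config_for;

sub import {
    my ( $class, %opts ) = @_;
    my $caller = caller;    # package that called import()

    # Options given at import time override the defaults.
    $config_for{$caller} = {
        converter => 'HTMLDoc',
        filename  => 'download.pdf',
        %opts,
    };
    return;
}

# Accessor a postrun callback could use to find its settings.
sub config_for {
    my ( $class, $app_class ) = @_;
    return $config_for{$app_class};
}

package My::App;

# In a real application this call would happen implicitly through
# "use Demo::Plugin::PDF converter => 'HTML2PS';".
Demo::Plugin::PDF->import( converter => 'HTML2PS' );

package main;

my $conf = Demo::Plugin::PDF->config_for('My::App');
print "converter=$conf->{converter} filename=$conf->{filename}\n";
```

Because the settings are a plain hash, converter-specific arguments could be passed as a nested hash reference under a single key, which keeps the import list from growing with every new option.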

When calling pdf_output() directly, this is not a problem, as it
takes an optional hash reference of named parameters to handle
these configuration options, among others.

What is the best practice for those who are using the transparent
postrun callback?

Thanks for reading and for any advice you may have.

-E


NAME
    CGI::Application::Plugin::Output::PDF - Generate PDF output from a
    CGI::Application runmode

SYNOPSIS
    For CGI::Application >= 4.0:

      use CGI::Application::Plugin::Output::PDF;

      # in some runmode...

      # html content will be automatically converted to pdf
      return $template->output;

    For CGI::Application < 4.0:

      use CGI::Application::Plugin::Output::PDF qw(pdf_output);

      # in some runmode...

      return $self->pdf_output( \$template->output );

DESCRIPTION
    "CGI::Application::Plugin::Output::PDF" provides a method, "pdf_output",
    and a function, "html_to_pdf", to convert html content to pdf.

    The "pdf_output" method may be called directly. Alternatively, for
    CGI::Application(3) version 4 and above, a postrun callback is added
    automatically unless the user requests any symbols for import.

    XXX should this be the case? or always add the callback?

EXPORT
    This module does not export any symbols by default. You may import the
    "pdf_output" method and/or the "html_to_pdf" function on request:

      use CGI::Application::Plugin::Output::PDF qw(pdf_output);

    You may import both routines using the export tag ":all":

      use CGI::Application::Plugin::Output::PDF qw(:all);

    NOTE: For CGI::Application(3) version 4 and above, a postrun callback
    will be added to automatically convert html content to pdf, unless the
    user requests that any symbols be exported.

    Subclasses of earlier versions of CGI::Application(3) will need to
    import the "pdf_output" method and call it directly:

      return $self->pdf_output( \$template->output );

METHODS
    pdf_output
          # in a runmode

          # $template is an HTML::Template object, for example
          my $html_output= $template->output;

          return $self->pdf_output( \$html_output,
            { filename  => 'download.pdf',
              converter => 'HTMLDoc', }
          );

        This method generates a pdf file from html content and sends it
        directly to the user's browser. It sets the content-type header to
        'application/pdf' and sets the content-disposition header to
        'attachment'.

        It should be invoked through a CGI::Application(3) subclass object.

        It takes two parameters. The first, which is required, is a
        reference to a scalar containing the html content for conversion.
        The second is a reference to a hash of named parameters, all of
        which are optional:

        converter
                The module to be used for converting html content to pdf.
                The current options are "HTMLDoc" (default), "HTML2PS", and
                "PDFFromHTML".

                See CONVERTERS below for further discussion of the merits of
                each.

        filename
                The name of the file which will be sent in the HTTP
                content-disposition header. The default is "download.pdf".

FUNCTIONS
    html_to_pdf
          my $pdf= html_to_pdf( \$html_content,
            { filename  => 'download.pdf',
              converter => 'HTMLDoc', }
          );

          # do something with $pdf

        This function converts html content to pdf content and returns it.
        It takes the same parameters as "pdf_output" (above), except that it
        is a function, so it should not be invoked through an object.

        In addition, the named parameter "filename" is ignored, as it is not
        applicable to this function.

CONVERTERS
    NOTE: This section is incomplete.

    In general, css is not well-supported.

    In addition, it may be necessary to use full paths for images and links
    in your html to get a close representation of your web page marked up as
    pdf.

    HTMLDoc
        This converter uses the HTML::HTMLDoc(3) module.

        From "http://www.htmldoc.org":

          HTMLDOC supports most HTML 3.2 elements, some HTML 4.0 elements,
          and can generate title and table of contents pages. The 1.8.x
          releases do not support stylesheets.

        css/stylesheets
                Unsupported

        paths   Under a web environment, passing "$ENV{DOCUMENT_ROOT}" to
                the HTML::HTMLDoc(3) object fixed relative image paths.

    PDFFromHTML
        This converter uses the PDF::FromHTML(3) module.

        css/stylesheets
                PDF::FromHTML does not support css.

        paths   XXX Unknown.

    HTML2PS
        This converter passes the html content to html2ps(1) and then to
        ps2pdf(1).

        Be aware that large table cells may not render as expected. From
        "http://user.it.uu.se/~jan/html2psug.html":

          Rendering HTML tables well is a non-trivial task. For
          "real" tables, that is representation of tabular data,
          html2ps usually generates reasonably good output. When
          tables are used for layout purposes, the result varies
          from good to useless. This is because a table cell is
          never broken across pages. So if a table contains a cell
          with a lot of content, the entire table may have to be
          scaled down in size in order to make this cell fit on a
          single page. Sometimes this may even result in unreadable
          output.

        css/stylesheets
                html2ps supports css to a limited extent, but the styles
                must be specified on the command line or in a configuration
                file.

        paths   html2ps allows the user to specify either a root file path
                or a base URL to be used for relative paths in the html
                content.

AUTHOR
    Evan A. Zacks "<[EMAIL PROTECTED]>"

SEE ALSO
    PDF::FromHTML(3), HTML::HTMLDoc(3), html2ps(1), CGI::Application(3)

COPYRIGHT & LICENSE
    Copyright 2005 Evan A. Zacks, All rights reserved.

    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

REVISION
    $Id: PDF.pm 2 2005-09-22 06:57:17Z zackse $


