Hi all

I studied the advantages of both libraries for a project and concluded that, basically, iText is best for creating and modifying PDFs, while PDFBox excels at extracting information from them.

If DSpace intends to add the ability to convert/download documents in different formats, I think it's worth looking at how Alfresco does it. It uses OpenOffice installed as a service, along with other applications, to convert documents between formats on demand.

Regards
Adán Román Ruiz
ARVO Consultores

It might be worth taking a peek at the old LGPL iText; I noticed in a Stack Overflow post that it exists. I guess we could make a content disseminator interface, subclass it as PDF-Citation-CoverPage, then have a PDFBox implementation and an iText (LGPL) implementation, and have Spring wire it up. Unfortunately, DSpace 5 will have a disclaimer in the instructions that the PDFBox cover page will not handle Unicode characters. Then, for DSpace 6, a year from now: will PDFBox be updated to support Unicode characters/fonts, or will the LGPL iText be the only route forward? I did contact iText (AGPL) to see if they would make an exception for DSpace, but they declined; basically, we'd have to pay per server that runs DSpace, which is, umm, 2000+.
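
A rough sketch of the disseminator design described above, in Java since DSpace is Java. Every name here (ContentDisseminator, CitationCoverPageDisseminator, PdfBoxCoverPage, ITextCoverPage, DisseminatorWiring) is hypothetical, the actual PDFBox and iText calls are replaced by stubs, and a plain map stands in for the Spring wiring. The Latin-1 check in the PDFBox stub is only a crude stand-in for the Unicode limitation mentioned for DSpace 5, not a description of PDFBox's real behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical: derives a downloadable copy from an archived bitstream. */
interface ContentDisseminator {
    /** Whether this implementation can draw the given text (e.g. a citation). */
    boolean canRender(String text);

    /** Returns new bytes for download; the archived original is left untouched. */
    byte[] disseminate(byte[] original, String citation);
}

/** Shared cover-page flow; subclasses supply only the library-specific parts. */
abstract class CitationCoverPageDisseminator implements ContentDisseminator {
    @Override
    public byte[] disseminate(byte[] original, String citation) {
        if (!canRender(citation)) {
            throw new IllegalArgumentException(
                    getClass().getSimpleName() + " cannot render: " + citation);
        }
        return renderCoverPage(original, citation);
    }

    /** Library-specific step: prepend a cover page to the original PDF. */
    protected abstract byte[] renderCoverPage(byte[] original, String citation);

    /** Stub helper so the sketch runs without PDFBox or iText on the classpath. */
    protected static byte[] prependMarker(String marker, byte[] original) {
        byte[] head = marker.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(head, head.length + original.length);
        System.arraycopy(original, 0, out, head.length, original.length);
        return out;
    }
}

/** Would wrap PDFBox; crudely models the DSpace 5 "no Unicode" limitation. */
class PdfBoxCoverPage extends CitationCoverPageDisseminator {
    @Override
    public boolean canRender(String text) {
        // Assumption for illustration: anything above U+00FF counts as unsupported.
        return text.codePoints().allMatch(cp -> cp <= 0xFF);
    }

    @Override
    protected byte[] renderCoverPage(byte[] original, String citation) {
        // A real version would build the page with PDFBox here.
        return prependMarker("[PDFBox cover] " + citation + "\n", original);
    }
}

/** Would wrap the old LGPL iText, assuming an embedded Unicode font. */
class ITextCoverPage extends CitationCoverPageDisseminator {
    @Override
    public boolean canRender(String text) {
        return true;
    }

    @Override
    protected byte[] renderCoverPage(byte[] original, String citation) {
        // A real version would build the page with iText here.
        return prependMarker("[iText cover] " + citation + "\n", original);
    }
}

public class DisseminatorWiring {
    // Stand-in for Spring: a configuration value selects which implementation is used.
    static final Map<String, Supplier<ContentDisseminator>> BEANS = Map.of(
            "pdfbox", PdfBoxCoverPage::new,
            "itext", ITextCoverPage::new);

    static ContentDisseminator lookup(String configured) {
        Supplier<ContentDisseminator> bean = BEANS.get(configured);
        if (bean == null) {
            throw new IllegalArgumentException("Unknown disseminator: " + configured);
        }
        return bean.get();
    }

    public static void main(String[] args) {
        ContentDisseminator d = lookup("pdfbox");
        byte[] pdf = "%PDF-1.4 placeholder".getBytes(StandardCharsets.UTF_8);
        // The accented title is Latin-1, so the PDFBox stub accepts it.
        byte[] out = d.disseminate(pdf, "Ruiz, A. (2014). T\u00edtulo");
        System.out.println(new String(out, StandardCharsets.UTF_8));
        // Cyrillic text is rejected by the PDFBox stub: prints false.
        System.out.println(d.canRender("\u0414\u0438\u0442\u0446"));
    }
}
```

With this shape, switching libraries would be a configuration change rather than a code change, which is the appeal of letting Spring pick the implementation.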

This is post-DSpace-5 work, but think of the OAIS model: SIP (Submission Information Package), AIP (Archival Information Package), DIP (Dissemination Information Package). With the PDF citation cover page, we gain the ability to generate a new DIP for content (similar to media filter, I guess). So if we wanted to add "Download As" RTF, DOC, etc., that could be an extension of the disseminator, while keeping the original PDF as the archival copy (AIP).
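
One way the AIP/DIP split above could look, as a hypothetical sketch: the archived bytes stay untouched, and each "Download As" format is a pluggable converter that produces a new DIP on demand. DipRegistry and the format keys ("original", "pdf-coverpage", "rtf") are invented for illustration, and the converters are stubs rather than real PDF or RTF generation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical registry: requested format -> converter that builds a DIP from the AIP. */
public class DipRegistry {
    private final Map<String, UnaryOperator<byte[]>> converters = new LinkedHashMap<>();

    /** Adding a new format is one registration; existing formats are untouched. */
    public void register(String format, UnaryOperator<byte[]> converter) {
        converters.put(format, converter);
    }

    public Set<String> formats() {
        return converters.keySet();
    }

    /**
     * Builds a DIP for the requested format, or empty if the format is unknown.
     * The converter receives a copy, so even one that mutates its input
     * cannot alter the archived AIP bytes.
     */
    public Optional<byte[]> disseminate(byte[] aip, String format) {
        UnaryOperator<byte[]> converter = converters.get(format);
        if (converter == null) {
            return Optional.empty();
        }
        return Optional.of(converter.apply(aip.clone()));
    }

    private static byte[] prefix(String header, byte[] body) {
        byte[] head = header.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[head.length + body.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(body, 0, out, head.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        DipRegistry registry = new DipRegistry();
        // Stubs only: real converters might call PDFBox, iText, or an OpenOffice service.
        registry.register("original", UnaryOperator.identity());
        registry.register("pdf-coverpage", bytes -> prefix("[cover page]\n", bytes));
        registry.register("rtf", bytes -> prefix("{\\rtf1 placeholder}\n", bytes));

        byte[] aip = "%PDF-1.4 archived".getBytes(StandardCharsets.UTF_8);
        for (String format : registry.formats()) {
            byte[] dip = registry.disseminate(aip, format).orElseThrow();
            System.out.println(format + " -> " + dip.length + " bytes");
        }
    }
}
```

The `aip.clone()` before each conversion is the design choice that matters here: whatever a converter does, the original PDF remains the AIP.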

________________
Peter Dietz
Longsight
www.longsight.com
[email protected]
p: 740-599-5005 x809

On Thu, Oct 30, 2014 at 7:41 AM, helix84 <[email protected]> wrote:

    Hi Adán,

    we've already been using PDFBox in DSpace, before this feature, for
    PDF text extraction and thumbnail generation in filter-media and as a
    PDF packager, so this is a good choice. That said, we'll need to
    address the encoding issue mentioned and described in [1], probably by
    upgrading when PDFBox 2 is released and uploaded to Maven Central. But
    if you can find another way, a patch for 5.x would be highly
    appreciated.

    [1] https://jira.duraspace.org/browse/DS-2224


    Regards,
    ~~helix84

    Compulsory reading: DSpace Mailing List Etiquette
    https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

------------------------------------------------------------------------------
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech