Hi all,
I studied the advantages of both libraries for a project and
concluded that iText is better for creating and modifying PDFs,
while PDFBox stands out at extracting information from them.
If DSpace intends to add the ability to convert / download documents
in different formats, I think it's worth looking at how Alfresco does
it. It uses OpenOffice installed as a service, along with other
applications, to convert documents between formats on demand.
Regards
Adán Román Ruiz
ARVO Consultores
It might be worth taking a peek at the old iText. I noticed in a
StackOverflow post that it still exists. I guess we could make a
content disseminator interface, subclass that to
PDF-Citation-CoverPage, then have a PDFBox implementation and an
iText (LGPL) implementation, and have Spring wire it up.
Unfortunately, DSpace 5 will have a disclaimer in the instructions
that the PDFBox cover page will not handle Unicode characters. Then
for DSpace 6, a year from now, the question is whether PDFBox will
have been updated to support Unicode characters/fonts, or whether
LGPL iText will be the only route forward. I did contact iText (AGPL)
to see if they would create an exception for DSpace, but they
declined; basically we'd have to pay per server that will use DSpace,
umm, 2000+.
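
To make the interface idea above a bit more concrete, here is a
minimal sketch in plain Java. Every name in it (ContentDisseminator,
PdfCitationCoverPage, PdfBoxCoverPage, ITextCoverPage, CoverPages) is
hypothetical and does not exist in DSpace. The two implementations are
placeholders that make no real PDFBox or iText calls, the
supportsUnicode() values only reflect what is assumed in this thread,
and the small factory stands in for what Spring bean wiring would do.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical: turns an archived bitstream into a dissemination copy. */
interface ContentDisseminator {
    byte[] disseminate(byte[] original, String citation);
}

/** Hypothetical sub-interface for the PDF citation cover page feature. */
interface PdfCitationCoverPage extends ContentDisseminator {
    /** Whether this backend can render Unicode text on the cover page. */
    boolean supportsUnicode();
}

/** Placeholder for a PDFBox-backed implementation (no PDFBox calls here). */
class PdfBoxCoverPage implements PdfCitationCoverPage {
    @Override public boolean supportsUnicode() { return false; } // assumed, see DS-2224
    @Override public byte[] disseminate(byte[] original, String citation) {
        return CoverPages.placeholder("pdfbox", citation, original);
    }
}

/** Placeholder for an iText (LGPL) implementation (no iText calls here). */
class ITextCoverPage implements PdfCitationCoverPage {
    @Override public boolean supportsUnicode() { return true; } // assumed in this thread
    @Override public byte[] disseminate(byte[] original, String citation) {
        return CoverPages.placeholder("itext", citation, original);
    }
}

/** Stand-in for Spring wiring: picks an implementation by configured name. */
final class CoverPages {
    private CoverPages() {}

    static PdfCitationCoverPage forBackend(String name) {
        switch (name) {
            case "pdfbox": return new PdfBoxCoverPage();
            case "itext":  return new ITextCoverPage();
            default: throw new IllegalArgumentException("Unknown backend: " + name);
        }
    }

    /** Fake "cover page": prepends a text marker instead of a real PDF page. */
    static byte[] placeholder(String backend, String citation, byte[] original) {
        byte[] header = ("[" + backend + " cover: " + citation + "]\n")
                .getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[header.length + original.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(original, 0, out, header.length, original.length);
        return out;
    }
}
```

Callers would only depend on PdfCitationCoverPage, so swapping PDFBox
for iText (or back) would be a configuration change rather than a code
change.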
This is post-DSpace-5 work. But I'm thinking in terms of the OAIS
model, with its SIP (Submission Information Package), AIP (Archival
Information Package), and DIP (Dissemination Information Package).
With the PDF citation cover page, we have the ability to generate a
new DIP for content (similar to a media filter, I guess). So if we
wanted the ability to add Download As RTF, DOC, etc., that could be
an extension to the disseminator, keeping the original PDF as the
"Archive / AIP".
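
As a rough illustration of the "Download As" extension, here is a
small Java sketch of a registry that maps a requested format to a
disseminator and never touches the archival copy. FormatDisseminator
and DisseminatorRegistry are hypothetical names, not existing DSpace
classes, and the actual conversion is left to whatever implementation
gets registered.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical: produces a dissemination copy (DIP) in one target format. */
interface FormatDisseminator {
    String targetFormat();                // e.g. "rtf" or "doc"
    byte[] convert(byte[] archivalCopy);  // returns the converted bytes
}

/** Hypothetical registry mapping a requested format to a disseminator. */
class DisseminatorRegistry {
    private final Map<String, FormatDisseminator> byFormat = new LinkedHashMap<>();

    void register(FormatDisseminator d) {
        byFormat.put(d.targetFormat().toLowerCase(Locale.ROOT), d);
    }

    /** Returns a converted copy, or empty if no disseminator handles the format. */
    Optional<byte[]> downloadAs(String format, byte[] archivalCopy) {
        FormatDisseminator d = byFormat.get(format.toLowerCase(Locale.ROOT));
        if (d == null) {
            return Optional.empty();
        }
        // Hand over a defensive copy so the archival (AIP) bytes stay untouched.
        return Optional.of(d.convert(archivalCopy.clone()));
    }
}
```

The defensive copy is the design point here: disseminators can do
whatever they like to produce a DIP, while the stored original remains
the archival version.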
________________
Peter Dietz
Longsight
www.longsight.com <http://www.longsight.com>
[email protected] <mailto:[email protected]>
p: 740-599-5005 x809
On Thu, Oct 30, 2014 at 7:41 AM, helix84 <[email protected]
<mailto:[email protected]>> wrote:
Hi Adán,
we've already been using PDFBox in DSpace, since before this feature,
for PDF text extraction and thumbnail generation in filter-media and
as a PDF packager, so it is a good choice. That said, we'll need to
address the encoding issue described in [1], probably by upgrading
when PDFBox 2 is released and uploaded to Maven Central, but if you
can find another way, a patch for 5.x will be highly appreciated.
[1] https://jira.duraspace.org/browse/DS-2224
Regards,
~~helix84
Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech