Amit Maheshwari created PDFBOX-4642:
---------------------------------------

             Summary: I'd like to know about the dependencies of PDF Box (2.0.12.0)
                 Key: PDFBOX-4642
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4642
             Project: PDFBox
          Issue Type: Wish
          Components: Text extraction
    Affects Versions: 2.0.12
            Reporter: Amit Maheshwari
         Attachments: PDFBox.NET-1.8.9.zip

We have built a .NET version of PDFBox 2.0.12.0 using IKVM, and we use it to extract text and form fields.

Currently we reference the following dependencies:

BCProv.JDK15on
Commons.Logging
Commons.Logging.Javadoc
DiffUtils
Fontbox
HamcREST.Core
IKVM.OpenJDK.Core
IKVM.OpenJDK.Security
IKVM.OpenJDK.SwingAWT
IKVM.OpenJDK.Text
IKVM.OpenJDK.Util
IKVM.Reflection
IKVM.Runtime
jcl-over-slf4j-1.7.6

Recently we ran into an issue while extracting text from a PDF (see the stack trace below):

System.IO.FileNotFoundException: Could not load file or assembly 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58' or one of its dependencies. The system cannot find the file specified.

File name: 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58'

at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(InputStream , OutputStream , Int32 )
at org.apache.pdfbox.filter.LZWFilter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index)
at org.apache.pdfbox.filter.Filter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index, DecodeOptions options)
at org.apache.pdfbox.cos.COSInputStream.create(List , COSDictionary , InputStream , ScratchFile , DecodeOptions )
at org.apache.pdfbox.cos.COSStream.createInputStream(DecodeOptions options)
at org.apache.pdfbox.cos.COSStream.createInputStream()
at org.apache.pdfbox.pdmodel.PDPage.getContents()
at org.apache.pdfbox.pdfparser.PDFStreamParser..ctor(PDContentStream contentStream)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDContentStream )
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDContentStream )
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDPage page)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(PDPage )
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDPage page)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDPageTree pages)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDDocument doc, Writer outputStream)
at org.apache.pdfbox.text.PDFTextStripper.getText(PDDocument doc)
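The failing frame is LZWFilter.doLZWDecode. As far as we can tell from the PDFBox 2.0.x sources, that method reads the variable-width LZW codes through javax.imageio.stream.MemoryCacheImageInputStream.readBits, and in IKVM the javax.imageio classes appear to be packaged in IKVM.OpenJDK.Media, which would explain why the error shows up while decoding an LZW-compressed content stream. The minimal sketch below (LzwBitReadDemo is a hypothetical name, not PDFBox code) reads two 9-bit codes the same way, so under IKVM it would presumably need the same assembly:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical demo class, not part of PDFBox. It reads fixed-width,
// most-significant-bit-first codes with javax.imageio, the same bit-reading
// approach that LZWFilter.doLZWDecode appears to use in PDFBox 2.0.x.
public class LzwBitReadDemo {

    // Reads `count` codes of `width` bits each from `data`.
    static long[] readCodes(byte[] data, int width, int count) throws IOException {
        MemoryCacheImageInputStream in =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(data));
        try {
            long[] codes = new long[count];
            for (int i = 0; i < count; i++) {
                // readBits returns the bits as a long; the first bit read is the most significant
                codes[i] = in.readBits(width);
            }
            return codes;
        } finally {
            // Frees the cache; the wrapped ByteArrayInputStream is not closed
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // 0x80 0x40 0x40 packs two 9-bit codes: 256 (clear table) and 257 (end of data)
        long[] codes = readCodes(new byte[] {(byte) 0x80, 0x40, 0x40}, 9, 2);
        System.out.println(codes[0] + " " + codes[1]); // prints "256 257"
    }
}
```

Running main prints the LZW clear-table and end-of-data codes, 256 and 257.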

We managed to get text extraction working after adding these two DLLs to the folder where the PDFBox DLL resides:

IKVM.OpenJDK.Media.dll 
IKVM.AWT.WinForms.dll
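Rather than discovering each missing DLL halfway through an extraction, a small startup check could try to load one representative Java class per IKVM assembly. This is only a sketch: IkvmDependencyProbe is a hypothetical class, and apart from the Media entry (taken from the class that LZWFilter uses), the class-to-assembly mapping is an assumption that should be checked against the IKVM build in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical startup check, not part of PDFBox or IKVM. It tries to load one
// representative Java class per IKVM assembly so a missing DLL is reported up
// front instead of in the middle of PDFTextStripper.getText.
public class IkvmDependencyProbe {

    // Assumed mapping from IKVM assembly name to a class it contains.
    // The SwingAWT entry is a guess; adjust or extend for your own DLL set.
    static final Map<String, String> PROBES = new LinkedHashMap<String, String>();
    static {
        PROBES.put("IKVM.OpenJDK.Media", "javax.imageio.stream.MemoryCacheImageInputStream");
        PROBES.put("IKVM.OpenJDK.SwingAWT", "java.awt.geom.AffineTransform");
    }

    // Returns true if the class can be found and initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (Throwable t) {
            // On a plain JVM this is ClassNotFoundException or a LinkageError.
            // Under IKVM a missing assembly may instead surface as a .NET
            // exception such as FileNotFoundException (assumption), so catch broadly.
            return false;
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : PROBES.entrySet()) {
            String status = isLoadable(e.getValue()) ? "ok" : "MISSING";
            System.out.println(e.getKey() + " (" + e.getValue() + "): " + status);
        }
    }
}
```

Running it, or calling isLoadable from the application's own startup code, prints one line per entry; a MISSING line names the assembly to deploy next to the PDFBox DLL. It only covers the assemblies listed in PROBES, so it cannot by itself tell which DLLs are strictly required.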

Later we searched for information about the dependencies and found this site:
[http://www.squarepdf.net/pdfbox-in-net]

I have also attached a zip of it.

It includes a lot of other DLLs that we are currently not using.

So I was wondering: do we need all of these DLLs, or only some specific ones?

Also, if possible, could you give a brief overview of how the different DLLs are used (that is, what problems can occur if they are left out)?


--
This message was sent by Atlassian Jira
(v8.3.2#803003)
