On 10/05/2016 10:24 AM, Michael Meeks wrote:
> Hi Larry,
>
>    First - really great to have you looking at that
>       code ! =)

Thanks for the encouragement Michael.

>
> On 10/05/2016 04:10 PM, Larry Evans wrote:
>> I'm trying to understand how the pdf import code works.
>> I've tried looking at the code; however, that's hard to
>> follow; hence, I was hoping there was some sort of design
>> document explaining somewhat how the code works.
>
>    Second - the design list is really for User Experience / developer
> interaction, and this seems like a real gnarly coding problem - so I've
> re-sent it to the dev-list =)

OOPS.  Sorry about that.

>
>> TIA for any pointers.
>
>    Sure - so the PDF import is a bit of a mess; it currently spawns a
> remote process using poppler to parse the PDF, and then extracts (via a
> simple text protocol) data from poppler's rendering to re-constitute into
> internal ODF callbacks to produce an internal document; at least -
> that's if I got it right =)

Well, I did see code here:

  sdext/source/pdfimport/pdfparse/pdfparse.cxx

but that looked like it used boost/spirit to parse the pdf file
(about line 553):

            boost::spirit::parse( pBuffer,
                                  pBuffer+nLen,
                                  aGrammar,
                                  boost::spirit::space_p );

but then, trying to find where that (or the caller of that) was called
led me to:

  sdext/source/pdfimport/wrapper/wrapper.cxx

where there is a call (around line 927):

  std::unique_ptr<pdfparse::PDFEntry> pEntry(
  pdfparse::PDFReader::read( aPDFFile.getStr() ));

but that's called in a function:


 bool checkEncryption

whose name doesn't suggest any translation into something
like the XML that LibreOffice stores its files as,
IIUC:

  https://en.wikipedia.org/wiki/OpenOffice.org_XML

but, looking further in that file, there's, as you mention,
what looks like a remote process call in function:

  bool xpdf_ImportFromFile

on about line 1079:

        osl_executeProcess_WithRedirectedIO(converterURL.pData,
                                            args,
                                            nArgs,
                                            osl_Process_SEARCHPATH|osl_Process_HIDDEN,
                                            pSecurity,
                                            nullptr, nullptr, 0,
                                            &aProcess, &pIn, &pOut, &pErr);

So that's where I wanted some overall design help: I
thought it odd that boost::spirit is used to parse the
file, apparently just to determine whether it is encrypted,
and that an xpdf process is then used to parse the same file
again. That seems redundant.
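
To check my reading of the "simple text protocol" arrangement, here is
a minimal C++ sketch of a parent that reads one command per line and
turns each line into a callback. Everything in it is an assumption made
up for illustration: the command names "font" and "text", and the names
ContentSink, RecordingSink, replayCommands and demo, are hypothetical
and are not taken from wrapper.cxx or from the real converter. A
std::istream stands in for the pipe that the process call hands back.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback interface the importing side would implement.
struct ContentSink {
    virtual ~ContentSink() = default;
    virtual void setFont(const std::string& name, double size) = 0;
    virtual void drawText(double x, double y, const std::string& text) = 0;
};

// Test double that simply records every callback as a string.
struct RecordingSink : ContentSink {
    std::vector<std::string> events;
    void setFont(const std::string& name, double size) override {
        std::ostringstream os;
        os << "font " << name << " " << size;
        events.push_back(os.str());
    }
    void drawText(double x, double y, const std::string& text) override {
        std::ostringstream os;
        os << "text@" << x << "," << y << " " << text;
        events.push_back(os.str());
    }
};

// Reads one command per line from `in` and dispatches it to `sink`.
// Returns the number of lines that could not be understood.
inline int replayCommands(std::istream& in, ContentSink& sink) {
    int bad = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;          // tolerate blank lines
        std::istringstream fields(line);
        std::string cmd;
        fields >> cmd;
        if (cmd == "font") {                 // made-up form: font <name> <size>
            std::string name;
            double size = 0;
            if (fields >> name >> size) sink.setFont(name, size);
            else ++bad;
        } else if (cmd == "text") {          // made-up form: text <x> <y> <rest of line>
            double x = 0, y = 0;
            if (fields >> x >> y) {
                std::string text;
                std::getline(fields >> std::ws, text);
                sink.drawText(x, y, text);
            } else {
                ++bad;
            }
        } else {
            ++bad;                           // unknown command
        }
    }
    return bad;
}

// Small usage example: two valid lines are replayed into the sink.
inline void demo() {
    std::istringstream fakePipe("font Helvetica 12\ntext 72 700.5 Hello\n");
    RecordingSink sink;
    int bad = replayCommands(fakePipe, sink);
    assert(bad == 0);
    assert(sink.events.size() == 2);
}
```

If this picture is roughly right, then the sink is presumably where
font sizes and glyph positions get turned into document content, which
is the part I would want to look at for the placement problems.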


>
>    Poppler/xpdf has a GPL license and so requires all this silliness.
>

Hence, I guess Poppler/xpdf does some sophisticated
processing that the boost::spirit code does not do or is
incapable of doing.  Of course, I'm jumping to conclusions,
which hopefully people on the dev list will correct :)

>    In general - it would be -way- better to pick up something like eg.
> pdfium - and add a rendering front-end there to match first, the same
> protocol (but we can do this in-process), and subsequently to simplify
> and factor lots of that madness out =) PDFium seems to be gaining
> traction in browsers (Chrome + Firefox) and so on.

Thanks for the pointer.  I'm googling for PDFium now.

>
>    Does that make sense ? out of interest, what bug or mis-feature are you
> interested in there ? are you looking at:
>
>    filter/source/pdf
> and        sdext/source/pdfimport

The latter.

>
>    ? =)

I'm trying to solve the problem I posed earlier in this
post:


https://lists.freedesktop.org/archives/libreoffice/2014-January/059106.html

I've also noticed that the font sizes and locations of
letters are sometimes not correct; hence, I'd like to figure
out how to correct that.

Thanks for your interest, Michael.

-regards,
Larry


