Re: Improving OCR plugin for PDFBox

Santosh Arakeri Mon, 07 Jul 2014 10:40:33 -0700

Please don't send me mail.


On Fri, Jun 27, 2014 at 12:28 PM, John Hewson <[email protected]> wrote:

> Hi Dimuthu
>
> That’s great. We should wait until closer to the end of the GSoC period to
> integrate your work with PDFBox, as ideally we only want to have to do it
> once. We’ve not included C++ dependencies before so no, there won’t be a
> standard way, we’ll have to think something up. We’ll probably make it an
> optional sub-project, and the Tesseract JNI bindings might be better off
> having their own branch so that they are more like an external dependency -
> I’ll ask the dev mailing list.
>
> To prepare your code for contribution you’ll need to add the Apache header
> to each .java file (see any PDFBox .java file for an example) and submit a
> signed ICLA http://www.apache.org/licenses/icla.pdf to Apache.
>
> Regarding additional functionality, the most useful would be for a new
> command line tool which could write the OCR’d text back into the original
> PDF file as “invisible text”, which would allow for copy and paste and text
> search to then work for that PDF file. A starting point for this would be
> to try and write the OCR’d text into the original PDF as “visible” text -
> we can make it invisible later!
>
> -- John
>
> On 19 Jun 2014, at 13:57, DImuthu Upeksha <[email protected]>
> wrote:
>
> > Hi John,
> > Except for providing compatibility with platforms like Windows, I think most
> > of the functionality of the OCR plugin is finished (please correct me if I'm
> > wrong). But I would like to contribute to the project further. Do you have
> > anything to add as new functionality? And if you plan to add this to the
> > PDFBox code, how should I prepare my code? Is there a standard way?
> >
> > Thanks
> > Dimuthu
> > --
> > Regards
> > W.Dimuthu Upeksha
> > Undergraduate
> > Department of Computer Science And Engineering
> > University of Moratuwa, Sri Lanka
>
>

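As a rough illustration of the "visible first, invisible later" idea in John's message: in a PDF content stream, the text rendering mode operator `Tr` controls how glyphs are painted, with mode 0 filling them (visible) and mode 3 neither filling nor stroking them (invisible, but still selectable and searchable). The sketch below only builds the content-stream operators as a string using plain Java. It is not the PDFBox API and not the plugin's code. The class name, the method names, and the font resource name `/F1` are assumptions made for the example, and it handles only text that the chosen font encoding can represent directly.

```java
import java.util.Locale;

public class InvisibleTextSketch {

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escapePdfString(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Builds the operators that show one run of text at (x, y).
     * Rendering mode 0 paints the glyphs; mode 3 leaves them unpainted.
     * "/F1" is a hypothetical font resource name that the page would
     * have to define in its resource dictionary.
     */
    static String textOps(String text, double x, double y,
                          double fontSize, boolean invisible) {
        int renderingMode = invisible ? 3 : 0;
        return String.format(Locale.ROOT,
                "BT /F1 %.2f Tf %d Tr %.2f %.2f Td (%s) Tj ET",
                fontSize, renderingMode, x, y, escapePdfString(text));
    }

    public static void main(String[] args) {
        // Visible text first, to check that placement matches the scanned image.
        System.out.println(textOps("Hello (OCR)", 72, 700, 12, false));
        // The same run with only the rendering mode switched to invisible.
        System.out.println(textOps("Hello (OCR)", 72, 700, 12, true));
    }
}
```

The only difference between the two calls is the rendering mode, which is why starting with visible text is a convenient way to debug positioning before hiding it.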