Re: [DISCUSS] GSoC Participation

Fred Hansen Wed, 29 Jan 2014 17:29:29 -0800

IMHO a task for GSoC should be non-critical, localized, and not a user
interface. A "non-critical" task is one where PDFBOX development can continue
without relying on the project result. A "localized" task is one that can be
incorporated into the code base with few changes to the base, which limits
the effort required to learn the system the work will fit into. A
"user-interface" task implements an interactive window or an API. I have low
expectations of students' ability to produce good designs in these
areas.


So I looked through JIRA for open issues meeting the above criteria. Since I am
not all that familiar with PDFBOX, some of my suggestions may be laughable, and
I have surely missed some. Nonetheless, here's what I found:


PDFBOX-553 writing pdf file in Japanese, garbled 
PDFBOX-570 Windings font recognition + spacing issue 
PDFBOX-605 Better support for Type0 fonts 
PDFBOX-678 Support missing Text Rendering Modes when rendering a PDF
PDFBOX-870 PDF-To-IMAGE output is not anti-aliased 
PDFBOX-1094 Pattern colorspace support 
PDFBOX-1594 Add support for AES256 Encryption 
        (see also PDFBOX-1450 document how to encrypt with AES 256)
PDFBOX-1734 ImageIoUtil.WriteImage doesn't work with tiff images
PDFBOX-1843 Find a way to test PDFToImage 




>________________________________
> From: John Hewson <[email protected]>
>To: "[email protected]" <[email protected]> 
>Sent: Wednesday, January 29, 2014 6:38 PM
>Subject: Re: [DISCUSS] GSoC Participation
> 
>
>> - an idea which came up some years ago, was to implement a gui-interface to
>> bundle some/all/future tools/features of pdfbox, like printing, rendering,
>> preflight, split, merge etc.
>
>The AWT/Swing PDF viewer could do with rewriting. But does anyone want that? 
>Maybe support for JavaFX?
>
>> - a high-level api to create pdfs
>
>I've been thinking about this recently and have come to the conclusion that 
>it's really hard to do well.
>
>> - an advanced text extractor with table/column support
>
>The table stuff sounds a lot like Tabula? Do we really not have column 
>support? We need that!
>
>I'll throw in some ideas too:
>
>- an interface for OCR engines to plug into the text extraction API. It could 
>provide access to extracted images or allow badly encoded fonts to be passed 
>to OCR one character or text run at a time.
>
>- 
>
>-- John
>
>
>> On 29 Jan 2014, at 03:20, Andreas Lehmkühler <[email protected]> wrote:
>> 
>> Hi,
>> 
>>> Maruan Sahyoun <[email protected]> wrote on 29 January 2014 at 10:44:
>>> 
>>> 
>>> Hi
>>> 
>>> shall we try to participate in GSoC? Needs a mentor though.
>> That idea has come up from time to time, and it didn't work out for various
>> reasons.
>> 
>> So, to participate we need a mentor and of course at least one good idea to
>> be proposed.
>> 
>> I won't act as mentor for different reasons but I'll try to help in the
>> normal manner.
>> 
>> IMO an appropriate idea shall not deal with pdf-specific low-level features,
>> like linearization support, as I doubt that any possible student is familiar
>> with the pdf-spec.
>> 
>> So possible ideas could be:
>> 
>> - an idea which came up some years ago, was to implement a gui-interface to
>> bundle some/all/future tools/features of pdfbox, like printing, rendering,
>> preflight, split, merge etc.
>> - a high-level api to create pdfs
>> - an advanced text extractor with table/column support
>> 
>> 
>>> BR
>>> 
>>> Maruan Sahyoun
>> 
>> BR
>> Andreas Lehmkühler
>
>
