With generics, it might look like this instead:

private Comparator<TextPosition> textPositionComparator = new TextPositionComparator();

public Comparator<TextPosition> getTextPositionComparator() {
    return textPositionComparator;
}

public void setTextPositionComparator(Comparator<TextPosition> comparator) {
    textPositionComparator = comparator;
}
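
To make the point about parameterized comparators concrete, here is a minimal, self-contained sketch. It does not use PDFBox: Glyph, RowBucketComparator and Stripper are hypothetical stand-ins I made up for TextPosition, a custom comparator and PDFTextStripper, and the row-bucketing rule is only one possible way to treat nearby y values as the same line. The comparator needs a constructor argument, so it could not be created through Class.newInstance(), but it can be injected through a setter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ComparatorInjectionSketch {

    /** Hypothetical stand-in for TextPosition: a piece of text and its coordinates. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Hypothetical comparator that takes a parameter. Glyphs whose y values fall
     * into the same bucket of height rowHeight count as one line and are then
     * ordered by x. Bucketing keeps the ordering transitive.
     */
    static final class RowBucketComparator implements Comparator<Glyph> {
        private final float rowHeight;

        RowBucketComparator(float rowHeight) {
            if (rowHeight <= 0) {
                throw new IllegalArgumentException("rowHeight must be positive");
            }
            this.rowHeight = rowHeight;
        }

        private long row(Glyph g) {
            return (long) Math.floor(g.y / rowHeight);
        }

        @Override
        public int compare(Glyph a, Glyph b) {
            int byRow = Long.compare(row(a), row(b));
            return byRow != 0 ? byRow : Float.compare(a.x, b.x);
        }
    }

    /** Hypothetical stand-in for the stripper, holding an injected comparator instance. */
    static final class Stripper {
        private Comparator<Glyph> comparator = new RowBucketComparator(1f);

        Comparator<Glyph> getComparator() {
            return comparator;
        }

        void setComparator(Comparator<Glyph> comparator) {
            this.comparator = comparator;
        }

        /** Sorts a copy of the glyphs with the current comparator and joins their text. */
        String sortedText(List<Glyph> glyphs) {
            List<Glyph> copy = new ArrayList<Glyph>(glyphs);
            Collections.sort(copy, getComparator());
            StringBuilder sb = new StringBuilder();
            for (Glyph g : copy) {
                sb.append(g.text);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<Glyph>();
        glyphs.add(new Glyph("b", 20f, 100.4f));
        glyphs.add(new Glyph("a", 10f, 101.2f)); // same visual line, slightly different y
        glyphs.add(new Glyph("c", 10f, 120f));

        Stripper stripper = new Stripper();
        System.out.println(stripper.sortedText(glyphs)); // default bucket height of 1

        stripper.setComparator(new RowBucketComparator(10f));
        System.out.println(stripper.sortedText(glyphs)); // wider buckets merge the first two rows
    }
}
```

With the default bucket height the first two glyphs land in different rows and come out as "bac"; after injecting a comparator with a bucket height of 10 they share a row and come out as "abc". Glyphs that straddle a bucket boundary would still be split, so a real replacement comparator would probably need something smarter.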
On Tue, Nov 8, 2011 at 10:50 AM, Raimi Rufai <[email protected]> wrote:
> Hi Sebastien,
>
> It might be more flexible to inject an instance of the Comparator rather
> than its class. Your current solution won't work for comparators whose
> constructors take parameters. In other words, you would have:
>
> private Comparator<TextPosition> textPositionComparator = new TextPositionComparator();
>
> public Comparator<TextPosition> getTextPositionComparator() {
>     return textPositionComparator;
> }
>
> public void setTextPositionComparator(Comparator<TextPosition> comparator) {
>     textPositionComparator = comparator;
> }
>
> What do you think?
>
> Regards,
>
> Raimi
>
>
>
> On Tue, Nov 8, 2011 at 10:24 AM, Martinez, Mel - 1004 - MITLL <
> [email protected]> wrote:
>
>> Sebastien,
>>
>> I totally agree that this would be a good change, having run into the same
>> problem when working out my own mods to the text extraction some time ago.
>>
>> Please create a JIRA issue proposing this at:
>> https://issues.apache.org/jira/browse/PDFBOX
>>
>> Mel
>>
>>
>> -----Original Message-----
>> From: Sébastien Dailly [mailto:[email protected]]
>> Sent: Tuesday, November 08, 2011 4:27 AM
>> To: [email protected]
>> Subject: PDFTextStripper : can't change the default TextPositionComparator
>>
>> Hello,
>>
>> I'm trying to use the PDFTextStripper class, but sortByPosition does
>> not seem to work correctly when the characters on the same line are
>> not at exactly the same y position.
>>
>> There is no way to replace the TextPositionComparator used in the class
>> with my own, even by subclassing PDFTextStripper (see the note below).
>>
>> One solution is to use a getter instead of a hard-coded link between the classes:
>>
>> > List<TextPosition> textList = charactersByArticle.get( i );
>> > if( getSortByPosition() )
>> > {
>> > TextPositionComparator comparator = new TextPositionComparator();
>> > Collections.sort( textList, comparator );
>> > }
>>
>> becomes:
>>
>> > List<TextPosition> textList = charactersByArticle.get( i );
>> > if( getSortByPosition() )
>> > {
>> > Comparator comparator = getTextPositionComparator();
>> > Collections.sort( textList, comparator );
>> > }
>>
>> with getTextPositionComparator defined as follows:
>>
>> > private Class<? extends Comparator> textPositionComparator = TextPositionComparator.class;
>>
>> > […]
>>
>> > /**
>> > *
>> > * @return The comparator for ordering text positions.
>> > */
>> > public Comparator getTextPositionComparator() {
>> > try {
>> > return textPositionComparator.newInstance();
>> > } catch (final InstantiationException e) {
>> > return null;
>> > } catch (final IllegalAccessException e) {
>> > return null;
>> > }
>> > }
>>
>> (with the appropriate setter).
>>
>> Note:
>>
>> Although PDFTextStripper.writePage is protected, it calls the
>> getTextPosition method of the PositionWrapper class, which is itself a
>> protected method, without subclassing that class. This only works
>> because both classes belong to the same package. (I think this can be
>> considered a flaw in the project architecture.)
>>
>> > // Resets the average character width when we see a change in font
>> > // or a change in the font size
>> > if(lastPosition != null && ((position.getFont() != lastPosition.getTextPosition().getFont())
>> > || (position.getFontSize() != lastPosition.getTextPosition().getFontSize())))
>> > {
>> > previousAveCharWidth = -1;
>> > }
>>
>> Thank you,
>>
>> --
>> Sébastien
>>
>
>
>
> --
> «To develop software is to build a machine simply by describing it.»
> (Michael A. Jackson -- not the singer)
>
>