Hello,

I'm trying to use the PDFTextStripper class, but sortByPosition does not seem to work correctly when the characters on the same line are not at exactly the same y position.

There is no way to replace the TextPositionComparator used in the class with my own, even by subclassing PDFTextStripper (see the note below).

One solution is to obtain the comparator through a getter instead of hard-coding the comparator class:

            List<TextPosition> textList = charactersByArticle.get( i );
            if( getSortByPosition() )
            {
                TextPositionComparator comparator = new TextPositionComparator();
                Collections.sort( textList, comparator );
            }

becomes:

            List<TextPosition> textList = charactersByArticle.get( i );
            if( getSortByPosition() )
            {
                Comparator comparator = getTextPositionComparator();
                Collections.sort( textList, comparator );
            }

with getTextPositionComparator defined as follows:

private Class<? extends Comparator> textPositionComparator = TextPositionComparator.class;

[…]

        /**
         *
         * @return The comparator for ordering text positions.
         */
        public Comparator getTextPositionComparator() {
                try {
                        return textPositionComparator.newInstance();
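                } catch (final InstantiationException e) {
                        // Fail loudly: a null comparator would make
                        // Collections.sort fall back to natural ordering.
                        throw new IllegalStateException(e);
                } catch (final IllegalAccessException e) {
                        throw new IllegalStateException(e);
                }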
        }

(with the appropriate setter).
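
To make the idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use PDFBox: Glyph, LineBandComparator, SketchStripper and the band height of 2 units are all hypothetical stand-ins that I made up for illustration, and the default ordering in the sketch (exact y, then x) is an assumption meant only to reproduce the symptom, not a copy of TextPositionComparator. The replacement comparator snaps y to a fixed-height band before comparing x, which keeps the ordering transitive. Glyphs that fall on either side of a band boundary would still be split, so this is only one possible approach.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for TextPosition: only text and coordinates. */
final class Glyph {
    final String text;
    final float x;
    final float y;

    Glyph(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

/** Hypothetical comparator: groups glyphs into bands of fixed height, then sorts by x. */
final class LineBandComparator implements Comparator<Glyph> {
    private final float bandHeight;

    LineBandComparator(float bandHeight) {
        this.bandHeight = bandHeight;
    }

    @Override
    public int compare(Glyph a, Glyph b) {
        // Snapping y to a band index keeps the ordering transitive, unlike
        // an "equal if |y1 - y2| < tolerance" test.
        int bandA = Math.round(a.y / bandHeight);
        int bandB = Math.round(b.y / bandHeight);
        if (bandA != bandB) {
            return Integer.compare(bandA, bandB);
        }
        return Float.compare(a.x, b.x);
    }
}

/** Hypothetical stripper-like class that reaches its comparator through a getter. */
class SketchStripper {
    // Assumed default for the sketch: exact y first, then x.
    private Comparator<Glyph> comparator = (a, b) -> {
        int byY = Float.compare(a.y, b.y);
        return byY != 0 ? byY : Float.compare(a.x, b.x);
    };

    /** Subclasses can override this; callers can also use the setter. */
    protected Comparator<Glyph> getGlyphComparator() {
        return comparator;
    }

    public void setGlyphComparator(Comparator<Glyph> comparator) {
        this.comparator = comparator;
    }

    /** Sorts a copy of the glyphs with the current comparator and joins their text. */
    public String sortAndJoin(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(getGlyphComparator());
        StringBuilder sb = new StringBuilder();
        for (Glyph g : sorted) {
            sb.append(g.text);
        }
        return sb.toString();
    }
}
```

With three glyphs "a", "b", "c" at x = 0, 10, 20 and y = 100.0, 100.4, 99.8, the assumed default ordering in this sketch yields "cab", while setting new LineBandComparator(2f) yields "abc".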

Note:

Even though PDFTextStripper.writePage is protected and can therefore be overridden, it calls the getTextPosition method of the PositionWrapper class, which is a protected method, without subclassing that class. This only works because the two classes belong to the same package, so a subclass in another package cannot reuse this code. (I think this can be considered a flaw in the project architecture.)

                // Resets the average character width when we see a change in font
                // or a change in the font size
                if(lastPosition != null && ((position.getFont() != lastPosition.getTextPosition().getFont())
                        || (position.getFontSize() != lastPosition.getTextPosition().getFontSize())))
                {
                    previousAveCharWidth = -1;
                }

Thank you,

--
Sébastien
