[
https://issues.apache.org/jira/browse/PDFBOX-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080577#comment-15080577
]
Praveer commented on PDFBOX-3176:
---------------------------------
Thanks, man!
> Add a removeRegion method in PDFTextStripperByArea class
> --------------------------------------------------------
>
> Key: PDFBOX-3176
> URL: https://issues.apache.org/jira/browse/PDFBOX-3176
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 1.8.10, 1.8.11, 2.0.0
> Environment: All
> Reporter: Praveer
> Assignee: Tilman Hausherr
> Fix For: 1.8.11, 2.0.0
>
>
> Hi,
> I am parsing a very complicated PDF for which I had to call
> setSortByPosition(true); otherwise the parser cannot extract the text
> sequentially.
> So I decided to use the PDFTextStripperByArea class and define rectangles
> to extract text from. The problem is that if I define many rectangles on a
> single page, the extracted text again has no logical sequence. To get around
> this, it would be great to have a method that removes regions. Then we could
> add a region, extract its text, remove that region, add a new region, and so
> on.
> I have already done a proof of concept on my local machine and it works
> fine. I added this method and tested it:
>     public void removeRegion(String regionName) {
>         // Forget both the region's name and its rectangle.
>         this.regions.remove(regionName);
>         this.regionArea.remove(regionName);
>     }
> I can contribute this code myself if you would like; just let me know.
> Thanks and regards,
> Praveer
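
For readers without the PDFBox sources at hand, the bookkeeping behind the proposed method can be sketched in isolation. This is only an illustration: RegionRegistrySketch, addRegion's body, getRegions and hasArea below are hypothetical stand-ins, not the PDFBox class, and the field types (a List of names and a Map from name to rectangle) are an assumption based on the field names "regions" and "regionArea" used in the snippet above.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the region bookkeeping only; NOT the PDFBox class.
class RegionRegistrySketch {
    // Region names in insertion order (assumed shape of the "regions" field).
    private final List<String> regions = new ArrayList<>();
    // Rectangle per region name (assumed shape of the "regionArea" field).
    private final Map<String, Rectangle2D> regionArea = new HashMap<>();

    void addRegion(String regionName, Rectangle2D rect) {
        regions.add(regionName);
        regionArea.put(regionName, rect);
    }

    // The same two calls as in the proposed removeRegion method.
    void removeRegion(String regionName) {
        regions.remove(regionName);
        regionArea.remove(regionName);
    }

    List<String> getRegions() {
        return regions;
    }

    boolean hasArea(String regionName) {
        return regionArea.containsKey(regionName);
    }

    public static void main(String[] args) {
        RegionRegistrySketch r = new RegionRegistrySketch();
        // Add a region, use it, remove it, then move on to the next one.
        r.addRegion("header", new Rectangle2D.Double(0, 0, 500, 50));
        r.removeRegion("header");
        r.addRegion("body", new Rectangle2D.Double(0, 50, 500, 600));
        System.out.println(r.getRegions()); // prints [body]
    }
}
```

Under these assumptions, removing a name that was never added is a harmless no-op, because List.remove(Object) and Map.remove(Object) simply report that nothing was found.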