[ 
https://issues.apache.org/jira/browse/DRILL-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462066#comment-17462066
 ] 

ASF GitHub Bot commented on DRILL-8028:
---------------------------------------

cgivre commented on pull request #2359:
URL: https://github.com/apache/drill/pull/2359#issuecomment-997322720


   @paul-rogers 
   Thanks for your review.  I addressed all your comments (I think) and did 
the following:
   * Added additional unit tests
   * Refactored the table list so that tables are not all read into memory 
unless requested
   * Added iterator classes to avoid counters in the batch reader
   * Moved metadata collection to a separate class
   * Refactored to allow a PDF with no tables to return metadata if requested
   * Added a config option for different extraction algorithms
   * General code cleanup
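
A minimal sketch of the lazy-iterator idea from the list above, not the plugin's actual code: the class name `LazyTableIterator` and the `pageLoader` callback are hypothetical stand-ins for whatever extraction call the reader uses. It loads one page's tables only when the caller asks for the next table, so no explicit counters are needed in the reader.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch: iterates over tables page by page and loads a page's
 * tables only when they are actually requested.
 */
class LazyTableIterator<T> implements Iterator<T> {
  private final int pageCount;
  // Assumption: stand-in for the real per-page table extraction call.
  private final IntFunction<List<T>> pageLoader;
  private int nextPage = 0;
  private Iterator<T> current = null;

  LazyTableIterator(int pageCount, IntFunction<List<T>> pageLoader) {
    this.pageCount = pageCount;
    this.pageLoader = pageLoader;
  }

  @Override
  public boolean hasNext() {
    // Skip forward to the next page that contains at least one table.
    while ((current == null || !current.hasNext()) && nextPage < pageCount) {
      current = pageLoader.apply(nextPage++).iterator();
    }
    return current != null && current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```

With this shape, a reader can loop with `hasNext()` and `next()` alone, and pages with no tables are skipped without being held in memory.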
   
   I removed all but one of the `System.env` calls, and I'm a little stuck on 
the remaining one.  The reason I added this line is that when querying a PDF 
with Drill in embedded mode, it opens an additional Java window.  This does not 
occur when running unit tests, which makes debugging difficult.  I'm going to 
keep digging into this, but I was wondering if you could take a look at the 
rest of the revisions in the meantime?  The issue seems to be in either Tabula 
or PDFBox, which are the underlying libraries that read the PDF file. Thanks!




> Add PDF Format Plugin
> ---------------------
>
>                 Key: DRILL-8028
>                 URL: https://issues.apache.org/jira/browse/DRILL-8028
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.19.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.20.0
>
>
> See PR for documentation.  This PR adds the ability to read tables contained 
> in PDF files. 



