Hi, there is another code coverage Maven plugin, called Cobertura. If you run *mvn clean install cobertura:cobertura*, there is no need to declare it in the pom.
Hope it helps.

On Sat, Feb 8, 2014 at 10:17 PM, Grant Ingersoll (JIRA) <j...@apache.org> wrote:

>
> [ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895718#comment-13895718 ]
>
> Grant Ingersoll commented on TIKA-93:
> -------------------------------------
>
> bq. what is the dependency on jacoco in tika-parent? That stuff seems orthogonal to the patch.
>
> I put that in so that I can measure whether I am testing sufficiently. I can separate it out to a different patch.
>
> bq. dependency on custom external Maven repo – myGrid – any way to get the jar from the Central repo somewhere? we have made an effort in Tika to remove any specific deps on external repositories
>
> We could make that one optional. All it does is add support for TIFF and a few other file formats that aren't part of the standard ImageIO.
>
> bq. in my CS572 class on Search Engines where we look at FBI Vault PDF files! http://www-scf.usc.edu/~csci572/
>
> I read your abstract for your talk and checked out the Vault and thought it would be cool, too. The main issue is that JavaOCR needs to be trained in order to work with that data set. Tesseract, on the other hand, works for it, but alas, needs to be implemented as an OCRParser. Since Tess4J has some bad deps, the only way I could see to do this is to exec the process or go write my own JNI integration for Tesseract. The latter isn't likely to happen. The former feels less than desirable, but would work.
>
> > OCR support
> > -----------
> >
> > Key: TIKA-93
> > URL: https://issues.apache.org/jira/browse/TIKA-93
> > Project: Tika
> > Issue Type: New Feature
> > Components: parser
> > Reporter: Jukka Zitting
> > Assignee: Chris A. Mattmann
> > Priority: Minor
> > Attachments: TIKA-93.patch, TIKA-93.patch, TIKA-93.patch
> >
> >
> > I don't know of any decent open source pure Java OCR libraries, but there are command line OCR tools like Tesseract (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to extract text content (where available) from image files.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
>
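
For what it's worth, the "exec the process" route mentioned above could look roughly like the sketch below. It is only an illustration, not code from the TIKA-93 patch: the class and method names (TesseractExec, buildCommand, ocr) are made up, and it assumes a `tesseract` binary is on the PATH and that it follows the usual `tesseract <image> <outputbase>` convention of writing its result to `<outputbase>.txt`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika: shells out to the tesseract CLI.
public class TesseractExec {

    // Builds the command line "tesseract <image> <outputbase>".
    public static List<String> buildCommand(Path image, Path outputBase) {
        return Arrays.asList("tesseract", image.toString(), outputBase.toString());
    }

    // Runs tesseract on one image and returns the recognized text.
    // Assumes tesseract writes its output to "<outputbase>.txt".
    public static String ocr(Path image) throws IOException, InterruptedException {
        Path outputBase = Files.createTempFile("ocr", "");
        Path textFile = Paths.get(outputBase.toString() + ".txt");
        try {
            Process process = new ProcessBuilder(buildCommand(image, outputBase))
                    .inheritIO() // pass tesseract's own output through so the child never blocks on a full pipe
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }
            return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
        } finally {
            // Clean up the placeholder file and the text file tesseract produced.
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(outputBase);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TesseractExec <image-file>");
            return;
        }
        System.out.println(ocr(Paths.get(args[0])));
    }
}
```

An OCRParser along these lines would still need to handle a missing binary, timeouts and language selection, which is probably part of why exec'ing feels less than desirable.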