[
https://issues.apache.org/jira/browse/NIFI-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704587#comment-15704587
]
Sergio Fernández commented on NIFI-1815:
----------------------------------------
Hi guys, it has been a long time since your incubation...
I just saw [PR #397|https://github.com/apache/nifi/pull/397], and I'd like to
ask whether the license has been properly checked. tess4j (AL2) depends on
ghost4j (GPL3), so my understanding is that it falls into [License Category
X|https://www.apache.org/legal/resolved.html#category-x], which may not be
included in Apache products.
So I think it needs further investigation. Feedback is welcome!
> Tesseract OCR Processor
> -----------------------
>
> Key: NIFI-1815
> URL: https://issues.apache.org/jira/browse/NIFI-1815
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Jeremy Dyer
> Assignee: Jeremy Dyer
> Attachments: 0006-changes-to-the-OCR-processor.patch,
> nifi_1815_1.x_patch.zip
>
>
> This ticket is a follow-up to NIFI-1718, minus the use of the Tika library.
> Expose OCR capabilities through a new processor which uses the Tesseract
> library. Use of this processor would require that Tesseract be installed on
> the NiFi host. Since the processor will have a system dependency, care must be
> taken to ensure that the overall NiFi cluster continues to function properly
> in the absence of the Tesseract system dependency, even though the OCR
> processor itself will be unable to perform its duties. In the event that the
> system dependencies are not detected, the processor should display a
> validation warning rather than failing or preventing the NiFi instance from
> booting properly.
> Properties exposed to configure Tesseract:
> tesseractPath - Path to tesseract installation folder, if not on system path.
> language - Language ID (e.g. "eng"); language dictionary to be used.
> pageSegMode - Tesseract page segmentation mode, defaults to 1.
> minFileSizeToOcr - Minimum file size to submit file to OCR, defaults to 0.
> maxFileSizeToOcr - Maximum file size to submit file to OCR, defaults to
> Integer.MAX_VALUE.
> timeout - Maximum time (in seconds) to wait for the OCR process termination;
> defaults to 120.
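
For illustration only, the properties and the "warn, don't fail" behaviour described in the ticket could be sketched in plain Java roughly as follows. This is not the code from the PR and it does not use the NiFi processor API; the class name TesseractSettings, its methods, and the warning texts are hypothetical, and "eng" is only the example value from the ticket, not a stated default.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the properties listed in the ticket; not NiFi API. */
class TesseractSettings {
    public String tesseractPath = null;                 // null: search the system PATH
    public String language = "eng";                     // example value, not a stated default
    public int pageSegMode = 1;                         // default from the ticket
    public long minFileSizeToOcr = 0;                   // default from the ticket
    public long maxFileSizeToOcr = Integer.MAX_VALUE;   // default from the ticket
    public int timeoutSeconds = 120;                    // default from the ticket

    /** Collects warnings instead of throwing, so a missing Tesseract never blocks startup. */
    public List<String> validate() {
        List<String> warnings = new ArrayList<>();
        if (!isTesseractAvailable()) {
            warnings.add("Tesseract executable not found; OCR will be unavailable");
        }
        if (minFileSizeToOcr < 0 || minFileSizeToOcr > maxFileSizeToOcr) {
            warnings.add("minFileSizeToOcr must be between 0 and maxFileSizeToOcr");
        }
        if (timeoutSeconds <= 0) {
            warnings.add("timeout must be a positive number of seconds");
        }
        return warnings;
    }

    /** Looks in tesseractPath if set, otherwise in every directory on PATH. */
    public boolean isTesseractAvailable() {
        if (tesseractPath != null) {
            return hasExecutable(new File(tesseractPath));
        }
        String path = System.getenv("PATH");
        if (path == null) {
            return false;
        }
        for (String dir : path.split(File.pathSeparator)) {
            if (hasExecutable(new File(dir))) {
                return true;
            }
        }
        return false;
    }

    /** True if the directory holds an executable file named tesseract or tesseract.exe. */
    static boolean hasExecutable(File dir) {
        for (String name : new String[] {"tesseract", "tesseract.exe"}) {
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return true;
            }
        }
        return false;
    }

    /** Size gate corresponding to minFileSizeToOcr / maxFileSizeToOcr. */
    public boolean shouldOcr(long fileSizeBytes) {
        return fileSizeBytes >= minFileSizeToOcr && fileSizeBytes <= maxFileSizeToOcr;
    }
}
```

In a sketch like this, a caller would show the strings returned by validate() as validation warnings and simply skip OCR while the list is non-empty, which keeps the rest of the instance running when Tesseract is absent.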