[ 
https://issues.apache.org/jira/browse/NIFI-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265747#comment-15265747
 ] 

ASF GitHub Bot commented on NIFI-1815:
--------------------------------------

GitHub user jdye64 opened a pull request:

    https://github.com/apache/nifi/pull/397

    NIFI-1815

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jdye64/nifi NIFI-1815

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #397
    
----
commit a9dce4d6233e5d0578c4c938a1dd868445b6a0fc
Author: Jeremy Dyer <[email protected]>
Date:   2016-04-27T15:57:58Z

    NIFI-1815
    
    First commit of NIFI-1815

commit 4a999ddaccaffe10bd1ba0ccbb542af7cf1a18c2
Author: Jeremy Dyer <[email protected]>
Date:   2016-05-01T12:17:50Z

    NIFI-1815
    
    Tesseract OCR processor for Apache NiFi

----


> Tesseract OCR Processor
> -----------------------
>
>                 Key: NIFI-1815
>                 URL: https://issues.apache.org/jira/browse/NIFI-1815
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
>
> This ticket is a follow-up to NIFI-1718, minus the use of the Tika library.
> Expose OCR capabilities through a new processor which uses the Tesseract 
> library. Use of this processor would require that Tesseract be installed on 
> the NiFi host. Since the processor will have a system dependency, care must be 
> taken to ensure that the overall NiFi cluster continues to function properly 
> in the absence of the Tesseract system dependency, even though the OCR 
> processor itself will be unable to perform its duties. In the event that the 
> system dependencies are not detected, the processor should display a 
> validation warning rather than failing or preventing the NiFi instance from 
> booting properly.
> Properties exposed to configure Tesseract:
> tesseractPath - Path to tesseract installation folder, if not on system path.
> language - Language ID (e.g. "eng"); language dictionary to be used.
> pageSegMode - Tesseract page segmentation mode, defaults to 1.
> minFileSizeToOcr - Minimum file size to submit file to OCR, defaults to 0.
> maxFileSizeToOcr - Maximum file size to submit file to OCR, defaults to 
> Integer.MAX_VALUE.
> timeout - Maximum time (in seconds) to wait for the OCR process termination; 
> defaults to 120.
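
The properties and the "warn, don't fail" behaviour described above could be sketched roughly as follows. This is only an illustration in plain Java, not the code from pull request #397 and not the NiFi processor API: the class name TesseractOcrSettings and the methods defaults, isEligible, findExecutable and validationWarning are hypothetical, the executable names "tesseract"/"tesseract.exe" are an assumption, and the "eng" default for language is assumed because the ticket only gives it as an example.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical holder for the properties listed in the ticket. */
final class TesseractOcrSettings {
    final String tesseractPath;   // empty means "look on the system path"
    final String language;        // e.g. "eng" (default is an assumption)
    final int pageSegMode;        // ticket default: 1
    final long minFileSizeToOcr;  // ticket default: 0
    final long maxFileSizeToOcr;  // ticket default: Integer.MAX_VALUE
    final int timeoutSeconds;     // ticket default: 120

    TesseractOcrSettings(String tesseractPath, String language, int pageSegMode,
                         long minFileSizeToOcr, long maxFileSizeToOcr, int timeoutSeconds) {
        this.tesseractPath = tesseractPath;
        this.language = language;
        this.pageSegMode = pageSegMode;
        this.minFileSizeToOcr = minFileSizeToOcr;
        this.maxFileSizeToOcr = maxFileSizeToOcr;
        this.timeoutSeconds = timeoutSeconds;
    }

    /** Defaults as stated in the ticket (language default assumed). */
    static TesseractOcrSettings defaults() {
        return new TesseractOcrSettings("", "eng", 1, 0L, Integer.MAX_VALUE, 120);
    }

    /** True when the file size falls inside the configured inclusive bounds. */
    boolean isEligible(long fileSizeBytes) {
        return fileSizeBytes >= minFileSizeToOcr && fileSizeBytes <= maxFileSizeToOcr;
    }

    /** Looks in installDir if set, otherwise in each entry of pathEnv. */
    static Optional<Path> findExecutable(String installDir, String pathEnv) {
        String[] dirs;
        if (installDir != null && !installDir.isEmpty()) {
            dirs = new String[] {installDir};
        } else if (pathEnv != null) {
            dirs = pathEnv.split(File.pathSeparator);
        } else {
            dirs = new String[0];
        }
        for (String dir : dirs) {
            if (dir.isEmpty()) {
                continue; // skip empty path entries
            }
            for (String name : new String[] {"tesseract", "tesseract.exe"}) {
                try {
                    Path candidate = Paths.get(dir, name);
                    if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException e) {
                    // A malformed entry is treated as "not found", never as a failure.
                }
            }
        }
        return Optional.empty();
    }

    /** Returns a warning message instead of throwing when Tesseract is missing. */
    Optional<String> validationWarning(String pathEnv) {
        if (findExecutable(tesseractPath, pathEnv).isPresent()) {
            return Optional.empty();
        }
        return Optional.of("Tesseract executable not found; OCR is unavailable on this host");
    }
}
```

A caller might pass System.getenv("PATH") to validationWarning and surface any returned message as a validation warning, so that a host without Tesseract still starts normally.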



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
