[ 
https://issues.apache.org/jira/browse/NIFI-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316439#comment-15316439
 ] 

ASF GitHub Bot commented on NIFI-1815:
--------------------------------------

Github user jdye64 commented on the issue:

    https://github.com/apache/nifi/pull/397
  
    Olegz - I'll certainly add the extra checking. As for installation on 
non-Windows systems, I did the development on OS X and simply ran "brew install 
tesseract" rather than building from source.
    
    > On Jun 6, 2016, at 7:47 AM, Oleg Zhurakousky <[email protected]> 
wrote:
    > 
    > Ok, while all is good on Windows, I can't seem to have any success 
building those .so files on OS X. Normally I would not worry about it that much, 
but given that the Tesseract distribution includes DLLs inside the JAR, for 
"all other" OSes such native libraries will come from outside and need to be 
known to the processor. So we would probably need another property, and should 
definitely test with at least one non-Windows system.
    > So, I'll keep on trying (when I get a chance) to get/build those native 
libraries, but could use some help here as well.



> Tesseract OCR Processor
> -----------------------
>
>                 Key: NIFI-1815
>                 URL: https://issues.apache.org/jira/browse/NIFI-1815
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
>         Attachments: 0006-changes-to-the-OCR-processor.patch, 
> nifi_1815_1.x_patch.zip
>
>
> This ticket is a follow-up to NIFI-1718, minus the use of the Tika library.
> Expose OCR capabilities through a new processor which uses the Tesseract 
> library. Use of this processor would require that Tesseract be installed on 
> the NiFi host. Since the processor will have a system dependency, care must be 
> taken to ensure that the overall NiFi cluster continues to function properly 
> in the absence of the Tesseract system dependency, even though the OCR 
> processor itself will be unable to perform its duties. In the event that the 
> system dependencies are not detected, the processor should display a 
> validation warning rather than failing or preventing the NiFi instance from 
> booting properly.
> Properties exposed to configure Tesseract:
> tesseractPath - Path to tesseract installation folder, if not on system path.
> language - Language ID (e.g. "eng"); language dictionary to be used.
> pageSegMode - Tesseract page segmentation mode, defaults to 1.
> minFileSizeToOcr - Minimum file size to submit file to OCR, defaults to 0.
> maxFileSizeToOcr - Maximum file size to submit file to OCR, defaults to 
> Integer.MAX_VALUE.
> timeout - Maximum time (in seconds) to wait for the OCR process termination; 
> defaults to 120.
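
The property list and the "warn rather than fail" requirement above can be sketched roughly as follows. This is only an illustration in plain Java, not the processor from the pull request and not the NiFi API: the class `OcrSettings` and its methods are hypothetical, "eng" is used as a default only because the ticket gives it as an example, file sizes are assumed to be in bytes, and the check assumes an executable named `tesseract` (or `tesseract.exe` on Windows) sits directly in the configured folder or in a PATH entry.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical settings holder; field names mirror the ticket's property list.
final class OcrSettings {
    final String tesseractPath;   // null or empty: fall back to the system PATH
    final String language;        // language ID, e.g. "eng"
    final int pageSegMode;
    final long minFileSizeToOcr;  // assumed to be bytes
    final long maxFileSizeToOcr;  // assumed to be bytes
    final int timeoutSeconds;

    OcrSettings(String tesseractPath, String language, int pageSegMode,
                long minFileSizeToOcr, long maxFileSizeToOcr, int timeoutSeconds) {
        this.tesseractPath = tesseractPath;
        this.language = language;
        this.pageSegMode = pageSegMode;
        this.minFileSizeToOcr = minFileSizeToOcr;
        this.maxFileSizeToOcr = maxFileSizeToOcr;
        this.timeoutSeconds = timeoutSeconds;
    }

    // Defaults as stated in the ticket ("eng" is an assumed default).
    static OcrSettings defaults() {
        return new OcrSettings(null, "eng", 1, 0L, Integer.MAX_VALUE, 120);
    }

    // A file is submitted to OCR only if its size lies within [min, max].
    boolean shouldOcr(long fileSize) {
        return fileSize >= minFileSizeToOcr && fileSize <= maxFileSizeToOcr;
    }

    // Looks in the configured folder if one is set, otherwise in each PATH entry.
    static boolean isTesseractAvailable(String tesseractPath, String pathEnv) {
        boolean windows = System.getProperty("os.name", "")
                .toLowerCase().startsWith("windows");
        String exe = windows ? "tesseract.exe" : "tesseract";
        List<String> dirs = new ArrayList<>();
        if (tesseractPath != null && !tesseractPath.isEmpty()) {
            dirs.add(tesseractPath);
        } else if (pathEnv != null) {
            for (String dir : pathEnv.split(File.pathSeparator)) {
                if (!dir.isEmpty()) {
                    dirs.add(dir);
                }
            }
        }
        for (String dir : dirs) {
            if (new File(dir, exe).isFile()) {
                return true;
            }
        }
        return false;
    }

    // Collects warnings instead of throwing, so a missing dependency
    // never prevents the host application from starting.
    List<String> validate(String pathEnv) {
        List<String> warnings = new ArrayList<>();
        if (minFileSizeToOcr > maxFileSizeToOcr) {
            warnings.add("minFileSizeToOcr is greater than maxFileSizeToOcr");
        }
        if (timeoutSeconds <= 0) {
            warnings.add("timeout must be a positive number of seconds");
        }
        if (!isTesseractAvailable(tesseractPath, pathEnv)) {
            warnings.add("Tesseract was not found; OCR will be unavailable");
        }
        return warnings;
    }
}
```

A caller might run `OcrSettings.defaults().validate(System.getenv("PATH"))` and surface any returned strings as validation warnings, leaving the rest of the flow untouched.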



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
