I have integrated OCR into a handful of applications. There are two general 
approaches: either put the OCR into a workflow leading into MarkLogic, or call 
a REST-ish OCR service from within MarkLogic.

The first approach works well for publishing-oriented applications. All the 
content comes into MarkLogic through a pipeline, perhaps in Java or .NET, and 
the OCR technology is accessible from that pipeline. Look for OCR products that 
offer an API or SDK. Many are based on C or other languages in the C family, 
but building a JNI or other bridge can be worthwhile if the technology is a 
good fit for your application.
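
To make the first approach concrete, here is a minimal sketch in Python of a 
pipeline step that runs OCR and then loads the recognized text through the 
MarkLogic REST API (PUT /v1/documents). It is an illustration under stated 
assumptions, not a recommendation: the `ocr` callable stands in for whatever 
OCR SDK or bridge you choose, and `base_url`, the document URI, and the 
`opener` (which would carry your authentication, typically digest) are all 
placeholders you would supply.

```python
import urllib.parse
import urllib.request
from typing import Callable, Optional


def build_put_request(base_url: str, uri: str, text: str) -> urllib.request.Request:
    """Build a PUT request that stores `text` at `uri` via /v1/documents."""
    query = urllib.parse.urlencode({"uri": uri})
    return urllib.request.Request(
        url=f"{base_url}/v1/documents?{query}",
        data=text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def ingest(
    image_bytes: bytes,
    uri: str,
    ocr: Callable[[bytes], str],
    base_url: str,
    opener: Optional[urllib.request.OpenerDirector] = None,
) -> int:
    """Run OCR on one image and load the result; return the HTTP status.

    `ocr` is a hypothetical hook for the OCR product's API or SDK.
    `opener` is assumed to be configured by the caller for authentication.
    """
    text = ocr(image_bytes)
    request = build_put_request(base_url, uri, text)
    if opener is None:
        opener = urllib.request.build_opener()  # no authentication configured
    with opener.open(request) as response:
        return response.status
```

Keeping the OCR call behind a plain callable means the same pipeline step 
works whether the engine is reached through a native bridge, a command-line 
tool, or a vendor library.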

The second approach is better when users will upload documents for OCR in a 
less controlled way: through WebDAV, for example. A CPF pipeline can take any 
new or updated documents and evaluate XQuery that calls out to a REST-ish OCR 
service. The last time I needed OCR and used this approach, I had to create 
that REST-ish interface myself because the OCR technology did not provide one. 
However, https://www.google.com/search?q=web|rest+ocr turns up a few 
possibilities that might work out.
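
If you end up wrapping an OCR engine yourself, the REST-ish interface can be 
quite small. The following is a rough sketch in Python, not the interface 
described above: it accepts a POST of raw image bytes at /ocr and returns the 
recognized text as plain text. The `/ocr` path, the `make_server` helper, and 
the `ocr` callable (a stand-in for the real engine) are all assumptions made 
for illustration.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_handler(ocr: Callable[[bytes], str]):
    """Create a request handler class bound to the given OCR callable."""

    class OcrHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/ocr":  # hypothetical endpoint path
                self.send_error(404, "Unknown path")
                return
            length = int(self.headers.get("Content-Length", 0))
            image = self.rfile.read(length)
            try:
                text = ocr(image)  # stand-in for the real OCR engine call
            except Exception:
                self.send_error(500, "OCR failed")
                return
            body = text.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # suppress per-request logging in this sketch

    return OcrHandler


def make_server(ocr: Callable[[bytes], str], host: str = "127.0.0.1", port: int = 0):
    """Build a threaded HTTP server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer((host, port), make_handler(ocr))
```

On the MarkLogic side, the CPF action would then send the document to this 
service from XQuery, for instance with xdmp:http-post, and store the returned 
text alongside the original.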

I couldn't recommend a particular OCR technology without in-depth knowledge of 
the problems you are trying to solve. Your choice will depend on which of the 
above two approaches you prefer, plus your budget, the nature of the content, 
the accuracy you need, and other considerations. But 
http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software 
might get you started.

-- Mike

On 3 Mar 2013, at 17:34 , Abhishek53 S <[email protected]> wrote:

> Hi All,
> 
> As I understand it, there are multiple OCR correction algorithms and tools 
> on the market. Is there something built on top of MarkLogic? Any ideas 
> would be highly appreciated.
> 
> Do we have any preferred algorithm to solve OCR correction?
> 
> Thanks
> 
> -Abhishek
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
