I have integrated OCR into a handful of applications. There are two general approaches: either put the OCR into a workflow leading into MarkLogic, or call a REST-ish OCR service from within MarkLogic.
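To make "REST-ish OCR service" a little more concrete, here is a minimal sketch in Python of the kind of thin HTTP wrapper I mean. Everything in it is an assumption made for illustration: the `/ocr` path, the port, and especially `run_ocr`, a hypothetical placeholder for whatever call your OCR product's SDK actually provides. It is not tied to any particular OCR technology.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_ocr(image_bytes: bytes) -> str:
    """Hypothetical placeholder: replace with a call into your OCR SDK."""
    return f"[OCR text for {len(image_bytes)} bytes would go here]"


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the uploaded document from the request body first.
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length) if length else b""

        # Only one resource is exposed (the path is an assumption).
        if self.path != "/ocr":
            self.send_error(404, "unknown resource")
            return
        if not image_bytes:
            self.send_error(400, "empty request body")
            return

        # Return the recognized text as plain UTF-8.
        text = run_ocr(image_bytes).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(text)))
        self.end_headers()
        self.wfile.write(text)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def make_server(host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), OcrHandler)
```

Running `make_server().serve_forever()` starts the service. The caller POSTs the binary document to `/ocr` and gets plain text back, which keeps the calling side simple no matter what the OCR engine looks like underneath.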
The first approach works well for publishing-oriented applications. All the content comes into MarkLogic through a pipeline, perhaps in Java or .NET, and the OCR technology is accessible from that pipeline. Look for OCR products that offer an API or SDK. Many are based on C or other languages in the C family, but building a JNI or other bridge can be worthwhile if the technology is a good fit for your application.

The second approach is better when users will upload documents for OCR in a less controlled way: through WebDAV, for example. A CPF pipeline can take any new or updated documents and evaluate XQuery that calls out to a REST-ish OCR service. The last time I needed OCR and used this approach, I had to create that REST-ish interface myself because the OCR technology did not provide one. However, https://www.google.com/search?q=web|rest+ocr turns up a few possibilities that might work out.

I couldn't recommend a particular OCR technology without in-depth knowledge of the problems you are trying to solve. Your choice will depend on which of the above two approaches you prefer, plus your budget, the nature of the content, the accuracy you need, and other considerations. But http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software might get you started.

-- Mike

On 3 Mar 2013, at 17:34 , Abhishek53 S <[email protected]> wrote:

> Hi All,
>
> As per my understanding there are multiple OCR correction algorithms/tools
> available. Is there something built on top of MarkLogic? Any idea
> will be highly appreciated.
>
> Do we have any preferred algorithm to solve OCR correction?
>
> Thanks
>
> -Abhishek

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
