Hi Ranganath,

You're in luck. I'll explain why at the end.

First, open the document properties in your PDF viewer (usually under the
File menu); the list of fonts used in the document is shown there. Besides
the regular ones, I see "Nudi Akshar-01" and "Nudi Akshar-06" listed. So I
started searching using this string:

*kannada font "nudi akshar" to unicode convert*
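If you'd rather check the font names with a script than through the viewer menu, here is a rough sketch. It only scans the raw bytes of the file for /BaseFont entries, so it is an assumption-laden shortcut: it will miss fonts whose dictionaries sit inside compressed object streams, and a proper PDF library would be more reliable. The function name list_base_fonts is my own.

```python
import re

def list_base_fonts(pdf_bytes):
    """Return the sorted set of /BaseFont names found in raw PDF bytes.

    Limitation: only sees font dictionaries stored uncompressed in the file.
    """
    names = set()
    for raw in re.findall(rb"/BaseFont\s*/([^\s/<>\[\]()]+)", pdf_bytes):
        # PDF names escape special characters as #xx (e.g. #20 for a space).
        raw = re.sub(rb"#([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]), raw)
        name = raw.decode("latin-1")
        # Subset fonts carry a six-letter prefix such as "ABCDEF+"; drop it.
        name = re.sub(r"^[A-Z]{6}\+", "", name)
        names.add(name)
    return sorted(names)

# Usage (the path is a placeholder):
# with open("your-file.pdf", "rb") as f:
#     print(list_base_fonts(f.read()))
```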

Sharing a few potential leads for conversion:
- http://aravindavk.in/projects/ : It seems he has created a converter for
another Kannada font and released it on GitHub. One could take it and change
the mappings to make it work for Nudi Akshar. He has shared his email id on
one of the pages.

- https://meta.wikimedia.org/wiki/Wikimedia_Blog/Drafts/Converting_from_non_Unicode_(Nudi,_Baraha,_...)_font_encoding_to_Unicode_Kannada :
I found this page mentioned in a search result that came up near the top
of my search: https://bitbin.it/KV0Mn1x1/ ... talk about digital
breadcrumbs. I'd advise you to post on the Talk page there to get in touch
with others facing the same problem.

- The wikimedia page leads to this :
https://www.karnataka.gov.in/kcit/pages/kannadasoftware.aspx

I'm not exploring further; please check these out at your end.
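To give an idea of what "changing the mappings" in such a converter involves, here is a minimal sketch of the general technique: a lookup table from legacy character sequences to Unicode strings, applied longest-match-first. The table below is entirely made up for illustration (it is NOT the real Nudi Akshar mapping, which you'd have to get from one of the leads above), and the names LEGACY_TO_UNICODE and build_converter are my own.

```python
import re

# HYPOTHETICAL mapping for illustration only; not the real Nudi Akshar table.
LEGACY_TO_UNICODE = {
    "A": "\u0c95",               # KANNADA LETTER KA
    "B": "\u0ca8",               # KANNADA LETTER NA
    "Bq": "\u0ca8\u0ccd\u0ca8",  # NA + VIRAMA + NA (a conjunct)
    "C": "\u0ca1",               # KANNADA LETTER DDA
}

def build_converter(mapping):
    """Return a function that replaces legacy sequences with Unicode text."""
    # Longer keys go first so "Bq" wins over "B" when both could match.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def convert(text):
        # Anything not in the table (spaces, digits, ...) passes through as-is.
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return convert

convert = build_converter(LEGACY_TO_UNICODE)
print(convert("ABqC"))
```

Real converters usually need an extra pass after this to re-order glyphs that the legacy font places differently from Unicode; this sketch leaves that out.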

If you're more interested in just having that content readable than in
converting it to Unicode, and you have some control over where it will be
read, then you can find and install the fonts mentioned, and share their
.TTF files for installing elsewhere. However, this will not be possible on
phones and tablets (as far as I know).

----------

For folks having a similar issue with Devanagari fonts (Hindi, Marathi etc),
check out this :
https://sites.google.com/site/technicalhindi/home/converters.
Brilliant work, but I wish someone would help them move to GitHub. I had to
customize one of their converters as the text I was dealing with had
slightly different mappings. It was a fun reverse-engineering exercise.
I've shared my customized converters here: http://ourpuneourbudget.in/tools/

----------

*Why you're in luck*

Non-English Unicode text and PDF technology have a weird problem that
hasn't been resolved yet. To make the script appear properly, the PDF has
to re-arrange the character glyphs, and that messes up the underlying text.
Display is achieved but fidelity is lost. So, Unicode text that goes into a
PDF... may or may not make it back out in one piece. The degree of
distortion even seems to vary across software and operating systems.

An intervention at the PDF-creating end (hence not applicable to our case)
is shared here:
https://bugs.documentfoundation.org/show_bug.cgi?id=66597 (search the page
for XeTeX)

Legacy ANSI fonts, on the other hand, retain full fidelity. You can
convert a document in a legacy font (like yours) to PDF, copy the text out,
and get the original back.

So, since your text is in a legacy font (Nudi Akshar), you stand a chance
of converting the whole thing into Unicode Kannada at the click of a button.
Had it been in Unicode Kannada, you might have had to manually proof-read
everything and make the necessary edits.

------------

For those trying to get Unicode text out of PDFs : Hope you find a way, all
the best. Check out that documentfoundation link above. See past
discussions on this group:
https://groups.google.com/forum/#!searchin/datameet/pdf$20unicode%7Csort:date



--
Cheers,
Nikhil VJ
+91-966-583-1250
Pune, India
Website <http://nikhilvj.cu.cc>
DataMeet Pune chapter <https://datameet-pune.github.io/>
Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
Contribute <https://www.instamojo.com/@nikhilvj/>

On Fri, Mar 2, 2018 at 10:19 AM, <rangan...@onlinerti.com> wrote:

> I have a Kannada PDF file and am trying to extract the data from it. But
> it seems the font used is not Unicode. I tried copy-pasting the text
> from the PDF, but the characters still do not display properly.
>
> I have attached the PDF also.
>
> --
> Datameet is a community of Data Science enthusiasts in India. Know more
> about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to datameet+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
