Anand,

Thanks for the Padma tip! I never found that in my web searches. I see a lot of character maps. I guess the way forward would be to run the text through all of them, find the closest match, and then fix the encodings that are off.
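Roughly what I have in mind, as a minimal sketch. It assumes each character map can be expressed as a plain dict from legacy glyph codes to Unicode strings, and it scores a result by how much of it lands in the Devanagari block. The two maps and the names `CANDIDATE_MAPS`, `convert`, `devanagari_ratio` and `best_map` are made up for illustration; they are not real APS-Priyanka tables and not Padma's code.

```python
# Hypothetical toy maps: legacy glyph codes -> Unicode strings.
# Real font tables would be much larger and are NOT reproduced here.
CANDIDATE_MAPS = {
    "map_a": {"d": "क", "e": "म", "k": "ा"},
    "map_b": {"d": "क", "e": "ल"},
}


def convert(text, char_map):
    """Replace legacy codes with Unicode, trying the longest codes first."""
    keys = sorted(char_map, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(char_map[key])
                i += len(key)
                break
        else:
            # No code matched at this position: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def devanagari_ratio(text):
    """Share of non-space characters inside U+0900..U+097F."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if "\u0900" <= c <= "\u097f")
    return hits / len(chars)


def best_map(text, candidates):
    """Run text through every map and return (name, converted, score) of the best."""
    scored = []
    for name, char_map in candidates.items():
        converted = convert(text, char_map)
        scored.append((devanagari_ratio(converted), name, converted))
    score, name, converted = max(scored)
    return name, converted, score


if __name__ == "__main__":
    print(best_map("dek", CANDIDATE_MAPS))
```

Whatever is still non-Devanagari after the best map has been applied would be the list of codes to fix by hand. A plain lookup like this will probably not be enough on its own, since legacy fonts typically place some vowel signs before the consonant while Unicode stores them after, so a reordering pass would likely be needed as well.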
best,
Rushabh

W: https://erpnext.com
T: @rushabh_mehta

On 05-Jul-2013, at 11:32 AM, Anand Chitipothu <[email protected]> wrote:

> On Fri, Jul 5, 2013 at 11:20 AM, Rushabh Mehta <[email protected]> wrote:
> Hello all,
>
> I am not sure if this is the right forum, but would love to get any pointers.
>
> I am volunteering with a local Hindi newspaper and want to get their editions
> online in a web-searchable format. Here is the link to the site.
>
> http://aainanews.blogspot.in/2012/08/14th-issue-3-year.html
>
> The biggest hurdle I am facing is converting the text from the font encoding
> the paper uses (APS-Priyanka) to Unicode (assuming that I can extract
> the text from the PDFs, and keeping the formatting issues to one side for the
> moment).
>
> From what I gathered from web searches, APS Priyanka is a really old font and
> does not follow any specific encoding like ISCII etc. I tried some basic
> scripts and character maps, but it does not seem like a "trivial" problem.
>
> If anyone has experience in this and can help, it would be great.
>
> Hi Rushabh,
>
> Looks like you want to convert text encoded using a custom encoding used by
> proprietary fonts to Unicode, not make a legacy font
> Unicode-friendly.
>
> I'm not an expert in that area, but I can give you some pointers.
>
> There used to be a website, uni.medhas.org, which converted websites
> using Windows-specific fonts to Unicode on the fly. Looks like that website
> is no more, and here is a copy of it from the Wayback Machine.
>
> http://web.archive.org/web/20080325204643/http://uni.medhas.org/
>
> The same guys created a Firefox extension to do the same translation.
>
> http://padma.mozdev.org/
>
> Look at the code or talk to those guys about how to convert fonts.
>
> Anand
> http://anandology.com/
>
> --
> For more details about this list
> http://datameet.org/discussions/
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
