On Fri, Jul 5, 2013 at 11:20 AM, Rushabh Mehta <[email protected]> wrote:

> Hello all,
>
> I am not sure if this is the right forum, but would love to get any
> pointers.
>
> I am volunteering with a local Hindi newspaper and want to get their
> editions online in web searchable format. Here is the link to the site.
>
> http://aainanews.blogspot.in/2012/08/14th-issue-3-year.html
>
> The biggest hurdle I am facing is converting the text, which is encoded
> for the paper's font (APS-Priyanka), to Unicode (assuming that I can
> extract the text from the PDFs, and setting formatting issues aside for
> the moment).
>
> From what I gathered from web searches, APS Priyanka is a really old font
> and does not follow any specific encoding like ISCII etc. I tried some
> basic scripts and character maps but it does not seem like a "trivial"
> problem.
>
> If anyone has experience in this and can help, it would be great.
>

Hi Rushabh,

It looks like you want to convert text stored in the custom encoding of a
proprietary font to Unicode, rather than make a legacy font
Unicode-friendly.

I'm not an expert in that area, but I can give you some pointers.

There used to be a website, uni.medhas.org, that converted websites using
Windows-specific fonts to Unicode on the fly. That site appears to be gone,
but here is a copy of it from the Wayback Machine.

http://web.archive.org/web/20080325204643/http://uni.medhas.org/

The same people created a Firefox extension that does the same conversion.

http://padma.mozdev.org/

Look at the code, or ask them how they handle the conversion.
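
To give a rough idea of what such a conversion usually involves, here is a
minimal Python sketch. It is not taken from Padma or uni.medhas.org, and the
table in it is invented purely for illustration: I do not know the real
APS-Priyanka mapping, so every key below is a hypothetical placeholder that
you would have to replace by inspecting the font. The function name
legacy_to_unicode is also my own. The sketch shows two steps: a
longest-match table lookup, and moving the short-i matra, which many legacy
Devanagari fonts store before its consonant, to after it as Unicode expects.

```python
# Hypothetical glyph-to-Unicode table. These pairs are invented for
# illustration and are NOT the real APS-Priyanka mapping.
LEGACY_TO_UNICODE = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "l": "\u0932",  # DEVANAGARI LETTER LA
    "a": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "i": "\u093F",  # DEVANAGARI VOWEL SIGN I (assumed to precede its consonant)
}

SHORT_I = "\u093F"


def is_consonant(piece):
    """True if piece is a single Devanagari consonant (U+0915..U+0939)."""
    return len(piece) == 1 and "\u0915" <= piece <= "\u0939"


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Convert legacy-encoded text to Unicode using a lookup table."""
    # Try longer keys first so multi-character glyph codes win over prefixes.
    keys = sorted(table, key=len, reverse=True)
    out = []
    pos = 0
    while pos < len(text):
        for key in keys:
            if text.startswith(key, pos):
                out.append(table[key])
                pos += len(key)
                break
        else:
            # Unmapped characters (spaces, digits, ...) pass through unchanged.
            out.append(text[pos])
            pos += 1

    # Move each short-i matra after the consonant that follows it.
    i = 0
    while i < len(out) - 1:
        if out[i] == SHORT_I and is_consonant(out[i + 1]):
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return "".join(out)
```

With this toy table, legacy_to_unicode("ikl") puts the vowel sign after KA
rather than before it. A real converter would need the full table plus
further reordering rules (reph, half forms, conjuncts), which is probably
why it did not feel "trivial".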

Anand
http://anandology.com/


