Anand,

Thanks for the Padma tip! I never found that in my web searches. I see a lot of 
character maps. I guess the way forward would be to run the text through all of 
them, find the closest match, and then fix the encodings that are off.
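
A minimal sketch of that idea in Python, assuming each character map is a plain 
dict from legacy glyph codes to Unicode strings and that "closest match" means 
the largest share of output characters in the Devanagari block (U+0900 to 
U+097F). The function names and the two tiny maps are made-up placeholders, not 
the real APS-Priyanka table:

```python
import re

DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def convert(text, char_map):
    """Replace legacy glyph codes with Unicode, trying longer codes first."""
    if not char_map:
        return text
    keys = sorted(char_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Single pass, so already-converted output is never replaced again.
    return pattern.sub(lambda m: char_map[m.group(0)], text)


def devanagari_ratio(text):
    """Fraction of non-whitespace characters in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if DEVANAGARI.match(c))
    return hits / len(chars)


def best_map(text, candidate_maps):
    """Return (name, converted_text, score) for the highest-scoring map."""
    scored = []
    for name, char_map in candidate_maps.items():
        converted = convert(text, char_map)
        scored.append((name, converted, devanagari_ratio(converted)))
    return max(scored, key=lambda item: item[2])


# Hypothetical maps for illustration only; real tables would be much larger.
CANDIDATE_MAPS = {
    "map_a": {"k": "क", "e": "म", "y": "ल"},
    "map_b": {"k": "क"},
}

name, converted, score = best_map("key", CANDIDATE_MAPS)
print(name, converted, score)
```

Whatever characters the winning map leaves outside the Devanagari block would 
be the "encodings that are off" to fix by hand. This sketch only substitutes 
codes; it does not reorder glyphs, which legacy Hindi fonts may also need.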

best,
Rushabh


W: https://erpnext.com
T: @rushabh_mehta

On 05-Jul-2013, at 11:32 AM, Anand Chitipothu <[email protected]> wrote:

> On Fri, Jul 5, 2013 at 11:20 AM, Rushabh Mehta <[email protected]> wrote:
> Hello all,
> 
> I am not sure if this is the right forum, but would love to get any pointers.
> 
> I am volunteering with a local Hindi newspaper and want to get their editions 
> online in a web-searchable format. Here is the link to the site.
> 
> http://aainanews.blogspot.in/2012/08/14th-issue-3-year.html
> 
> The biggest hurdle I am facing is converting the text from the font encoding 
> the paper uses (APS-Priyanka) to Unicode (assuming that I can extract the 
> text from the PDFs, and leaving the formatting issues aside for the moment).
> 
> From what I gathered from web searches, APS Priyanka is a really old font and 
> does not follow any standard encoding such as ISCII. I tried some basic 
> scripts and character maps, but it does not seem like a "trivial" problem.
> 
> If anyone has experience in this and can help, it would be great.
> 
> Hi Rushabh,
> 
> Looks like you want to convert text encoded using the custom encoding of 
> proprietary fonts to Unicode, not make a legacy font Unicode-friendly.
> 
> I'm not an expert in that area, but I can give you some pointers.
> 
> There used to be a website, uni.medhas.org, which converted websites using 
> Windows-specific fonts to Unicode on the fly. Looks like that website is no 
> more, but here is a copy of it from the Wayback Machine.
> 
> http://web.archive.org/web/20080325204643/http://uni.medhas.org/
> 
> The same guys created a Firefox extension to do the same translation.
> 
> http://padma.mozdev.org/
> 
> Look at the code or talk to those guys about how to convert fonts.
> 
> Anand
> http://anandology.com/
> 
> -- 
> For more details about this list
> http://datameet.org/discussions/
> --- 
> You received this message because you are subscribed to the Google Groups 
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  
