Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani
On Mon, May 24, 2010 at 7:13 PM, Eknath Venkataramani eknath.i...@gmail.com wrote: I have around 45 pdfs to convert into raw text containing text in _HINDI_. When I use the xpdf package, the generated text is very weird, so I'd like to write a program which would convert the pdf text

Re: [BangPypers] extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani
by Gora- 'you might not have those fonts installed in your system' -- Eknath Venkataramani ___ BangPypers mailing list BangPypers@python.org http://mail.python.org/mailman/listinfo/bangpypers

Re: [BangPypers] Coaching institute in Bangalore.

2010-04-16 Thread Eknath Venkataramani
/bangpypers -- Eknath Venkataramani

[BangPypers] pyparsing wrong output

2010-02-12 Thread Eknath Venkataramani
I am trying to write a parser in pyparsing and need help. The code is at http://paste.pocoo.org/show/177078/ and the input file is at http://paste.pocoo.org/show/177076/ . The output I get is: generator object at 0xb723b80c -- Eknath Venkataramani
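
A "generator object at 0x..." line usually means a generator was printed directly instead of being iterated. The pasted code is not reproduced in this archive, so the following is only a sketch of that general pattern. `scan_tokens` is a hypothetical stand-in for whatever generator-returning call the parser makes (pyparsing's `scanString`, for instance, is a generator).

```python
def scan_tokens(text):
    """Hypothetical stand-in for a parser call that returns a generator."""
    for word in text.split():
        yield word


# Printing the generator shows only its repr, not the parsed items.
result = scan_tokens("alpha beta gamma")
print(result)

# Iterating (or wrapping in list()) actually runs the generator
# and collects the values it yields.
matches = list(scan_tokens("alpha beta gamma"))
print(matches)
```

If the real code does use a generator-returning method, looping over the result or passing it to `list()` should show the matches instead of the object address.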

Re: [BangPypers] UTF-8 character

2010-01-31 Thread Eknath Venkataramani
On Sun, Jan 31, 2010 at 8:02 AM, Senthil Kumaran orsent...@gmail.com wrote: Yup. That is perfect. That Emacs-style line declares to the interpreter that the following Python script uses UTF-8 encoding. You might choose to use other encodings similarly too. Yeah. Thanks.
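
For reference, a minimal sketch of the kind of Emacs-style declaration being discussed. The exact line from the original message is not shown in this snippet, so the common `# -*- coding: utf-8 -*-` form is assumed here, and the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# The line above is the Emacs-style encoding declaration. It must sit on
# the first or second line of the file to take effect. Python 2 needed it
# for non-ASCII literals; Python 3 already assumes UTF-8 by default.

greeting = u"नमस्ते"  # a non-ASCII (Devanagari) literal in the source file

# Encoding to UTF-8 gives the byte representation written to disk.
encoded = greeting.encode("utf-8")
print(len(greeting), len(encoded))
```

Another encoding can be declared the same way by changing the name after `coding:`, provided the file is actually saved in that encoding.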

Re: [BangPypers] UTF-8 character

2010-01-29 Thread Eknath Venkataramani
On Fri, Jan 29, 2010 at 11:29 PM, Eknath Venkataramani eknath.i...@gmail.com wrote: I am trying to write a program that generates a file with all the punctuation marks removed from the input file. For the usual ASCII characters like .,'?! it works, but then when I try to do the same
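
The message is cut off here, so the exact failing case is unknown. As a sketch only, one way to remove punctuation beyond ASCII is to filter on Unicode character categories rather than a fixed list of characters. This assumes Python 3 `str` input (the 2010 thread was most likely on Python 2, where the text would first need decoding to `unicode`). `strip_punctuation` is an illustrative name, not code from the thread.

```python
import unicodedata


def strip_punctuation(text):
    """Drop every character whose Unicode category starts with 'P'
    (punctuation), which covers ASCII marks and others such as the
    Devanagari danda."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


cleaned = strip_punctuation("नमस्ते। Hello, world!")
print(cleaned)
```

Combining vowel signs in Devanagari belong to the mark categories rather than punctuation, so they are left intact by this filter.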

[BangPypers] How should I do it?

2010-01-14 Thread Eknath Venkataramani
I have a txt file in the following format: [code] confident = { count = 4, trans = { ashahvasahta = 0.74918568, atahmavaishahvaasa = 0.09095465, pahraaram\.nbha = 0.06990729, mailatae = 0.02856427, utanai = 0.01929341, anaa = 0.01578552,
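
The snippet ends before the question is stated, so what should be done with this file is not known. Purely as an illustration of reading the format shown, the sketch below pulls out `name = number` pairs with a regular expression. The pattern, the function name `numeric_pairs`, and the shortened sample string are all assumptions, and nesting is ignored.

```python
import re

# A key is any run of characters other than whitespace, '=', ',', '{', '}',
# followed by '=' and an integer or decimal number.
PAIR = re.compile(r"([^\s=,{}]+)\s*=\s*(\d+(?:\.\d+)?)")


def numeric_pairs(text):
    """Return (key, value) tuples for every 'key = number' in the text."""
    return [(key, float(value)) for key, value in PAIR.findall(text)]


# Shortened sample built from the entries visible in the message.
sample = (
    "confident = { count = 4, trans = { "
    "ashahvasahta = 0.74918568, pahraaram\\.nbha = 0.06990729,"
)
pairs = numeric_pairs(sample)
print(pairs)
```

Keys that open a nested block, such as `confident` and `trans`, are skipped because they are followed by `{` rather than a number. A real solution would likely need a proper parser to keep the nesting.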