Re: [BangPypers] How should I do it?

2010-01-15 Thread Noufal Ibrahim
On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene dhananjay.n...@gmail.com wrote:

 This seems to be the output of PHP's print_r. If you have the flexibility,
 try to have the PHP code output the data in a language-neutral format (e.g.
 JSON, YAML or XML) and then parse it in Python using the appropriate
 parser. If not, you may have to write a custom parser. I did google to see
 if one existed, but couldn't easily locate one.



PHP has a JSON extension (http://www.php.net/manual/en/book.json.php), and
Python has shipped a json module in the stdlib from 2.6 onwards.

If you don't have access to the web server, you might be able to use the PHP
interpreter on your own machine to parse this into something more language
neutral.
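
If the PHP side can be changed to emit JSON, the Python side could look roughly like the sketch below. The php_output string is a made-up sample of what such JSON might look like, not real output from the OP's script, and the code relies on Python's json module keeping keys in document order (true of dicts in Python 3.7 and later).

```python
import json

# Hypothetical JSON, standing in for whatever the PHP script would emit.
php_output = """
{"confident": {"count": 4,
               "trans": {"ashahvasahta": 0.74918568,
                         "atahmavaishahvaasa": 0.09095465}},
 "consumers": {"count": 4,
               "trans": {"upabhaokahtaa": 0.75144362}}}
"""

records = json.loads(php_output)

# Pair each English word with the first translation listed for it.
pairs = [(word, next(iter(rec["trans"]))) for word, rec in records.items()]

for english, translation in pairs:
    print(english, translation)
```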


-- 
~noufal
http://nibrahim.net.in
___
BangPypers mailing list
BangPypers@python.org
http://mail.python.org/mailman/listinfo/bangpypers


Re: [BangPypers] How should I do it?

2010-01-15 Thread Kenneth Gonsalves
On Friday 15 Jan 2010 12:01:56 pm Eknath Venkataramani wrote:
 and I need to extract "confident" and "ashahvasahta" from the first
 record, "consumers" and "upabhaokahtaa" from the second record...
 i.e. the word in English and the first word in the probable translations
 

#!/usr/bin/python

# Each record holds the English word, its count, and a list of
# (translation, probability) pairs in the order they appear in the file.
# Raw strings keep the backslashes in the data as-is.
words = [
    {'english': "confident",
     'count': 4,
     'trans': [
         ("ashahvasahta", 0.74918568),
         ("atahmavaishahvaasa", 0.09095465),
         (r"pahraaram\.nbha", 0.06990729),
         ("mailatae", 0.02856427),
         ("utanai", 0.01929341),
         ("anaa", 0.01578552),
         ("uthaanae", 0.01403157),
         ("jaitanae", 0.01227762),
     ],
    },
    {'english': "consumers",
     'count': 4,
     'trans': [
         ("upabhaokahtaa", 0.75144362),
         (r"upabhaokahtaaom\.n", 0.12980166),
     ],
    },
    {'english': "a",
     'count': 1164,
     'trans': [
         ("eka", 0.14900491),
         ("kaisai", 0.08834675),
         ("haai", 0.06774697),
         ("kaoi", 0.05394308),
         ("kai", 0.04981982),
         (r"\(none\)", 0.04400085),
         ("kaa", 0.03726579),
         ("kae", 0.03446450),
     ],
    },
]

# Print the English word and its first listed translation.
for word in words:
    print("%s %s" % (word['english'], word['trans'][0][0]))
-- 
regards
kg
http://lawgon.livejournal.com


Re: [BangPypers] How should I do it?

2010-01-15 Thread Roshan Mathews
On Fri, Jan 15, 2010 at 2:40 PM, Anand Balachandran Pillai
abpil...@gmail.com wrote:
    # Now, count and trans are not strings in
    # data, so Python will complain, hence we
    # define these as strings with same name!
    count, trans = 'count','trans'

Clever, that.  I got to there, threw up my hands and went downstairs
to eat lunch.

  -- rm


Re: [BangPypers] How should I do it?

2010-01-15 Thread Baishampayan Ghose
 It is a clever hack, taking advantage of the nature of the data. But
 it is far faster than the other approaches posted here.

I thought eval was evil :)

Regards,
BG

-- 
Baishampayan Ghose
b.ghose at gmail.com


Re: [BangPypers] How should I do it?

2010-01-15 Thread Noufal Ibrahim
On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose b.gh...@gmail.com wrote:

  It is a clever hack, taking advantage of the nature of the data. But
  it is far faster than the other approaches posted here.

 I thought eval was evil :)


Given that the OP's data is fixed, eval is okay. :)

Otherwise, it could be evil or unreliable (e.g. an = inside one of the data
strings).
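
For data that is not fixed, one way to keep most of the convenience without calling eval is to rewrite the text into a Python literal and hand it to ast.literal_eval, which only accepts literals. This is a rough sketch and not the hack discussed above: SAMPLE and parse_records are made up for illustration, and it assumes keys never contain whitespace or any of the characters = { } ,

```python
import ast
import json
import re

# A shortened copy of the OP's data, used only for illustration.
SAMPLE = r"""
confident = {
  count = 4,
  trans = {
    ashahvasahta = 0.74918568,
    pahraaram\.nbha = 0.06990729,
  },
},
consumers = {
  count = 4,
  trans = {
    upabhaokahtaa = 0.75144362,
    upabhaokahtaaom\.n = 0.12980166,
  },
},
"""


def parse_records(text):
    """Rewrite `key = value` as `"key": value` and parse it as a dict literal."""
    # json.dumps quotes the key and escapes any backslashes in it.
    as_literal = re.sub(
        r"(\S+)\s*=",
        lambda match: json.dumps(match.group(1)) + ":",
        text,
    )
    # Python literals tolerate the trailing commas that the data contains.
    return ast.literal_eval("{" + as_literal + "}")


records = parse_records(SAMPLE)
for english, record in records.items():
    # Dicts keep insertion order, so the first key is the first translation.
    print(english, next(iter(record["trans"])))
```

Unlike eval, ast.literal_eval raises an error rather than executing anything if the rewritten text turns out not to be a plain literal.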


-- 
~noufal
http://nibrahim.net.in


Re: [BangPypers] How should I do it?

2010-01-15 Thread Anand Chitipothu
On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose b.gh...@gmail.com wrote:
 It is a clever hack, taking advantage of the nature of the data. But
 it is far faster than the other approaches posted here.

 I thought eval was evil :)

The data looks like valid JSON. You can use simplejson.loads instead of eval.

Anand


Re: [BangPypers] How should I do it?

2010-01-15 Thread Anand Balachandran Pillai
On Fri, Jan 15, 2010 at 4:13 PM, Anand Chitipothu anandol...@gmail.com wrote:

 On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose b.gh...@gmail.com
 wrote:
  It is a clever hack, taking advantage of the nature of the data. But
  it is far faster than the other approaches posted here.
 
  I thought eval was evil :)

 The data looks like valid JSON. You can use simplejson.loads instead of
 eval.

Python 2.6.2 (r262:71600, Aug 21 2009, 12:23:57)
[GCC 4.4.1 20090818 (Red Hat 4.4.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simplejson
>>> data=open('data.txt').read().replace('[code]','').replace('[/code]','')
>>> data
'\nconfident = {\n count = 4,\n trans = {\nashahvasahta =
0.74918568,\n   atahmavaishahvaasa = 0.09095465,\n   pahraaram\\.nbha
= 0.06990729,\nmailatae = 0.02856427,\n  utanai =
0.01929341,\nanaa = 0.01578552,\nuthaanae =
0.01403157,\njaitanae = 0.01227762,\n   },\n},\nconsumers =
{\n count = 4,\n trans = {\n   upabhaokahtaa = 0.75144362,\n
upabhaokahtaaom\\.n = 0.12980166,\n
sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha = 0.11875471,\n
},\n},\na = {\n count = 1164,\n trans = {\n eka =
0.14900491,\n  kaisai = 0.08834675,\nhaai =
0.06774697,\nkaoi = 0.05394308,\n kai =
0.04981982,\n\\(none\\) = 0.04400085,\n kaa =
0.03726579,\n kae = 0.03446450,\n   },\n},\n\n'
>>> simplejson.loads(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/site-packages/simplejson/decoder.py", line 338, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 13 - line 37 column 1 (char 13 - 815)


Anand




-- 
--Anand


Re: [BangPypers] How should I do it?

2010-01-15 Thread Noufal Ibrahim
On Fri, Jan 15, 2010 at 4:13 PM, Anand Chitipothu anandol...@gmail.com wrote:

 On Fri, Jan 15, 2010 at 4:00 PM, Baishampayan Ghose b.gh...@gmail.com
 wrote:
  It is a clever hack, taking advantage of the nature of the data. But
  it is far faster than the other approaches posted here.
 
  I thought eval was evil :)

 The data looks like valid JSON. You can use simplejson.loads instead of
 eval.


Don't the '=' characters mess things up?

One of the nice things about the repr of Python objects is that they're
almost valid JSON. The same can't be said for PHP though.
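
Both points can be checked quickly with the stdlib json module. The snippet is only a sketch: raw is a shortened stand-in for the OP's file and record is a made-up Python object.

```python
import json

# A fragment in the OP's format: bare keys and '=' instead of ':'.
raw = "confident = {\n count = 4,\n}"

try:
    json.loads(raw)
    parsed_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    parsed_ok = False

# The repr of a simple Python object is close to JSON: for this data the
# only difference is the single quotes around the keys.
record = {"count": 4, "trans": [0.74918568, 0.09095465]}
round_tripped = json.loads(repr(record).replace("'", '"'))

print(parsed_ok)
print(round_tripped == record)
```

The quote swap is only safe for data like this; strings that themselves contain quotes, or values such as None and True, would need more care.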


-- 
~noufal
http://nibrahim.net.in


Re: [BangPypers] How should I do it?

2010-01-15 Thread Dhananjay Nene
On Fri, Jan 15, 2010 at 12:01 PM, Eknath Venkataramani 
eknath.i...@gmail.com wrote:

 I have a txt file in the following format:
 [code]
 confident = {
  count = 4,
  trans = {
 ashahvasahta = 0.74918568,
atahmavaishahvaasa = 0.09095465,
pahraaram\.nbha = 0.06990729,
 mailatae = 0.02856427,
   utanai = 0.01929341,
 anaa = 0.01578552,
 uthaanae = 0.01403157,
 jaitanae = 0.01227762,
},
 },
 consumers = {
  count = 4,
  trans = {
upabhaokahtaa = 0.75144362,
upabhaokahtaaom\.n = 0.12980166,
sauda\�\�\�dha = 0.11875471,
},
 },
 a = {
  count = 1164,
  trans = {
  eka = 0.14900491,
   kaisai = 0.08834675,
 haai = 0.06774697,
 kaoi = 0.05394308,
  kai = 0.04981982,
 \(none\) = 0.04400085,
  kaa = 0.03726579,
  kae = 0.03446450,
},
 },
 [/code]

 and I need to extract "confident" and "ashahvasahta" from the first
 record, "consumers" and "upabhaokahtaa" from the second record...
 i.e. the word in English and the first word in the probable translations

 Thanks in advance
 Eknath


Since I hadn't had a chance to write a recursive descent parser before, I
took this opportunity to do a bit of an exercise.
I have used a parsing library called pyparsing.

-- Begin Code --
# coding=utf-8
from functools import reduce  # builtin in Python 2, needs this import in Python 3
from pyparsing import *
import pprint
import sys

# Raw string so that the backslashes in the keys are kept as-is
data = r'''
confident = {
   count = 4,
   trans = {
 ashahvasahta = 0.74918568,
 atahmavaishahvaasa = 0.09095465,
 pahraaram\.nbha = 0.06990729,
 mailatae = 0.02856427,
 utanai = 0.01929341,
 anaa = 0.01578552,
 uthaanae = 0.01403157,
 jaitanae = 0.01227762,
   },
},
consumers = {
 count = 4,
 trans = {
   upabhaokahtaa = 0.75144362,
   upabhaokahtaaom\.n = 0.12980166,
   sauda\�\�\�dha = 0.11875471,
   },
},
a = {
 count = 1164,
 trans = {
 eka = 0.14900491,
  kaisai = 0.08834675,
haai = 0.06774697,
kaoi = 0.05394308,
 kai = 0.04981982,
\(none\) = 0.04400085,
 kaa = 0.03726579,
 kae = 0.03446450,
   },
}
'''

# Setup pyparsing tokens
dct = Forward()
pair_op = Literal("=")
comma = Literal(",").suppress()
beg_brace = Literal("{").suppress()
end_brace = Literal("}").suppress()
num = Word("0123456789.")
# Keys such as pahraaram\.nbha contain characters outside alphas + nums,
# so accept any run of characters other than whitespace and = { } ,
key = (Regex(r"[^\s={},]+") ^ quotedString).setResultsName("key")
val = (num ^ dct).setResultsName("value")
key_value_pair = Group(key + pair_op.suppress() + val)
key_value_pair_list = delimitedList(key_value_pair)
dct << Group(beg_brace + key_value_pair_list + Optional(comma) + end_brace)

# parse data
parsed = key_value_pair_list.parseString(data)

# function to extract, i.e. form a python data structure
def extract(result):
    if 'key' in result.keys():
        # a single key/value pair: recurse if the value is a nested block
        if isinstance(result.value, ParseResults):
            return (result.key, extract(result.value))
        else:
            return (result.key, result.value)
    else:
        # a list of pairs: build a dict out of them
        return dict(extract(elem) for elem in result)

# extract
extracted = extract(parsed)

# print extracted data
pprint.pprint(extracted, sys.stdout)

# print the english word and its highest-probability translation
# (which is also the first one listed in this data)

print("\n\n\nTranslations\n\n")
print(dict(
    (english,
     reduce(lambda x, y: (y[0], float(y[1])) if float(y[1]) > x[1] else x,
            translations['trans'].items(),
            ('', 0.0))[0]
    ) for english, translations in extracted.items()
))

-- End Code --
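
The reduce at the end of the code above keeps the translation with the highest probability. As a side note, the same selection can be written with max. The sketch below does not need pyparsing: the extracted dict is a hand-written stand-in for what extract() is meant to produce (probabilities as strings), and best_translation is a name made up for illustration.

```python
# Hand-written stand-in for the parsed data; values are strings because
# the parser keeps numbers as text.
extracted = {
    "confident": {
        "count": "4",
        "trans": {"mailatae": "0.02856427", "ashahvasahta": "0.74918568"},
    },
    "consumers": {
        "count": "4",
        "trans": {"upabhaokahtaa": "0.75144362", "upabhaokahtaaom\\.n": "0.12980166"},
    },
}


def best_translation(trans):
    """Return the translation whose probability is highest."""
    return max(trans, key=lambda word: float(trans[word]))


best = dict(
    (english, best_translation(record["trans"]))
    for english, record in extracted.items()
)
print(best)
```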

Dhananjay

-- 

blog: http://blog.dhananjaynene.com
twitter: http://twitter.com/dnene http://twitter.com/_pythonic


[BangPypers] How should I do it?

2010-01-14 Thread Eknath Venkataramani
I have a txt file in the following format:
[code]
confident = {
  count = 4,
  trans = {
 ashahvasahta = 0.74918568,
atahmavaishahvaasa = 0.09095465,
pahraaram\.nbha = 0.06990729,
 mailatae = 0.02856427,
   utanai = 0.01929341,
 anaa = 0.01578552,
 uthaanae = 0.01403157,
 jaitanae = 0.01227762,
},
},
consumers = {
  count = 4,
  trans = {
upabhaokahtaa = 0.75144362,
upabhaokahtaaom\.n = 0.12980166,
sauda\�\�\�dha = 0.11875471,
},
},
a = {
  count = 1164,
  trans = {
  eka = 0.14900491,
   kaisai = 0.08834675,
 haai = 0.06774697,
 kaoi = 0.05394308,
  kai = 0.04981982,
 \(none\) = 0.04400085,
  kaa = 0.03726579,
  kae = 0.03446450,
},
},
[/code]

and I need to extract "confident" and "ashahvasahta" from the first
record, "consumers" and "upabhaokahtaa" from the second record...
i.e. the word in English and the first word in the probable translations

Thanks in advance
Eknath


Re: [BangPypers] How should I do it?

2010-01-14 Thread Dhananjay Nene
This seems to be the output of PHP's print_r. If you have the flexibility, try
to have the PHP code output the data in a language-neutral format (e.g.
JSON, YAML or XML) and then parse it in Python using the appropriate
parser. If not, you may have to write a custom parser. I did google to see
if one existed, but couldn't easily locate one.
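
For this particular file a custom parser does not have to be elaborate. The following is a minimal line-by-line sketch, assuming the layout shown in the original post: one `key = value` per line, every record containing a `trans = {` block, and closing braces on lines of their own. The name first_translations and the SAMPLE text are made up for illustration.

```python
# A shortened copy of the posted data, used only for illustration.
SAMPLE = r"""
[code]
confident = {
  count = 4,
  trans = {
     ashahvasahta = 0.74918568,
     atahmavaishahvaasa = 0.09095465,
  },
},
a = {
  count = 1164,
  trans = {
     eka = 0.14900491,
     \(none\) = 0.04400085,
  },
},
[/code]
"""


def first_translations(text):
    """Yield (english_word, first_translation) pairs from the nested text."""
    word = None       # top-level key of the record being read
    in_trans = False  # True while inside a `trans = {` block
    wanted = False    # True until the record's first translation is seen
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if not line or line.startswith("["):  # blank lines and [code] tags
            continue
        if line == "}":
            if in_trans:
                in_trans = False  # end of the trans block
            else:
                word = None       # end of the record
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if word is None and value == "{":
            word, wanted = key, True
        elif key == "trans" and value == "{":
            in_trans = True
        elif in_trans and wanted:
            yield word, key
            wanted = False


pairs = list(first_translations(SAMPLE))
for english, translation in pairs:
    print(english, translation)
```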

Dhananjay


-- 

blog: http://blog.dhananjaynene.com
twitter: http://twitter.com/dnene http://twitter.com/_pythonic