On Fri, Jan 15, 2010 at 2:17 PM, Noufal Ibrahim <nou...@gmail.com> wrote:

> On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene <dhananjay.n...@gmail.com> wrote:
>
> > This seems to be an output of print_r of PHP. If you have a flexibility,
> > try to have the PHP code output the data into a language neutral format
> > (eg json, yaml, xml etc.) and then parse it in python using the
> > appropriate parser. If not you may have to write a custom parser. I did
> > google to find if one existed, but couldn't easily locate one.
> >
>
>
> There is
>  http://www.php.net/manual/en/book.json.php for PHP and Python2.6 onwards
> has json part of the stdlib.
>
> If you don't have access to the webserver, you might be able to use the php
> interpreter on your own machine to parse this into something more language
> neutral
>

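 For comparison, here is a minimal sketch of the JSON route suggested
 above, seen from the Python side. The JSON text is a made-up sample of
 what PHP's json_encode() might emit for data shaped like yours; it is
 an assumption, not your actual data.

```python
import json

# Hypothetical JSON text; the real output would come from the PHP side.
json_text = '{"a": {"count": 1164, "trans": {"kaoi": 0.05394308, "kaa": 0.03726579}}}'

# json.loads turns the JSON text into nested Python dicts and numbers.
parsed = json.loads(json_text)
print(parsed["a"]["count"])
print(sorted(parsed["a"]["trans"]))
```

 This only helps if you can change what the PHP side emits.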
 If you take a look at your data, it is surprisingly close to what
 a nested Python dictionary looks like, except that instead of
 ':' to separate key from value, it uses '=>', which is what Perl
 and PHP use.

 So, the following solution takes advantage of this fact and
 converts your data to a Python dictionary.

 Here is the complete solution.

def scrub(data):
    # First remove the [code] and [/code] markers
    data = data.replace('[code]', '').replace('[/code]', '')
    # Replace '=>' with ':'
    data = data.replace('=>', ':')
    # count and trans appear as bare words in the data, so
    # eval() would raise a NameError for them; bind them as
    # local names whose values are the matching strings.
    count, trans = 'count', 'trans'

    # Wrap the data in braces so it reads as one dict literal
    data = '{' + data + '}'
    print(data)

    # Eval it to a dictionary (only do this with data you trust)
    mydict = eval(data)
    print(mydict)
    return mydict

if __name__ == "__main__":
    with open('data.txt') as f:
        scrub(f.read())

 And it neatly prints as,


{'a': {'count': 1164, 'trans': {'kaoi': 0.053943079999999997, 'kaa':
0.03726579, 'haai': 0.067746970000000004, 'kaisai': 0.088346750000000002,
'kae': 0.034464500000000002, 'kai': 0.049819820000000001, 'eka':
0.14900490999999999, '\\(none\\)': 0.044000850000000001}}, 'confident':
{'count': 4, 'trans': {'mailatae': 0.028564269999999999, 'ashahvasahta':
0.74918567999999996, 'anaa': 0.015785520000000001, 'jaitanae': 0.01227762,
'pahraaram\\.nbha': 0.069907289999999997, 'utanai': 0.01929341,
'atahmavaishahvaasa': 0.090954649999999998, 'uthaanae': 0.01403157}},
'consumers': {'count': 4, 'trans':
{'sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha': 0.11875471,
'upabhaokahtaa': 0.75144361999999998, 'upabhaokahtaaom\\.n':
0.12980166000000001}}}

 Now, use the data as a Python dictionary.
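 As an illustration of using the result, the sketch below picks the
 highest-valued entry under 'trans' for each word. The dictionary is a
 subset of the output printed above. The helper name best_translation
 is made up for this example, and reading a higher value as a more
 likely candidate is my assumption about what the numbers mean.

```python
# A subset of the dictionary printed above, copied verbatim.
mydict = {
    'confident': {'count': 4, 'trans': {
        'mailatae': 0.028564269999999999,
        'ashahvasahta': 0.74918567999999996,
        'anaa': 0.015785520000000001,
    }},
    'consumers': {'count': 4, 'trans': {
        'upabhaokahtaa': 0.75144361999999998,
        'upabhaokahtaaom\\.n': 0.12980166000000001,
    }},
}

def best_translation(entry):
    # Assumption: a higher value under 'trans' means a more likely candidate.
    trans = entry['trans']
    return max(trans, key=trans.get)

for word in sorted(mydict):
    print('%s -> %s' % (word, best_translation(mydict[word])))
```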

It is a hack that takes advantage of the nature of the data, but
it is far faster than the other approaches posted here.
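If calling eval() on file contents is a concern, a variant of the same
idea can use ast.literal_eval, which accepts only literals. This is a
sketch, not the solution above: the name scrub_safely and the sample
text are made up, and it assumes count and trans are the only bare
words in the data. Like the original, it would also rewrite any '=>'
that appears inside a quoted string.

```python
import ast
import re

def scrub_safely(data):
    """Convert the '=>'-style text into a dict without calling eval()."""
    data = data.replace('[code]', '').replace('[/code]', '')
    data = data.replace('=>', ':')
    # literal_eval cannot look up names, so quote the bare words
    # count and trans when they are used as keys.
    data = re.sub(r"\b(count|trans)\b(?=\s*:)", r"'\1'", data)
    return ast.literal_eval('{' + data + '}')

# Hypothetical sample in the assumed input format.
sample = """[code]
'a' => {count => 1164, trans => {'kaoi' => 0.05394308, 'kaa' => 0.03726579}},
'confident' => {count => 4, trans => {'mailatae' => 0.02856427}}
[/code]"""

result = scrub_safely(sample)
print(result['a']['count'])
```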

--Anand


>
>
> --
> ~noufal
> http://nibrahim.net.in
> _______________________________________________
> BangPypers mailing list
> BangPypers@python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>



