Re: Implementing file reading in C/Python

2009-01-23 Thread mk
John Machin wrote: The factor of 30 indeed does not seem right -- I have done somewhat similar stuff (calculating Levenshtein distance [edit distance] on words read from very large files), coded the same algorithm in pure Python and C++ (using linked lists in C++) and Python version was 2.5

Re: Implementing file reading in C/Python

2009-01-14 Thread David Bolen
Johannes Bauer dfnsonfsdu...@gmx.de writes: Yup, I changed the Python code to behave the same way the C code did - however overall it's not much of an improvement: Takes about 15 minutes to execute (still factor 23). Not sure this is completely fair if you're only looking for a pure Python

Re: Implementing file reading in C/Python

2009-01-13 Thread Marc 'BlackJack' Rintsch
On Mon, 12 Jan 2009 21:26:27 -0500, Steve Holden wrote: The very idea of mapping part of a process's virtual address space onto an area in which low-level system code resides, so writing to this region may corrupt the system, with potentially catastrophic consequences seems to be asking for

Re: Implementing file reading in C/Python

2009-01-12 Thread Sion Arrowsmith
Grant Edwards inva...@invalid wrote: On 2009-01-09, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: Grant Edwards inva...@invalid wrote: If I were you, I'd try mmap()ing the file instead of reading it into string objects one chunk at a time. You've snipped the bit further on in that

Re: Implementing file reading in C/Python

2009-01-12 Thread sturlamolden
On Jan 9, 6:41 pm, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: You've snipped the bit further on in that sentence where the OP says that the file of interest is 2GB. Do you still want to try mmap'ing it? Python's mmap object does not take an offset parameter. If it did, one could mmap
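A note for readers on later Pythons: `mmap.mmap` accepts an `offset` keyword from Python 2.6 onward, so a large file can be mapped one window at a time. The following is only a minimal sketch, assuming a read-only mapping; `map_window` is a hypothetical helper name, not something from the thread.

```python
import mmap
import os
import tempfile

def map_window(path, offset, length):
    """Map `length` bytes of the file at `path`, starting at `offset`.

    `offset` must be a multiple of mmap.ALLOCATIONGRANULARITY.
    """
    with open(path, "rb") as f:
        # The mapping keeps its own handle, so the file object can be closed.
        return mmap.mmap(f.fileno(), length,
                         access=mmap.ACCESS_READ, offset=offset)
```

Stepping `offset` through the file in multiples of `mmap.ALLOCATIONGRANULARITY` keeps the mapped region small even when the file itself does not fit comfortably in a 32-bit address space.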

Re: Implementing file reading in C/Python

2009-01-12 Thread Sion Arrowsmith
In case the cancel didn't get through: Sion Arrowsmith si...@chiark.greenend.org.uk wrote: Grant Edwards inva...@invalid wrote: 2GB should easily fit within the process's virtual memory space. Assuming you're in a 64bit world. Me, I've only got 2GB of address space available to play in --

Re: Implementing file reading in C/Python

2009-01-12 Thread sturlamolden
On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: And today's moral is: try it before posting. Yeah, I can map a 2GB file no problem, complete with associated 2GB+ allocated VM. The addressing is clearly not working how I was expecting it to. The virtual memory space of

Re: Implementing file reading in C/Python

2009-01-12 Thread Hrvoje Niksic
sturlamolden sturlamol...@yahoo.no writes: On Jan 9, 6:41 pm, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: You've snipped the bit further on in that sentence where the OP says that the file of interest is 2GB. Do you still want to try mmap'ing it? Python's mmap object does not take

Re: Implementing file reading in C/Python

2009-01-12 Thread Grant Edwards
On 2009-01-12, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: Grant Edwards inva...@invalid wrote: On 2009-01-09, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: Grant Edwards inva...@invalid wrote: If I were you, I'd try mmap()ing the file instead of reading it into string objects

Re: Implementing file reading in C/Python

2009-01-12 Thread Grant Edwards
On 2009-01-12, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: In case the cancel didn't get through: Sion Arrowsmith si...@chiark.greenend.org.uk wrote: Grant Edwards inva...@invalid wrote: 2GB should easily fit within the process's virtual memory space. Assuming you're in a 64bit world.

Re: Implementing file reading in C/Python

2009-01-12 Thread Steve Holden
sturlamolden wrote: On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: And today's moral is: try it before posting. Yeah, I can map a 2GB file no problem, complete with associated 2GB+ allocated VM. The addressing is clearly not working how I was expecting it to.

Re: Implementing file reading in C/Python

2009-01-12 Thread Steve Holden
sturlamolden wrote: On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: And today's moral is: try it before posting. Yeah, I can map a 2GB file no problem, complete with associated 2GB+ allocated VM. The addressing is clearly not working how I was expecting it to.

Re: Implementing file reading in C/Python

2009-01-12 Thread Grant Edwards
On 2009-01-13, Steve Holden st...@holdenweb.com wrote: sturlamolden wrote: On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: And today's moral is: try it before posting. Yeah, I can map a 2GB file no problem, complete with associated 2GB+ allocated VM. The addressing

Re: Implementing file reading in C/Python

2009-01-10 Thread Francesco Bochicchio
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: Marc 'BlackJack' Rintsch wrote: On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: As this was horribly slow (20 Minutes for a 2GB file) I coded the whole thing in C also: Yours took ~37 minutes for 2 GiB here. This just ~15

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: I've first tried Python. Please don't beat me, it's slow as hell and probably a horrible solution: #!/usr/bin/python import sys import os f = open(sys.argv[1], "r") Mode should be 'rb'. filesize = os.stat(sys.argv[1])[6]

Re: Implementing file reading in C/Python

2009-01-09 Thread James Mills
On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote: print("Filesize : %d" % (filesize)) print("Image size : %dx%d" % (width, height)) print("Bytes per Pixel: %d" % (blocksize)) Why parentheses around ``print``\s argument? In Python <3 ``print`` is a statement

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote: On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote: Why parentheses around ``print``\s argument? In Python <3 ``print`` is a statement and not a function. Not true as of 2.6+ and 3.0+ print is now a

Re: Implementing file reading in C/Python

2009-01-09 Thread James Mills
On Fri, Jan 9, 2009 at 7:41 PM, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote: Please read again what I wrote. Lol I thought <3 was a smiley! :) Sorry! cheers James

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: datamap = { } for i in range(len(data)): datamap[ord(data[i])] = datamap.get(data[i], 0) + 1 Here is an error by the way: You call `ord()` just on the left side of the ``=``, so all keys in the dictionary
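The mismatch pointed out here (an integer key on the left, a character key in the `get()` lookup on the right) disappears once the same key is used on both sides. A minimal sketch, assuming the data is a `bytes` object read in binary mode under Python 3, where iteration already yields integers and `ord()` is not needed; `count_bytes` is a name made up for illustration.

```python
def count_bytes(data):
    """Return a dict mapping each byte value (0-255) to its count in data."""
    datamap = {}
    for value in data:  # iterating over bytes yields ints in Python 3
        # Same key for the store and the lookup, unlike the quoted code.
        datamap[value] = datamap.get(value, 0) + 1
    return datamap
```

With mismatched keys every lookup misses, so each stored count is reset to 1 and the result looks plausible while being wrong.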

Re: Implementing file reading in C/Python

2009-01-09 Thread Steven D'Aprano
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote: On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote: print("Filesize : %d" % (filesize)) print("Image size : %dx%d" % (width, height)) print("Bytes per Pixel: %d" % (blocksize)) Why parentheses around

Re: Implementing file reading in C/Python

2009-01-09 Thread Steven D'Aprano
On Fri, 09 Jan 2009 09:15:20 +, Marc 'BlackJack' Rintsch wrote: picture = { } havepixels = 0 while True: data = f.read(blocksize) if len(data) <= 0: break if data: break is enough. You've reversed the sense of the test. The OP exits the loop when data is
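For reference, the loop under discussion should stop when `read()` returns an empty result, so the short form of the test is `if not data`. A minimal sketch, assuming a file object opened in binary mode; `read_blocks` is a hypothetical name.

```python
import io

def read_blocks(f, blocksize):
    """Yield successive blocks from a binary file object until end of file."""
    while True:
        data = f.read(blocksize)
        if not data:  # an empty bytes object signals end of file
            break
        yield data
```

The last block may be shorter than `blocksize` when the file size is not an exact multiple of it.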

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: As this was horribly slow (20 Minutes for a 2GB file) I coded the whole thing in C also: Yours took ~37 minutes for 2 GiB here. This just ~15 minutes: #!/usr/bin/env python from __future__ import division, with_statement import os

Re: Implementing file reading in C/Python

2009-01-09 Thread Steve Holden
Marc 'BlackJack' Rintsch wrote: On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: [...] print("Filesize : %d" % (filesize)) print("Image size : %dx%d" % (width, height)) print("Bytes per Pixel: %d" % (blocksize)) Why parentheses around ``print``\s argument? In Python <3 ``print``

Re: Implementing file reading in C/Python

2009-01-09 Thread Steve Holden
Steven D'Aprano wrote: On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote: On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote: print("Filesize : %d" % (filesize)) print("Image size : %dx%d" % (width, height)) print("Bytes per Pixel: %d" % (blocksize)) Why

Re: Implementing file reading in C/Python

2009-01-09 Thread mk
Johannes Bauer wrote: Which takes about 40 seconds. I want the niceness of Python but a little more speed than I'm getting (I'd settle for factor 2 or 3 slower, but factor 30 is just too much). This probably doesn't contribute much, but have you tried using Python profiler? You might have

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
Marc 'BlackJack' Rintsch wrote: f = open(sys.argv[1], "r") Mode should be 'rb'. Check. filesize = os.stat(sys.argv[1])[6] `os.path.getsize()` is a little bit more readable. Check. print("Filesize : %d" % (filesize)) print("Image size : %dx%d" % (width, height)) print("Bytes per

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
James Mills schrieb: What does this little tool do anyway ? It's very interesting the images it creates out of files. What is this called ? It has no particular name. I was toying around with the Princeton Cold Boot Attack (http://citp.princeton.edu/memory/). In particular I was interested in

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
Marc 'BlackJack' Rintsch schrieb: On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: As this was horribly slow (20 Minutes for a 2GB file) I coded the whole thing in C also: Yours took ~37 minutes for 2 GiB here. This just ~15 minutes: Ah, ok... when implementing your suggestions

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
mk schrieb: Johannes Bauer wrote: Which takes about 40 seconds. I want the niceness of Python but a little more speed than I'm getting (I'd settle for factor 2 or 3 slower, but factor 30 is just too much). This probably doesn't contribute much, but have you tried using Python profiler?

Re: Implementing file reading in C/Python

2009-01-09 Thread pruebauno
On Jan 9, 8:48 am, Johannes Bauer dfnsonfsdu...@gmx.de wrote: No - I didn't know there was a profiler, and I haven't found anything meaningful yet (there seems to be a profiling C interface, but that won't get me anywhere). Is that a separate tool or something? Could you provide a link? Thanks,
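The profiler asked about here ships with the standard library as the `cProfile` module; a whole script can be profiled with `python -m cProfile script.py`. A minimal sketch of profiling a single call from code, with `profile_call` as a hypothetical helper name:

```python
import cProfile
import io
import pstats

def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()
```

Sorting by cumulative time usually points at the inner per-byte loop first, which is where a pure-Python version of this task is likely to spend most of its time.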

Re: Implementing file reading in C/Python

2009-01-09 Thread MRAB
Marc 'BlackJack' Rintsch wrote: On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: As this was horribly slow (20 Minutes for a 2GB file) I coded the whole thing in C also: Yours took ~37 minutes for 2 GiB here. This just ~15 minutes: #!/usr/bin/env python from __future__ import

Re: Implementing file reading in C/Python

2009-01-09 Thread rurpy
On Jan 9, 6:48 am, Johannes Bauer dfnsonfsdu...@gmx.de wrote: mk schrieb: The factor of 30 indeed does not seem right -- I have done somewhat similar stuff (calculating Levenshtein distance [edit distance] on words read from very large files), coded the same algorithm in pure Python and

Re: Implementing file reading in C/Python

2009-01-09 Thread Grant Edwards
On 2009-01-09, Johannes Bauer dfnsonfsdu...@gmx.de wrote: I've come from C/C++ and am now trying to code some Python because I absolutely love the language. However I still have trouble getting Python code to run efficiently. Right now I have an easy task: Get a file, If I were you, I'd try

Re: Implementing file reading in C/Python

2009-01-09 Thread bearophileHUGS
Johannes Bauer, I was about to start writing a faster version. I think with some care and Psyco you can get to about 5 times slower than C or something like that. To do that you need to use almost the same code for the C version, with a list of 256 ints for the frequencies, not using max() but a
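The sentence above is cut off, so the following is only a guess at the shape of the suggestion: a fixed list of 256 counters and an explicit scan for the maximum in place of `max()`. It is a minimal sketch for Python 3 `bytes`, without Psyco, and `most_common_byte` is a made-up name.

```python
def most_common_byte(block):
    """Return (byte value, count) for the most frequent byte in block."""
    counts = [0] * 256            # one counter per possible byte value
    for value in block:
        counts[value] += 1
    best_value, best_count = 0, counts[0]
    for value in range(1, 256):   # explicit scan instead of max()
        if counts[value] > best_count:
            best_value, best_count = value, counts[value]
    return best_value, best_count
```

Ties go to the lowest byte value, and an empty block yields `(0, 0)`.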

Re: Implementing file reading in C/Python

2009-01-09 Thread Sion Arrowsmith
Grant Edwards inva...@invalid wrote: On 2009-01-09, Johannes Bauer dfnsonfsdu...@gmx.de wrote: I've come from C/C++ and am now trying to code some Python because I absolutely love the language. However I still have trouble getting Python code to run efficiently. Right now I have an easy task:

Re: Implementing file reading in C/Python

2009-01-09 Thread Grant Edwards
On 2009-01-09, Sion Arrowsmith si...@chiark.greenend.org.uk wrote: Grant Edwards inva...@invalid wrote: On 2009-01-09, Johannes Bauer dfnsonfsdu...@gmx.de wrote: I've come from C/C++ and am now trying to code some Python because I absolutely love the language. However I still have trouble

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: Marc 'BlackJack' Rintsch wrote: def iter_max_values(blocks, block_count): for i, block in enumerate(blocks): histogram = defaultdict(int) for byte in block: histogram[byte] += 1 yield

Re: Implementing file reading in C/Python

2009-01-09 Thread Grant Edwards
On 2009-01-09, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote: On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: Marc 'BlackJack' Rintsch wrote: def iter_max_values(blocks, block_count): for i, block in enumerate(blocks): histogram = defaultdict(int) for byte in block:

Re: Implementing file reading in C/Python

2009-01-09 Thread John Machin
On Jan 9, 9:56 pm, mk mrk...@gmail.com wrote: The factor of 30 indeed does not seem right -- I have done somewhat similar stuff (calculating Levenshtein distance [edit distance] on words read from very large files), coded the same algorithm in pure Python and C++ (using linked lists in C++)

Re: Implementing file reading in C/Python

2009-01-09 Thread Rhamphoryncus
On Jan 9, 2:14 pm, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote: On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: Marc 'BlackJack' Rintsch wrote: def iter_max_values(blocks, block_count):     for i, block in enumerate(blocks):         histogram = defaultdict(int)         for byte in

Implementing file reading in C/Python

2009-01-08 Thread Johannes Bauer
Hello group, I've come from C/C++ and am now trying to code some Python because I absolutely love the language. However I still have trouble getting Python code to run efficiently. Right now I have an easy task: Get a file, split it up into a million chunks, count the most prominent character in
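The task description is cut off here, so this is only a sketch under assumptions: the file is split into roughly `block_count` equal blocks, and for each block the count of its most frequent byte is produced. It uses `os.path.getsize()`, binary mode, and `collections.Counter`; the name `dominant_counts` is hypothetical and this is not the code from the original post.

```python
import os
import tempfile
from collections import Counter

def dominant_counts(path, block_count):
    """Yield, for each block, the count of its most frequent byte value."""
    filesize = os.path.getsize(path)
    blocksize = max(1, filesize // block_count)
    with open(path, "rb") as f:  # binary mode: raw bytes, no newline translation
        while True:
            block = f.read(blocksize)
            if not block:        # empty read means end of file
                break
            yield Counter(block).most_common(1)[0][1]
```

When the file size is not a multiple of `block_count`, the integer division leaves a remainder and one or more extra blocks are produced at the end.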

Re: Implementing file reading in C/Python

2009-01-08 Thread MRAB
Johannes Bauer wrote: Hello group, I've come from C/C++ and am now trying to code some Python because I absolutely love the language. However I still have trouble getting Python code to run efficiently. Right now I have an easy task: Get a file, split it up into a million chunks, count the most

Re: Implementing file reading in C/Python

2009-01-08 Thread James Mills
On Fri, Jan 9, 2009 at 1:04 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: Hello group, Hello. (...) Which takes about 40 seconds. I want the niceness of Python but a little more speed than I'm getting (I'd settle for factor 2 or 3 slower, but factor 30 is just too much). Can anyone point

Re: Implementing file reading in C/Python

2009-01-08 Thread Johannes Bauer
James Mills wrote: I have tested this against a randomly generated file from /dev/urandom (10M). Yes the Python one is much slower, but I believe it's because the Python implementation is _correct_ where the C one is _wrong_ :) The resulting test.bin.pgm from python is exactly 3.5M

Re: Implementing file reading in C/Python

2009-01-08 Thread James Mills
On Fri, Jan 9, 2009 at 3:13 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote: Uhh, yes, you're right there... I must admit that I was too lazy to include all the stat headers and do a proper st_size check in the C version (just a quick hack), so it's practically hardcoded. With files of exactly

Re: Implementing file reading in C/Python

2009-01-08 Thread James Mills
On Fri, Jan 9, 2009 at 2:29 PM, James Mills prolo...@shortcircuit.net.au wrote: I shall attempt to optimize this :) I have a funny feeling you might be caught up with some features of Python - one notable one being that some things in Python are immutable. psyco might help here though ...

Re: Implementing file reading in C/Python

2009-01-08 Thread Steve Holden
MRAB wrote: Johannes Bauer wrote: Hello group, [and about 200 other lines there was no need to quote] [...] Have a look at psyco: http://psyco.sourceforge.net/ Have a little consideration for others when making a short reply to a long post, please. Trim what isn't necessary. Thanks. regards