Re: FYI: ConfigParser, ordered options, PEP 372 and OrderedDict + big thank you

2009-11-20 Thread Scott David Daniels

Jonathan Fine wrote:...

A big thanks to Armin Ronacher and Raymond Hettinger for
   PEP 372: Adding an ordered dictionary to collections
...  I prototyped (in about an hour).

I then thought - maybe someone has been down this path before

So all that I want has been done already, and will be waiting for me 
when I move to Python3.


So a big thank you is in order.


And thank you for, having done that, not simply smiling because your
work was lighter.  Instead you described a great work path and handed
an attaboy to a pair of people that richly deserve attaboys.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Writing a Carriage Return in Unicode

2009-11-20 Thread Scott David Daniels

MRAB wrote:

u'\u240D' isn't a carriage return (that's u'\r') but a symbol (a visible
CR graphic) for carriage return. Windows programs normally expect
lines to end with '\r\n'; just use u'\n' in programs and open the text
files in text mode ('r' or 'w').
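A quick check of that distinction (any Python with unicodedata):

```python
import unicodedata

# The visible CR glyph has a Unicode character name; the control
# character itself does not.
print(unicodedata.name(u'\u240D'))  # SYMBOL FOR CARRIAGE RETURN
try:
    unicodedata.name(u'\r')
except ValueError:
    print('u"\\r" is a control character, with no character name')
```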


&lt;rant&gt;
This is the one thing from standards that I believe Microsoft got right
where others did not.  The ASCII (American Standard Code for Information
Interchange) standard end of line is _both_ carriage return (\r) _and_
line feed (\n) -- I believe in that order.

The Unix operating system, in its enthusiasm to make _everything_
simpler (against Einstein's advice, Everything should be made as simple
as possible, but not simpler.) decided that end-of-line should be a
simple line feed and not carriage return line feed.  Before they made
that decision, there was debate about the order of cr-lf or lf-cr, or
inventing a new EOL character ('\037' == '\x1F' was the candidate).

If you've actually typed on a physical typewriter, you know that moving
the carriage back is a distinct operation from rolling the platen
forward; both operations are accomplished when you push the carriage
back using the bar, but you know they are distinct.  Hell, MIT even had
line starve character that moved the cursor up (or rolled the platen
back).
&lt;/rant&gt;

Lots of people talk about dos-mode files and windows files as if
Microsoft got it wrong; it did not -- Unix made up a convenient fiction
and people went along with it. (And, yes, if Unix had been there first,
their convention was, in fact, better).

So, sorry for venting, but I have been wanting to say this in public
for years.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: overriding __getitem__ for a subclass of dict

2009-11-17 Thread Scott David Daniels

Steve Howell wrote:
...

Eventually, I realized that it was easier to just monkeypatch Django
while I was in test mode to get a more direct hook into the behavior I
was trying to monitor, and then I didn't need to bother with
overriding __getitem__ or creating complicated wrapper objects


Since nobody else has mentioned it, I'd point you at Mock objects:
http://python-mock.sourceforge.net/
for another way to skin the cat that it sounds like has been
biting you.  They are surprisingly useful for exploratory
and regression testing.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: python gui builders

2009-11-17 Thread Scott David Daniels

me wrote:

I have looked at the Tk stuff that is built into Python - not 
acceptable. 

Such insightful analysis, and it is _so_ helpful in stating your needs.


[a lot of guff about unacceptable things]


What Python gui builder is well supported, does not require me to learn 
another framework/library, and can crank out stuff for multiple platforms ?


Well, let's see.  You want to do gui work without learning things.
Good luck with that.  If you discover how, I'd like to learn tensor
analysis without using symbols or operations more complex than
addition and subtraction.  Maybe your groundwork can help me out
with that.

I must be in a really cranky mood today.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: ZipFile - file adding API incomplete?

2009-11-17 Thread Scott David Daniels

Glenn Maynard wrote:

I want to do something fairly simple: read files from one ZIP and add
them to another, so I can remove and replace files.  This led me to a
couple things that seem to be missing from the API.

 zip.write() only takes the filename and
compression method, not a ZipInfo; writestr takes a ZipInfo but only
accepts a string, not a file.  Is there an API call I'm missing?
(This seems like the fundamental API for adding files, that write and
writestr should be calling.)


Simple answer: it's not there in the API.

Defining that API correctly is tricky, and fraught with issues about
access to the ZipFile object (from both the same thread and from other
threads) while it is mid-modification.  Nonetheless, a carefully done
API that addresses those issues would be valuable.  If you do spend the
time to get something reliable going, put it someplace public and I
predict it will get use.

The approach I fiddled with was:
* Define calls to read _portions_ of the raw (compressed,
  encrypted, whatever) data.
* Define a call that locks the ZipFile object and returns a
  write handle for a single new file.  At that point the
  new file doesn't exist, but reading of other portions of
  the zip file is still allowed.
* Only on successful close of the write handle is the new
  directory written.
Unfortunately, I never worked very hard at the directory entries,
and I realize the big flaw in this design: from the moment
you start overwriting the existing master directory until you write
a new master at the end, you do not have a valid zip file.

Also note that you'll have to research the standards for _exactly_ what
the main header should look like if you use particular features.  My
stuff did bzip compression as well, and finding out which bits
mean what is where my process broke down.
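Meanwhile, a workable sketch with today's API: read from one zip and write to another, rewriting the archive wholesale (the file names and contents below are invented; note that writestr does accept a ZipInfo plus a byte string, which carries dates and attributes across the copy):

```python
import io
import zipfile

# Build a small source archive in memory (stand-ins for real files).
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, 'w') as src:
    src.writestr('keep.txt', b'keep me')
    src.writestr('replace.txt', b'old contents')

# "Replace" a member by copying every other member, ZipInfo and all,
# into a fresh archive, then adding the new version.
dst_buf = io.BytesIO()
with zipfile.ZipFile(src_buf) as src, zipfile.ZipFile(dst_buf, 'w') as dst:
    for info in src.infolist():
        if info.filename != 'replace.txt':
            dst.writestr(info, src.read(info.filename))
    dst.writestr('replace.txt', b'new contents')
```

Not in-place editing, but it sidesteps the invalid-archive window described above: the original stays intact until the new archive is complete.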

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: TODO and FIXME tags

2009-11-17 Thread Scott David Daniels

Martin P. Hellwig wrote:

Ben Finney wrote:

Chris Rebert c...@rebertia.com writes:


2009/11/16 Yasser Almeida Hernández pedro...@fenhi.uh.cu:

What is the syntax for setting the TODO and FIXME tags...?

...

There's no widely-followed “syntax” for this convention, though.
Except for _not_ doing what is suggested in those comments, which 
appears to be the biggest convention :-)


Perhaps: the comments are a directive to delete the comment if
you happen to do this.
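Since there is no standard syntax, the usual home-grown substitute is a small scanner with a regex of your own choosing (the tag pattern below is an assumption, not any standard):

```python
import re

# Hypothetical helper: list TODO/FIXME comments with their line numbers.
TAG = re.compile(r'#\s*(TODO|FIXME)\b[:\s]*(.*)')

def list_tags(source):
    found = []
    for lineno, line in enumerate(source.splitlines(), 1):
        match = TAG.search(line)
        if match:
            found.append((lineno, match.group(1), match.group(2).strip()))
    return found

sample = "x = 1  # TODO: rename\ny = x  # FIXME handle zero\n"
```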

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Serious Privileges Problem: Please Help

2009-11-10 Thread Scott David Daniels

Dave Angel wrote:

Victor Subervi wrote:
On Mon, Nov 9, 2009 at 2:30 PM, Victor Subervi 
victorsube...@gmail.comwrote:
On Mon, Nov 9, 2009 at 2:27 PM, Rami Chowdhury 
rami.chowdh...@gmail.comwrote:

snip

Hold everything. Apparently line-endings got mangled. What I don't

...
 
What I've diagnosed as happening when a python script with Windows 
line-ending was posted on my server's cgi environment:


The actual error seemed to be a failure to find the python interpreter, 
since some Unix shells take the shebang line to include the \r character 
that preceded the newline.   Seems to me they could be more tolerant, 
since I don't think control characters are likely in the interpreter 
file name.


You could work around this by creating a symlink (or even a hard link)
to the python executable named 'python\r'.
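Or fix the file instead of the interpreter name; a minimal sketch (the shebang bytes below are just a sample), working on bytes so it cannot mangle encodings:

```python
# Normalize DOS line endings so the shebang no longer ends in '\r'.
def strip_crlf(data):
    return data.replace(b'\r\n', b'\n')

fixed = strip_crlf(b'#!/usr/bin/python\r\nx = 1\r\n')
# To repair a file in place (path is a placeholder):
#   data = open('script.py', 'rb').read()
#   open('script.py', 'wb').write(strip_crlf(data))
```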

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: list comprehension problem

2009-11-03 Thread Scott David Daniels

Terry Reedy wrote:
What immutability has to do with identity is that 'two' immutable 
objects with the same value *may* actually be the same object, 
*depending on the particular version of a particular implementation*.





>>> t1 = (1,2,3) # an immutable object
>>> t2 = (1,2,3) # another immutable object


Whether or not this is 'another' object or the same object is irrelevant 
for all purposes except identity checking. It is completely up to the 
interpreter.



>>> t1 is t2
False


In this case, but it could have been True.


>>> t1 == t2
True


A more telling example:
 >>> t1 = (1, 2) + (3,) # an immutable object
 >>> t2 = (1,) + (2, 3) # another immutable object
 >>> t1 is t2
 False

Here you make obvious that (assuming an optimizer that
is not far more aggressive than Python is used to), in
order to make equal immutable values identical, you'd
have to end each operation producing an immutable result
with a search of all appropriately typed values for one
that was equal.
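In short (a small sketch; either identity answer is legal):

```python
t1 = (1, 2, 3)
t2 = (1, 2, 3)

assert t1 == t2    # value equality: guaranteed by the language
same = t1 is t2    # identity: purely an implementation/version detail
# 'same' may be True or False here; correct code never relies on it
# for immutable values.
```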

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Another (simple) unicode question

2009-10-29 Thread Scott David Daniels

John Machin wrote:

On Oct 29, 10:02 pm, Rustom Mody rustompm...@gmail.com wrote:...

I thought of trying to port it to python3 but it barfs on some unicode
related stuff (after running 2to3) which I am unable to wrap my head
around.

Can anyone direct me to what I should read to try to understand this?


to which Jon replied with some good links to start, and then:


In any case, it's a debugging problem, isn't it? Could you possibly
consider telling us the error message, the traceback, a few lines of
the 3.x code around where the problem is, and the corresponding 2.x
lines? Are you using 3.1.1 and 2.6.4? Does your test work in 2.6?


Also consider how 2to3 translates the problem section(s).

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: lambda forms within a loop

2009-10-25 Thread Scott David Daniels

Michal Ostrowski wrote:

...
[a,b] = MakeLambdawhatever()
print a(10)
print b(10)


Here is yet another way to solve the problem:

import functools
def AddPair(x, q):
    return x + q
a, b = [functools.partial(AddPair, x) for x in [1, 2]]
print a(10)
print b(10)

Or even, since these are numbers:
a, b = [x.__add__ for x in [1, 2]]
print a(10)
print b(10)
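For the archives, the classic late-binding pitfall the original question runs into, and the default-argument fix, side by side (a sketch):

```python
# Each 'broken' lambda closes over the single variable n, which is 2 by
# the time any of them runs.  The default argument evaluates n at
# definition time, capturing one value per lambda.
broken = [lambda x: x + n for n in (1, 2)]
fixed = [lambda x, n=n: x + n for n in (1, 2)]
a, b = fixed
```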

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: IDLE python shell freezes after running show() of matplotlib

2009-10-24 Thread Scott David Daniels

Forrest Sheng Bao wrote:

I am having a weird problem on IDLE. After I plot something using show
() of matplotlib, the python shell prompt in IDLE just freezes that I
cannot enter anything and there is no new  prompt show up. I
tried ctrl - C and it didn't work. I have to restart IDLE to use it
again.

My system is Ubuntu Linux 9.04. I used apt-get to install IDLE.


You should really look at smart questions; I believe you have a problem,
and that you have yet to imagine how to give enough information for
someone else to help you.

http://www.catb.org/~esr/faqs/smart-questions.html

Hint: I don't know your CPU, python version, IDLE version, matplotlib
version, nor do you provide a small code example that allows me to
easily reproduce your problem (or not).

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: [ANN] Python(x,y) 2.6.3.0 released

2009-10-22 Thread Scott David Daniels

Pierre Raybaut wrote:

Hi all,

I'm quite pleased (and relieved) to announce that Python(x,y) version 
2.6.3.0 has been released. It is the first release based on Python 2.6 
-- note that Python(x,y) version number will now follow the included 
Python version (Python(x,y) vX.Y.Z.N will be based on Python vX.Y.Z).


Python(x,y) is a free Python distribution providing a ready-to-use 
scientific development software for numerical computations, data 
analysis and data visualization based on Python programming language, Qt 
graphical user interfaces (and development framework), Eclipse 
integrated development environment and Spyder interactive development 
environment. Its purpose is to help scientific programmers used to 
interpreted languages (such as MATLAB or IDL) or compiled languages 
(C/C++ or Fortran) to switch to Python.


It is now available for Windows XP/Vista/7 (as well as for Ubuntu 
through the pythonxy-linux project -- note that included software may 
differ from the Windows version):

http://www.pythonxy.com

Major changes since v2.1.17:
   * Python 2.6.3
   * Spyder 1.0.0 -- the Scientific PYthon Development EnviRonment, a 
powerful MATLAB-like development environment introducing exclusive 
features in the scientific Python community 
(http://packages.python.org/spyder/)

   * MinGW 4.4.0 -- including gcc 4.4.0 and gfortran
   * Pydev 1.5.0 -- now including the powerful code analysis features of 
Pydev Extensions (formerly available as a commercial extension to the 
free Pydev plugin)

   * Enthought Tool Suite 3.3.0
   * PyQt 4.5.4 and PyQwt 5.2.0
   * VTK 5.4.2
   * ITK 3.16 -- Built for Python 2.6 thanks to the help of Charl Botha, 
DeVIDE (Delft Visualisation and Image processing Development Environment)


Complete release notes:
http://www.pythonxy.com/download.php

- Pierre


The really sad part is that you'll have to do 2.6.4.0 so soon.
Actually, it is not so sad, since so little has changed (except,
probably, the bits you have been struggling with).  Please _do_
check out the release candidate soonest (since it will become
production _very_ soon) -- get to python-dev immediately if
you have problems with the release candidate.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: a simple unicode question

2009-10-21 Thread Scott David Daniels

George Trojan wrote:

Scott David Daniels wrote:

...

And if you are unsure of the name to use:
 >>> import unicodedata
 >>> unicodedata.name(u'\xb0')
 'DEGREE SIGN'


 Thanks for all suggestions. It took me a while to find out how to
 configure my keyboard to be able to type the degree sign. I prefer to
 stick with pure ASCII if possible.
 Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found
 http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt
 Is that the place to look?

I thought the mention of unicodedata would make it clear.

 >>> import sys, unicodedata
 >>> for n in xrange(sys.maxunicode+1):
 ...     try:
 ...         nm = unicodedata.name(unichr(n))
 ...     except ValueError: pass
 ...     else:
 ...         if 'tortoise' in nm.lower(): print n, nm


--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: a simple unicode question

2009-10-20 Thread Scott David Daniels

Mark Tolonen wrote:

Is there a better way of getting the degrees?


It seems your string is UTF-8.  \xc2\xb0 is UTF-8 for DEGREE SIGN.  If 
you type non-ASCII characters in source code, make sure to declare the 
encoding the file is *actually* saved in:


# coding: utf-8

s = '''48° 13' 16.80 N'''
q = s.decode('utf-8')

# next line equivalent to previous two
q = u'''48° 13' 16.80 N'''

# couple ways to find the degrees
print int(q[:q.find(u'°')])
import re
print re.search(ur'(\d+)°',q).group(1)



Mark is right about the source, but you needn't write unicode source
to process unicode data.  Since nobody else mentioned my favorite way
of writing unicode in ASCII, try:

IDLE 2.6.3
 >>> s = '''48\xc2\xb0 13' 16.80 N'''
 >>> q = s.decode('utf-8')
 >>> degrees, rest = q.split(u'\N{DEGREE SIGN}')
 >>> print degrees
 48
 >>> print rest
  13' 16.80 N

And if you are unsure of the name to use:
 >>> import unicodedata
 >>> unicodedata.name(u'\xb0')
 'DEGREE SIGN'

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: No module named os

2009-10-10 Thread Scott David Daniels

smi...@home.com wrote:

'import site' failed; use -v for traceback
Traceback (most recent call last):
  File ./setup.py, line 3, in module
import sys, os, glob
ImportError: No module named os


I'm trying to build a small program and I get the above error.
I have had this error popup in the past while trying to build other
programs. What can I do?

Thanks


Go to a command line and type:
$ python -v setup.py
which will tell you which includes are tried in which order.
If this doesn't make it painfully obvious, try:
$ python -v -v setup.py
which will tell you what locations are being checked for files.

Normally you should:
  1) tell us python version and which OS (and OS version) you are using.
  2) include a pasted copy of exactly what did not work, along with the
 resulting output, and why you did not expect the output you got.
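A tiny snippet that often answers the question faster than -v (nothing here is specific to the original poster's setup):

```python
import sys

# When an import mysteriously fails, first check which interpreter is
# actually running and where it looks for modules.
print(sys.executable)
for entry in sys.path:
    print(entry)
```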

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: When ‘super’ is not a good idea

2009-10-07 Thread Scott David Daniels

Ben Finney wrote:

Scott David Daniels wrote: ...



class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        super(Initialized, class_)._init_class()
Mea culpa:  Here super is _not_ a good idea, 

[…]
Why is ‘super’ not a good idea here?

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        ClassBase._init_class()

What makes this implementation better than the one using ‘super’?


Well, it doesn't end with an error message :-)

The reason for the error message is that super is built for instance
methods, not class methods.  You'd need a class method style super
to get to the next superclass in the __mro__ with an '_init_class'
method.   Personally I don't see the need.
You could of course do it like this:

class MyOtherType(type):
    def __new__(class_, name, bases, dct):
        result = type.__new__(class_, name, bases, dct)
        result()._init_class()
        return result

class OtherClassBase(object):
    __metaclass__ = MyOtherType

    def _init_class(self):
        print 'initializing class'

class Initialized(OtherClassBase):
    def _init_class(self):
        self.__class__.a, self.__class__.b = 1, 2
        super(Initialized, self)._init_class()

This code is a problem because the point of this exercise is to do
initialization _before_ building an instance (think of building tables
used in __init__).

Before you decide that super should simply check if the second arg to
super is a subclass of the first arg, and operate differently in that
case (as my first code naively did), realize there is a problem.  I saw
the problem in trying the code, and simply tacked in the proper parent
call and ran off to work.

Think about the fact that classes are now objects as well; a class
itself has a class (type or in these classes MyType or MyOtherType)
with its own needs for super, and the combination would be a mess.
I'm certain you'd get inadvertent switches across the two subtype
hierarchies, but that belief may just be my fear of the inevitable
testing and debugging issues such an implementation would require.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rules regarding a post about a commercial product

2009-10-07 Thread Scott David Daniels

Ken Elkabany wrote:

I am hoping to get feedback for a new, commercial platform that
targets the python programming language and its users. The product is
currently in a closed-beta and will be free for at least a couple
months. After reviewing the only rules I could find
(http://www.python.org/community/lists/), I wanted to ask one last
time to make sure that such a post would be appropriate.


You might want to go for comp.lang.python.announce
I am certain you are welcome if you don't spray the area with ads,
see, for example, ActiveState's behavior.  I trust that if you so
start making real money from it, like ActiveState you'll help out
the community that is giving you its support.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: 'Once' properties.

2009-10-06 Thread Scott David Daniels

menomnon wrote:

Does python have a ‘once’ (per class) feature?

‘Once’, as I’ve known it, is in Eiffel; maybe Java doesn’t have it.

The first time you instantiate a given class into an object it
constructs, say, a dictionary containing static information.  In my
case static is information that may change once a week at the most and
there’s no need to be refreshing this data during a single running of
the program (currently maybe 30 minutes).

So you instantiate the same class into a second object, but instead of
going to the databases again and recreating the same dictionary a
second time, you get a pointer or reference to the one already created
in the first object – copies into the second object that is.  And the
dictionary, no matter how many instances of the object you make, is
always the same one from the first object.

So, as we put it, once per class and not object.

Saves on both time and space.

Look into metaclasses:

class MyType(type):
    def __new__(class_, name, bases, dct):
        result = type.__new__(class_, name, bases, dct)
        result._init_class()
        return result

class ClassBase(object):
    __metaclass__ = MyType

    @classmethod
    def _init_class(class_):
        print 'initializing class'


class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        super(Initialized, class_)._init_class()

print Initialized.a, Initialized.b
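If a metaclass feels heavy for this, a lazily filled class attribute gives the same once-per-class effect; a sketch with made-up names:

```python
class Expensive(object):
    _shared = None   # built at most once, on first instantiation

    def __init__(self):
        if Expensive._shared is None:
            Expensive._shared = self._load()   # pretend database work
        self.data = Expensive._shared

    def _load(self):
        # stand-in for the weekly database query
        return {'week': 42}

first = Expensive()
second = Expensive()
```

Every instance shares the one dictionary built by the first instantiation, which is the behavior the original question asked for.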

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: 'Once' properties.

2009-10-06 Thread Scott David Daniels

Scott David Daniels wrote:

...
Look into metaclasses:

...

class Initialized(ClassBase):
@classmethod
def _init_class(class_):
class_.a, class_.b = 1, 2
super(Initialized, class_)._init_class()


Mea culpa:  Here super is _not_ a good idea, and I had tried that
and recoded, but cut and pasted the wrong code.  I just noticed
that I had done so this morning.

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        ClassBase._init_class()

print Initialized.a, Initialized.b

Much better.  There is probably a way to get to the MRO, but for now,
this should do.
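One hedged sketch of that "way to get to the MRO": walk class_.__mro__ by hand and invoke the next _init_class defined below the current class (the names here are illustrative, not a drop-in for the code above):

```python
class ClassBase(object):
    inits = []

    @classmethod
    def _init_class(class_):
        class_.inits = class_.inits + ['base']

def call_next_init(class_, current):
    # Find 'current' in the MRO, then invoke the next _init_class
    # defined further along it, bound to the *derived* class.
    mro = class_.__mro__
    for base in mro[mro.index(current) + 1:]:
        fn = base.__dict__.get('_init_class')
        if fn is not None:
            return fn.__get__(None, class_)()

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        call_next_init(class_, Initialized)

Initialized._init_class()
```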

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: PIL : How to write array to image ???

2009-10-06 Thread Scott David Daniels

Mart. wrote:

On Oct 5, 5:14 pm, Martin mar...@hvidberg.net wrote:

On Oct 4, 10:16 pm, Mart. mdeka...@gmail.com wrote:






On Oct 4, 9:47 am, Martin mar...@hvidberg.net wrote:

On Oct 3, 11:56 pm, Peter Otten __pete...@web.de wrote:

Martin wrote:

Dear group
I'm trying to use PIL to write an array (a NumPy array to be exact) to
an image.
Peace of cake, but it comes out looking strange.
I use the below mini code, that I wrote for the purpose. The print of
a looks like expected:
[[ 200.  200.  200. ...,    0.    0.    0.]
 [ 200.  200.  200. ...,    0.    0.    0.]
 [ 200.  200.  200. ...,    0.    0.    0.]
 ...,
 [   0.    0.    0. ...,  200.  200.  200.]
 [   0.    0.    0. ...,  200.  200.  200.]
 [   0.    0.    0. ...,  200.  200.  200.]]
But the image looks nothing like that.
Please see the images on:
http://hvidberg.net/Martin/temp/quat_col.png
http://hvidberg.net/Martin/temp/quat_bw.png
or run the code to see them locally.
Please – what do I do wrong in the PIL part ???
:-? Martin
import numpy as np
from PIL import Image
from PIL import ImageOps
maxcol = 100
maxrow = 100
a = np.zeros((maxcol,maxrow),float)
for i in range(maxcol):
    for j in range(maxrow):
        if (i &lt; (maxcol/2) and j &lt; (maxrow/2)) or (i &gt;= (maxcol/2) and j &gt;=
(maxrow/2)):
            a[i,j] = 200
        else:
            a[i,j] = 0
print a
pilImage = Image.fromarray(a,'RGB')
pilImage.save('quat_col.png')
pilImage = ImageOps.grayscale(pilImage)
pilImage.save('quat_bw.png')

The PIL seems to copy the array contents directly from memory without any
conversions or sanity check. In your example the float values determine the
gray value of 8 consecutive pixels.
If you want a[i,j] to become the color of the pixel (i, j) you have to use
an array with a memory layout that is compatible to the Image.
Here are a few examples:

import numpy
from PIL import Image
a = numpy.zeros((100, 100), numpy.uint8)
a[:50, :50] = a[50:, 50:] = 255
Image.fromarray(a).save("tmp1.png")
b = numpy.zeros((100, 100, 3), numpy.uint8)
b[:50, :50, :] = b[50:, 50:, :] = [255, 0, 0]
Image.fromarray(b).save("tmp2.png")
c = numpy.zeros((100, 100), numpy.uint32)
c[:50, :50] = c[50:, 50:] = 0xff808000
Image.fromarray(c, "RGBA").save("tmp3.png")

Peter

Thanks All - That helped a lot...
The working code ended with:
imga = np.zeros((imgL.shape[1],imgL.shape[0]),np.uint8)
for ro in range(imgL.shape[1]):
    for co in range(imgL.shape[0]):
        imga[ro,co] = imgL[ro,co]
Image.fromarray(imga).save('_a'+str(lev)+'.png')

Without knowing how big your image is (can't remember if you said!).
Perhaps rather than looping in the way you might in C for example, the
numpy where might be quicker if you have a big image. Just a
thought...
And a good thought too... 


I think what Martin is telling you is:

Look to numpy to continue working on the array first.

byte_store = imgL.astype(np.uint8)
Image.fromarray(byte_store).save('_a%s.png' % lev)
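Following up on the numpy.where suggestion, here is a sketch of the earlier quadrant loop done without Python-level loops (same 100x100 toy case; np.ogrid produces broadcastable row/column indices):

```python
import numpy as np

maxcol = maxrow = 100
i, j = np.ogrid[:maxcol, :maxrow]   # shapes (100, 1) and (1, 100)
# The two "same half" quadrants get 200, the others 0 -- equivalent to
# the original (i < half and j < half) or (i >= half and j >= half).
quad = np.where((i < maxcol // 2) == (j < maxrow // 2), 200, 0)
a = quad.astype(np.uint8)
```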

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Q: sort's key and cmp parameters

2009-10-02 Thread Scott David Daniels

Paul Rubin wrote:

I still have never understood why cmp was removed.  Sure, key is more
convenient a lot (or maybe most) of the time, but it's not always.


Not just more convenient.  cmp is always N log N, in that _every_
comparison runs your function, while key is linear: it is run
once per element.  Most cases are more easily done with key, and it is
a good idea to make the most accessible way to a sort be the most
efficient one.  In the rare case that you really want each comparison,
a cmp-injection function will do nicely (and can be written as a
recipe).

In short, make the easy path the fast path, and more will use it;
provide two ways, and the first that springs to mind is the one
used.
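For reference, the cmp-injection recipe alluded to above was later standardized as functools.cmp_to_key (Python 2.7/3.2); a small sketch:

```python
import functools

def reverse_cmp(a, b):
    # An old-style cmp function: negative, zero, or positive.
    # Written "backwards" to sort in descending order.
    return (a < b) - (a > b)

data = [3, 1, 2]
descending = sorted(data, key=functools.cmp_to_key(reverse_cmp))
```

Each element is still compared pairwise through reverse_cmp, so this keeps the rare cmp-style use cases available without a cmp= parameter.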

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Idiom for last word in a string

2009-09-29 Thread Scott David Daniels

Grant Edwards wrote:

I recently ran across this construct for grabbing the last
(whitespace delimited) word in a string:
   s.rsplit(None,1)[1]
... I've always done this:
   s.split()[-1]
I was wondering what the advantage of the rsplit(None,1)[1]
approach would be ...

Others have pointed out the efficiency reason (asking the machine
to do a pile of work that you intend to throw away).  But nobody
warned you:
s.rsplit(None, 1)[-1]
would be better in the case of 'single_word'.rsplit(None, 1)
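A quick demonstration of both points (throwaway strings):

```python
s = 'grab the last word'
last = s.rsplit(None, 1)[-1]       # splits once, from the right
also = s.split()[-1]               # same answer, but splits everything

# [-1] also survives the one-word case, where [1] would raise IndexError:
single = 'single_word'.rsplit(None, 1)
```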

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Restarting IDLE without closing it

2009-09-29 Thread Scott David Daniels

candide wrote:

Hi
I was wondering if there exists somme way to clear memory of all objects
created during a current IDLE session (with the same effect as if one
starts an IDLE session). Thanks.

Different than Shell  /  Restart Shell (Ctrl+F6) ?
Of course this doesn't work if you started Idle with the -n switch.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Detecting changes to a dict

2009-09-28 Thread Scott David Daniels

Steven D'Aprano wrote:
I'm pretty sure the answer to this is No, but I thought I'd ask just in 
case... 
Is there a fast way to see that a dict has been modified? ...


Of course I can subclass dict to do this, but if there's an existing way, 
that would be better.


def mutating(method):
    def replacement(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        finally:
            self.serial += 1
    replacement.__name__ = method.__name__
    return replacement


class SerializedDictionary(dict):
    def __init__(self, *args, **kwargs):
        self.serial = 0
        super(SerializedDictionary, self).__init__(*args, **kwargs)

    __setitem__ = mutating(dict.__setitem__)
    __delitem__ = mutating(dict.__delitem__)
    clear = mutating(dict.clear)
    pop = mutating(dict.pop)
    popitem = mutating(dict.popitem)
    setdefault = mutating(dict.setdefault)
    update = mutating(dict.update)

d = SerializedDictionary(whatever)

Then just check d.serial to see if there has been a change.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Want to call a method only once for unittest.TestCase--but not sure how?

2009-09-28 Thread Scott David Daniels

Oltmans wrote:

... All of our unit tests are written using built-in 'unittest'
module. We've a requirement where we want to run a method only once
for our unit tests

 So I'm completely stumped as to how to create a method that will only
 be called only once for Calculator class. Can you please suggest any
 ideas? Any help will be highly appreciated. Thanks in advance.

Just inherit your classes from something like (untested):

class FunkyTestCase(unittest.TestCase):
    needs_initial = True

    def initialize(self):
        self.__class__.needs_initial = False

    def setUp(self):
        if self.needs_initial:
            self.initialize()


And write your test classes like:

class Bump(FunkyTestCase):
    def initialize(self):
        super(Bump, self).initialize()
        print 'One time Action'
    ...

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: string interpolation mystery in Python 2.6

2009-09-16 Thread Scott David Daniels

Alan G Isaac wrote:

George Brandl explained it to me this way:
It's probably best explained with a bit of code:

 >>> class C(object):
 ...     def __str__(self): return '[str]'
 ...     def __unicode__(self): return '[unicode]'
 ...
 >>> "%s %s" % ('foo', C())
 'foo [str]'
 >>> "%s %s" % (u'foo', C())
 u'foo [unicode]'
 I.e., as soon as a Unicode element is interpolated into 
a string, further interpolations automatically request 
Unicode via __unicode__, if it exists.


Even more fun (until you know what is going on):
 >>> c = C()
 >>> "%s %s %s" % (c, u'c', c)
 u'[str] c [unicode]'

--Scott David Daniels
Scott David dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy NaN, not surviving pickle/unpickle?

2009-09-14 Thread Scott David Daniels

Steven D'Aprano wrote:

On Sun, 13 Sep 2009 17:58:14 -0500, Robert Kern wrote:
Exactly -- there are 2**53 distinct floats on most IEEE systems, the vast 
majority of which might as well be random. What's the point of caching 
numbers like 2.5209481723210079? Chances are it will never come up again 
in a calculation.


You are missing a few orders of magnitude here; there are approx. 2 ** 64
distinct floats.  2 ** 53 is the mantissa of regular floats.  There are
2**52 floats X where 1.0 &lt;= X &lt; 2.0.
The number of normal floats is 2 ** 64 - 2 ** 52 + 1.
The number including denormals and -0.0 is 2 ** 64 - 2 ** 53.

There are approx. 2 ** 53 NaNs (half with the sign bit on).
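A back-of-envelope check of these counts for IEEE-754 doubles:

```python
# IEEE-754 double: 1 sign bit, 11 exponent bits, 52 mantissa bits.
sign = 2
exponents = 2 ** 11
mantissas = 2 ** 52

total = sign * exponents * mantissas   # every 64-bit pattern
# Exponent field all ones marks inf/NaN; mantissa == 0 means infinity,
# anything nonzero is a NaN (half of them with the sign bit on).
nans = sign * (mantissas - 1)
infs = sign
```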

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Multiple inheritance - How to call method_x in InheritedBaseB from method_x in InheritedBaseA?

2009-09-11 Thread Scott David Daniels

The Music Guy wrote:
...


def main():
...

class MyMixin(object):

This is a mistake.  If Mixins inherit from CommonBase as well, no
order of class definition can catch you out.
If it doesn't, you can get yourself in trouble.


    def method_x(self, a, b, c):
        super(MyMixin, self).method_x(a, b, c)
        print "MyMixin.method_x(%s, %s, %s, %s)" % (repr(self),
            repr(a), repr(b), repr(c))

class CommonBase(object):
    def method_x(self, a, b, c):
        print "CommonBase.method_x(%s, %s, %s, %s)" % (repr(self),
            repr(a), repr(b), repr(c))

class BaseA(CommonBase):
...


Redoing this example for small prints:

def main():
    for n, class_ in enumerate(
            (BaseA, BaseB, BaseC,
             FooV, FooW, FooX, FooY, FooZ,
             BarW, BarX, BarY, BarZ)):
        instance = class_()
        instance.method_x(n, n * '-', hex(n*13))
        print

class CommonBase(object):
    def method_x(self, a, b, c):
        # really, %r is the way to go.
        print "CommonBase.method_x(%r, %r, %r, %r)" % (self, a, b, c)

    def __repr__(self):
        # Just so we have a more compact repr
        return '%s.%s' % (self.__class__.__name__, id(self))

class Mixin(CommonBase):
    def method_x(self, a, b, c):
        super(Mixin, self).method_x(a, b, c)
        print "Mixin",

class MyMixin(CommonBase):
    def method_x(self, a, b, c):
        super(MyMixin, self).method_x(a, b, c)
        print "MyMixin",

class BaseA(CommonBase):
    def method_x(self, a, b, c):
        super(BaseA, self).method_x(a, b, c)
        print "BaseA",

class BaseB(CommonBase):
    def method_x(self, a, b, c):
        super(BaseB, self).method_x(a, b, c)
        print "BaseB",

class BaseC(CommonBase):
    pass

class FooV(Mixin, BaseA):
    def method_x(self, a, b, c):
        super(FooV, self).method_x(a, b, c)
        print "FooV",

class FooW(Mixin, MyMixin, BaseA):
    def method_x(self, a, b, c):
        super(FooW, self).method_x(a, b, c)
        print "FooW",

class FooX(MyMixin, BaseA):
    def method_x(self, a, b, c):
        super(FooX, self).method_x(a, b, c)
        print "FooX",

class FooY(MyMixin, BaseB):
    pass

class FooZ(MyMixin, BaseC):
    def method_x(self, a, b, c):
        super(FooZ, self).method_x(a, b, c)
        print "FooZ",

class BarW(Mixin, BaseA, MyMixin):
    def method_x(self, a, b, c):
        super(BarW, self).method_x(a, b, c)
        print "BarW",

class BarX(BaseA, MyMixin):
    def method_x(self, a, b, c):
        super(BarX, self).method_x(a, b, c)
        print "BarX",

class BarY(BaseB, MyMixin):
    def method_x(self, a, b, c):
        super(BarY, self).method_x(a, b, c)
        print "BarY",

class BarZ(BaseB, Mixin):
    def method_x(self, a, b, c):
        super(BarZ, self).method_x(a, b, c)
        print "BarZ",


>>> main()  # prints
CommonBase.method_x(BaseA.18591280, 0, '', '0x0')
BaseA
...
CommonBase.method_x(FooZ.18478384, 7, '-------', '0x5b')
MyMixin FooZ
CommonBase.method_x(BarW.18480592, 8, '--------', '0x68')
MyMixin BaseA Mixin BarW
...


If you make Mixin and MyMixin inherit from object you get:

CommonBase.method_x(BaseA.18613328, 0, '', '0x0')
BaseA
...
CommonBase.method_x(FooZ.18480592, 7, '-------', '0x5b')
MyMixin FooZ
CommonBase.method_x(BarW.18591280, 8, '--------', '0x68')
BaseA Mixin BarW
...

Note that in the BarW case (with object), not all mixins are called.
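The behavior above follows directly from the method resolution order; a
minimal sketch (class bodies elided) showing the two MROs:

```python
class CommonBase(object): pass
class Mixin(CommonBase): pass
class MyMixin(CommonBase): pass
class BaseA(CommonBase): pass

class BarW(Mixin, BaseA, MyMixin): pass

# With the shared CommonBase root, MyMixin sits on the path from BaseA
# to CommonBase, so every mixin is reached before the chain ends:
assert [c.__name__ for c in BarW.__mro__] == [
    'BarW', 'Mixin', 'BaseA', 'MyMixin', 'CommonBase', 'object']

class Mixin2(object): pass
class MyMixin2(object): pass

class BarW2(Mixin2, BaseA, MyMixin2): pass

# Mixins rooted at object can land *after* CommonBase; a method chain
# whose CommonBase implementation does not call super() stops there and
# never reaches MyMixin2 -- the "not all mixins are called" case above.
assert [c.__name__ for c in BarW2.__mro__] == [
    'BarW2', 'Mixin2', 'BaseA', 'CommonBase', 'MyMixin2', 'object']
```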

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: [Tkinter] messed callbacks

2009-09-09 Thread Scott David Daniels

Giacomo Boffi wrote:

Giacomo Boffi giacomo.bo...@polimi.it writes:

...

| def create_cb(a,b):
| return lambda: output(a+'-'+b)
| 
| def doit(fr,lst):

|   for c1,c2 in zip(lst[::2], lst[1::2]):
| subframe=Frame(fr)
| Label(subframe,text=c1+' - '+c2).pack(side='left',expand=1,fill='both')
| Button(subframe,text='',command=create_cb(c1,c2)).pack()
| Button(subframe,text='',command=create_cb(c2,c1)).pack()
| subframe.pack(fill='x',expand=1)

...

works ok, now i have to fully understand my previous error



This is really why functools.partial exists.  Now that you know what was
going wrong, you can understand its value.  You can accomplish the same
thing as above with:
from functools import partial
...
def doit(fr,lst):
for c1, c2 in zip(lst[::2], lst[1::2]):
subframe = Frame(fr)
Label(subframe, text=c1 + ' - ' + c2
 ).pack(side='left', expand=1, fill='both')
Button(subframe, text='',
   command=partial(output, c1 + '-' + c2)).pack()
Button(subframe, text='',
   command=partial(output, c2 + '-' + c1)).pack()
subframe.pack(fill='x', expand=1)
...
Also note from Pep 8, spaces are cheap and make the code easier to read.
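A self-contained sketch of the difference, without Tkinter: the `output`
function here is a stand-in for the callback target, and the list comp
plays the role of the loop over buttons.

```python
from functools import partial

results = []
def output(text):          # stand-in for the GUI callback target
    results.append(text)

# Late binding: every lambda closes over the same variable, so all
# callbacks see its final value -- the original "messed callbacks" bug.
bad = [lambda: output(c) for c in 'abc']
for cb in bad:
    cb()
assert results == ['c', 'c', 'c']

# functools.partial captures the argument at creation time instead.
del results[:]
good = [partial(output, c) for c in 'abc']
for cb in good:
    cb()
assert results == ['a', 'b', 'c']
```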

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: The future of Python immutability

2009-09-04 Thread Scott David Daniels

John Nagle wrote:

... Suppose, for discussion purposes, we had general immutable objects.
Objects inherited from immutableobject instead of object would be
unchangeable once __init__ had returned.  Where does this take us?


Traditionally in Python we make that once __new__ has returned.
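A minimal sketch of why __new__ is the natural boundary: for an immutable
type such as a tuple subclass, the state must be fixed in __new__, because
__init__ only runs after the (already immutable) object exists.

```python
class Point(tuple):
    # All state is laid down in __new__; by the time __init__ would run,
    # the tuple contents can no longer be changed.
    def __new__(cls, x, y):
        return super(Point, cls).__new__(cls, (x, y))

    @property
    def x(self):
        return self[0]

    @property
    def y(self):
        return self[1]

p = Point(3, 4)
assert (p.x, p.y) == (3, 4)
```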

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: possible attribute-oriented class

2009-09-04 Thread Scott David Daniels

Ken Newton wrote: ...

I would appreciate comments on this code. First, is something like
this already done? Second, are there reasons for not doing this?  ...

class AttrClass(object):

  ...

def __repr__(self):
    return "%s(%s)" % (self.__class__.__name__, self.__dict__.__repr__())
def __str__(self):
    ll = ['{']
    for k, v in self.__dict__.iteritems():
        ll.append("%s : %s" % (k, str(v)))
    return '\n'.join(ll) + '}'


Yes, I've done stuff something like this (I use setattr /
getattr rather than direct access to the __dict__).

You'd do better to sort the keys before outputting them, so
that you don't confuse the user by printing two similarly
built parts in different orders.

Personally, I'd filter the outputs to avoid names beginning
with '_', as they may contribute to clutter without adding
much information.

An equality operator would be nice as well (don't bother with
ordering though, you get lost in a twisty maze of definitions
all different).
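One way the three suggestions might look together (a sketch, not the
original poster's class; the formatting details are illustrative):

```python
class AttrClass(object):
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __str__(self):
        # Sort the keys, and skip '_'-prefixed names, per the advice above.
        pairs = sorted((k, v) for k, v in vars(self).items()
                       if not k.startswith('_'))
        return '{%s}' % ', '.join('%s: %s' % pair for pair in pairs)

    def __eq__(self, other):
        # Equality, but deliberately no ordering operators.
        return (isinstance(other, AttrClass)
                and vars(self) == vars(other))

a = AttrClass(b=2, a=1, _hidden=3)
assert str(a) == '{a: 1, b: 2}'          # sorted, underscores filtered
assert a == AttrClass(a=1, b=2, _hidden=3)
```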

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: using queue

2009-09-02 Thread Scott David Daniels

Tim Arnold wrote:
MRAB pyt...@mrabarnett.plus.com wrote in message 
news:mailman.835.1251886213.2854.python-l...@python.org...

I don't need that many threads; just create a few to do the work and let
each do multiple chapters, something like this:


a very pretty implementation with worker code:

while True:
chapter = self.chapter_queue.get()
if chapter is None:
# A None indicates that there are no more chapters.
break
chapter.compile()
# Put back the None so that the next thread will also see it.
self.chapter_queue.put(None)


and loading like:

for c in self.document.chapter_objects:
chapter_queue.put(some work)
chapter_queue.put(None)
...
# The threads will finish when they see the None in the queue.
for t in thread_list:
t.join()


hi, thanks for that code. It took me a bit to understand what's going on, 
but I think I see it now.

Still, I have two questions about it:
(1) what's wrong with having each chapter in a separate thread? Too much 
going on for a single processor? 

Many more threads than cores and you spend a lot of your CPU switching
tasks.

(2) The None at the end of the queue...I thought t.join() would just work. 
Why do we need None?


Because your workers aren't finished, they are running trying to get
something more to do out of the queue.  The t.join() would cause a
deadlock w/o the None.
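A runnable sketch of the whole sentinel pattern (worker count and the
doubling "work" are illustrative, not from the original thread):

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

work_queue = queue.Queue()
done = []

def worker():
    while True:
        item = work_queue.get()
        if item is None:
            # Put the sentinel back so the other workers also stop.
            work_queue.put(None)
            break
        done.append(item * 2)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for n in range(5):
    work_queue.put(n)
work_queue.put(None)    # one sentinel; each worker re-queues it
for t in threads:
    t.join()            # returns only because the sentinel ended each loop

assert sorted(done) == [0, 2, 4, 6, 8]
```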

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Putting together a larger matrix from smaller matrices

2009-08-25 Thread Scott David Daniels

Matjaz Bezovnik wrote:

If you are using numpy (which it sounds like you are):

IDLE 2.6.2
>>> import numpy as np
>>> v = np.array([[0,1,2],[3,4,5],[6,7,8]], dtype=float)
>>> v
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  8.]])
>>> w = np.array([[10,11,12],[13,14,15],[16,17,18]], dtype=float)
>>> w
array([[ 10.,  11.,  12.],
       [ 13.,  14.,  15.],
       [ 16.,  17.,  18.]])
>>> r = np.zeros((6,6))
>>> r
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])
>>> r[:3,:3] = v
>>> r
array([[ 0.,  1.,  2.,  0.,  0.,  0.],
       [ 3.,  4.,  5.,  0.,  0.,  0.],
       [ 6.,  7.,  8.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])
>>> r[3:,3:] = w
>>> r
array([[  0.,   1.,   2.,   0.,   0.,   0.],
       [  3.,   4.,   5.,   0.,   0.,   0.],
       [  6.,   7.,   8.,   0.,   0.,   0.],
       [  0.,   0.,   0.,  10.,  11.,  12.],
       [  0.,   0.,   0.,  13.,  14.,  15.],
       [  0.,   0.,   0.,  16.,  17.,  18.]])


In general, make the right-sized array of zeros, and at various points
assign to subranges of the result array:

N = 3
result = np.zeros((len(parts) * N, len(parts) * N), dtype=float)
for n, chunk in enumerate(parts):
    base = n * N
    result[base : base + N, base : base + N] = chunk

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Numeric literals in other than base 10 - was Annoying octal notation

2009-08-24 Thread Scott David Daniels

Piet van Oostrum wrote:

Scott David Daniels scott.dani...@acm.org (SDD) wrote:



SDD James Harris wrote:...

Another option:

0.(2:1011), 0.(8:7621), 0.(16:c26b)

where the three characters "0.(" begin the sequence.

Comments? Improvements?



SDD I did a little interpreter where non-base 10 numbers
SDD (up to base 36) were:



SDD .7.100   == 64  (octal)
SDD .9.100   == 100 (decimal)
SDD .F.100   == 256 (hexadecimal)
SDD .1.100   == 4   (binary)
SDD .3.100   == 9   (trinary)
SDD .Z.100   == 46656 (base 36)


I wonder how you wrote that interpreter, given that some answers are wrong.
Obviously I started with a different set of examples and edited after
starting to make a table that could be interpreted in each base.  After
doing that, I forgot to double check, and lo and behold .Z.1000 == 46656,
while .Z.100 == 1296.  Since it has been decades since I've had access
to that interpreter, this is all from memory.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: generate keyboard/mouse event under windows

2009-08-23 Thread Scott David Daniels

Ray wrote:

On Aug 19, 2:07 pm, yaka gu.yakahug...@gmail.com wrote:

Read this and see if it helps:

http://kvance.livejournal.com/985732.html


is there a way to generate a 'true' keyboard event? (works like user
pressed a key on keyboard)
not send the 'send keyboard event to application' ?


If there is such a spot, it is a major security weakness.
You'd be able to automate password attacks.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Numeric literals in other than base 10 - was Annoying octal notation

2009-08-23 Thread Scott David Daniels

James Harris wrote:...

Another option:

  0.(2:1011), 0.(8:7621), 0.(16:c26b)

where the three characters "0.(" begin the sequence.

Comments? Improvements?


I did a little interpreter where non-base 10 numbers
(up to base 36) were:

.7.100   == 64  (octal)
.9.100   == 100 (decimal)
.F.100   == 256 (hexadecimal)
.1.100   == 4   (binary)
.3.100   == 9   (trinary)
.Z.100   == 46656 (base 36)
Advantages:
Tokenizer can recognize chunks easily.
Not visually too confusing,
No issue of what base the base indicator is expressed in.
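A hypothetical parser for that notation is only a few lines; the function
name and the dispatch through int() are assumptions, not from the original
interpreter. Note that, as the follow-up earlier in this archive admits,
two of the listed values were slips: under the largest-digit scheme, '.3.'
marks base four (so '.3.100' is 16, not 9) and '.Z.100' is 1296.

```python
import string

DIGITS = string.digits + string.ascii_uppercase   # '0'-'9' then 'A'-'Z'

def parse_based(literal):
    # Parse '.<digit>.<digits>': the single digit between the dots is
    # the largest digit of the base, so the base is its value plus one.
    assert literal[0] == '.' and literal[2] == '.'
    base = DIGITS.index(literal[1].upper()) + 1
    return int(literal[3:], base)

assert parse_based('.7.100') == 64      # octal
assert parse_based('.9.100') == 100     # decimal
assert parse_based('.F.100') == 256     # hexadecimal
assert parse_based('.1.100') == 4       # binary
assert parse_based('.3.100') == 16      # base four ('3' is the top digit)
assert parse_based('.Z.100') == 1296    # base 36
```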

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: debugger

2009-08-22 Thread Scott David Daniels

flagmino wrote:

To get familiar with the debugger, I have loaded this program:

import math

def s1(x, y):
   a = (x + y)
   print("Answer from s1"), a
   return

def s2(x, y):
   b = (x - y)
   print("This comes from s2"), b
   #print z
   print("call from s2: "), s1(x, y)
   return

I am trying to debug:
I press shift-F9 and F7. I end up in the interpreter where I enter s2
(1, 2).

From that point if I press F7, the program restart all over.
If I press Enter, the program gets out of debug mode.

Please help me figuring out how I can use the dbugger. You are welcome
to send a sound file if this is easier for you.

Thanks

ray

You need to tell us:
Which Python version (e.g. 2.6.2)
Which platform (hardware & OS) (e.g. 64-bit AMD FreeBSD)
Which debugger (e.g. Idle)
What you expected to happen that did not, and why you expected it.
or What happened and why you did not expect it.

Often you can get lots of this information by going to your debugger
window and doing Help // About, and go to your Python environment and type:

import sys
print sys.version # cut the results and paste in your message as
sys.version says, '2.6.2 (r262:71605, ...'  [don't do dots yourself]

To understand more of why we need this on every question, see:
http://www.mikeash.com/getting_answers.html
or google for smart questions.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: flatten a list of list

2009-08-16 Thread Scott David Daniels

Steven D'Aprano wrote:

On Sun, 16 Aug 2009 02:47:42 -0700, Terry wrote:

Is there a simple way (the pythonic way) to flatten a list of list?

Chris' suggestion using itertools seems pretty good:


>>> from timeit import Timer
>>> setup = """\
... L = [ [None]*5000 for _ in xrange(%d) ]
... from itertools import chain
... """
>>> Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000)
[0.61839914321899414, 0.61799716949462891, 0.62065696716308594]
>>> Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000)
[1.2618398666381836, 1.3385050296783447, 3.9113419055938721]
>>> Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)
[3.1349358558654785, 4.8554730415344238, 5.431217987061]


OK, it definitely helps to get a size estimate before building:

>>> setup = """\
... L = [ [None]*5000 for _ in xrange(%d) ]
... import itertools
...
... class Holder(object):
...     def __init__(self, list_of_lists):
...         self._list = list_of_lists
...     def __iter__(self):
...         return itertools.chain.from_iterable(self._list)
...     def __len__(self):
...         return sum(len(x) for x in self._list)
... """
>>> timeit.Timer("list(Holder(L))", setup % 4).repeat(number=1000)
[0.59912279353940789, 0.59505886921382967, 0.59474989139681611]
>>> timeit.Timer("list(Holder(L))", setup % 8).repeat(number=1000)
[1.1898235669617208, 1.194797383466323, 1.1945367358141823]
>>> timeit.Timer("list(Holder(L))", setup % 16).repeat(number=1000)
[2.4244464031043123, 2.4261885239604482, 2.4050011942858589]

vs straight chain.from_iterable (on my machine):

[0.7828263089303249, 0.79326171343005925, 0.80967664884783019]
[1.499510971366476, 1.5263249938190455, 1.5599706107899181]
[3.4427520816193109, 3.632409426337702, 3.5290488036887382]

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: callable virtual method

2009-08-15 Thread Scott David Daniels

Jean-Michel Pichavant wrote:

Steven D'Aprano wrote:

On Fri, 14 Aug 2009 18:49:26 +0200, Jean-Michel Pichavant wrote:

 

Sorry guys (means guys *and* gals :op ), I realized I've not been able
to describe precisely what I want to do. I'd like the base class to be
virtual (aka abstract). However it may be abstract but it does not mean
it cannot do some usefull stuff.


Here is the schema of my abstract methods :

class Interface(object):
def method(self):
# -
# some common stuff executed here
# -
print 'hello world'
# -
# here shall stand child specific stuff (empty in the interface
method)
# -
if self.__class__.method == Interface.method:
raise NotImplementedError('You should have read the f**
manual ! You must override this method.')




Okay, so I want to sub-class your Interface class. As you said, the 
methods in the abstract class are still useful, so in my class, I 
don't need any extra functionality for some methods -- I'm happy with 
just the common stuff. So I use normal OO techniques and over-ride 
just the methods I need to over-ride:


  
Sometimes the base is doing cool stuff but incomplete stuff which 
requires knowledge only hold by the sub class. In my case the interface 
is a high level interface for a software that can run on multiple 
hardware platforms. Only the sub class has knowledge on how to operate 
the hardware, but no matter the hardware it still produces the same effect.


Let's say I have 50 different hardwares, I'll have 50 sub classes of 
Interface with the 'start' method to define. It wouldn't be appropriate 
(OO programming)to write 50 times '_log.debug('Starting %s' % self)' in 
each child start method when the simple task of logging the call can be 
nicely handled by the base class.


In the meantime, I must make sure the user, who is not a python guru in 
this case, has implemented the start method for his hardware, because 
only he knows how to effectively start this hardware. I don't want him 
to come to me saying, "I got no error, still my hardware does not 
start." You can then blame him for not reading the docs, but it will 
still be less expensive to throw a nice exception with accurate 
feedback.


[snip]

class VerboseGoodChild(Interface):
# forced to over-ride methods for no good reason
  


Definitely no !! This is the purpose of an interface class: to force 
people to write these methods. They *are* required, if they were not, 
they would not belong to the Interface.


JM


But there _is_ one moment when you can check those things, then avoid
checking thereafter: object creation.  So you can complicate your
__init__ (or __new__) with those checks that make sure you instantiate
only fully defined subclasses:

# obviously not tested except in concept:

class Base(object_or_whatever):
     def __init__(self, ...):
         class_ = self.__class__
         if class_ is Base:
             raise TypeError('Attempt to instantiate Base class')
         for name in 'one two three four'.split():
             # == rather than 'is': Python 2 builds a fresh unbound
             # method object on every attribute access.
             if getattr(class_, name) == getattr(Base, name):
                 raise NotImplementedError(
                     '%s implementation missing' % name)
         ...
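A tested variant of that concept; the method names and classes here are
hypothetical stand-ins for the interface being discussed:

```python
class Base(object):
    REQUIRED = ('start', 'stop')    # hypothetical required methods

    def __init__(self):
        class_ = self.__class__
        if class_ is Base:
            raise TypeError('Attempt to instantiate Base class')
        for name in self.REQUIRED:
            # Unimplemented if the subclass still sees Base's version
            # (== works for Python 2 unbound methods and Python 3
            # plain functions alike).
            if getattr(class_, name) == getattr(Base, name):
                raise NotImplementedError(
                    '%s implementation missing' % name)

    def start(self):
        raise NotImplementedError

    def stop(self):
        raise NotImplementedError

class Good(Base):
    def start(self): return 'started'
    def stop(self): return 'stopped'

class Bad(Base):
    def start(self): return 'started'   # forgot to write stop()

Good()                       # constructs fine: both methods overridden
try:
    Bad()
except NotImplementedError as e:
    assert 'stop' in str(e)  # rejected at creation, with a clear message
else:
    raise AssertionError('Bad() should have been rejected')
```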

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: unittest

2009-08-15 Thread Scott David Daniels

Mag Gam wrote:

I am writing an application which has many command line arguments.
For example: foo.py -args bar bee

I would like to create a test suit using unittest so when I add
features to foo.py I don't want to break other things. I just heard
about unittest and would love to use it for this type of thing.

so my question is, when I do these tests do I have to code them into
foo.py? I prefer having a footest.py which will run the regression
tests. Any thoughts about this?

TIA

I avoid putting the tests in foo.py, simply because the bulk of my
tests would make the code harder to read.  So, no, unittest does not
require that you code things into foo.py.  You will find that you
may bend your coding style within foo.py in order to make it more
testable, but (if you do it right) that should also make the code
clearer.
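A sketch of what such a footest.py might look like; the main() here is a
hypothetical stand-in for foo.py's entry point, restructured (as suggested
above) so tests can call it directly instead of spawning a process:

```python
import unittest

def main(argv):
    # Hypothetical stand-in for foo.py's entry point.
    if not argv:
        raise SystemExit(2)
    return 'processed %d args' % len(argv)

class TestFoo(unittest.TestCase):
    def test_args_processed(self):
        self.assertEqual(main(['bar', 'bee']), 'processed 2 args')

    def test_empty_args_exit(self):
        self.assertRaises(SystemExit, main, [])

# footest.py would normally just end with unittest.main(); run the
# suite explicitly here so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFoo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```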

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Format Code Repeat Counts?

2009-08-14 Thread Scott David Daniels

MRAB wrote:

Scott David Daniels wrote:

MRAB wrote:

The shortest I can come up with is:
"[" + "][".join(letters) + "]"


Maybe a golf shot:
"][".join(letters).join("[]")


Even shorter:

"["+"][".join(letters)+"]"

:-)

I was going by PEP8 rules. ;-)

--Scott David Daniels
Scott David dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Programming by Contract

2009-08-13 Thread Scott David Daniels

Charles Yeomans wrote:


On Aug 11, 2009, at 3:30 PM, Ethan Furman wrote:


Ethan Furman wrote:

Greetings!
I have seen posts about the assert statement and PbC (or maybe it was 
DbC), and I just took a very brief look at pycontract 
(http://www.wayforward.net/pycontract/) and now I have at least one 
question:  Is this basically another way of thinking about unit 
testing, or is the idea of PbC more along the lines of *always* 
checking the input/output of functions to ensure they are correct?  
(*Constant vigilance!* as Prof Moody would say ;)
I know asserts can be turned off, so they obviously won't work for 
the latter case, and having seen the sample of pycontract it seems it 
only does its thing during debugging.
So is Design (Programming) by Contract a fancy way of saying 
Document your inputs/outputs! or is there more to it?

~Ethan~


Hmmm...

Well, from the (apparently) complete lack of interest, I shall take 
away the (better?) documentation ideas and unit testing ideas, and not 
worry about the rest.  :)


Design by contract is complementary to unit testing (I notice that the 
author of PEP 316 appears confused about this).  DbC is, roughly 
speaking, about explicit allocation of responsibility.  Consider this 
contrived example.


def foo(s):
require(s is not None)
//code
ensure(hasattr(returnValue, '__iter__'))


you might want two flags, REQUIRE_OFF and ENSURE_OFF, that control
testing, and change the code above to:
  require(REQUIRE_OFF or s is not None)
  //code
  ensure(ENSURE_OFF or hasattr(returnValue, '__iter__'))

Python has no good way to turn off argument calculation by
manipulating function definition (at least that I know of).

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Format Code Repeat Counts?

2009-08-13 Thread Scott David Daniels

MRAB wrote:

The shortest I can come up with is:
"[" + "][".join(letters) + "]"


Maybe a golf shot:
"][".join(letters).join("[]")


--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: i Don't get why it makes trouble

2009-08-13 Thread Scott David Daniels

azrael wrote:

... A lot of people a not aware of SQL injection. My friend from college
asked me and a couple of other guys for Pen testing of an website. His
SQL injection mistake made him an epic fail.


And some people are unaware of the unofficial official Python citation
for SQL injection explanations:
http://xkcd.com/327/

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: better way?

2009-08-12 Thread Scott David Daniels

Pet wrote:

On 11 Aug., 22:19, Rami Chowdhury rami.chowdh...@gmail.com wrote:
Ah, my apologies, I must have been getting it confused with ON UPDATE  
[things]. Thanks for correcting me.


On Tue, 11 Aug 2009 13:10:03 -0700, Matthew Woodcraft  


matt...@woodcraft.me.uk wrote:

Rami Chowdhury rami.chowdh...@gmail.com writes:

IIRC Postgres has had ON DUPLICATE KEY UPDATE functionality longer than
MySQL...

PostgreSQL does not have ON DUPLICATE KEY UPDATE.
The SQL standard way to do what the OP wants is MERGE. PostgreSQL
doesn't have that either.


So, I'm doing it in right way?
What about building columns? map(lambda s: s + ' = %s', fields)
Is that o.k.?

Isn't
t = [field + ' = %s' for field in fields]
clearer than
t = map(lambda s: s + ' = %s', fields)
? your call of course.

I don't quite understand why you are building the SQL from data
but constructing the arguments in source.  I'd actually set the
SQL up directly as a string, making both the SQL and Python more
readable. To the original question, you could unconditionally
perform queries vaguely like:

UPDATE_SQL = '''UPDATE table ...
 WHERE id = %s AND location = %s;'''
INSERT_SQL = '''INSERT INTO table(...
 WHERE NOT EXISTS(SELECT * FROM table
 WHERE id = %s AND location = %s);'''
I'd put the NOW() and constant args (like the 1) in the SQL itself.
then your code might become:
row = (self.wl, name, location, id)
self._execQuery(db, UPDATE_SQL, [row])
self._execQuery(db, INSERT_SQL, [row + (location, id)])
if _execQuery is like the standard Python DB interfaces.  Having
the SQL do the checking allows the DB to check its
index and use that result to control the operation, simplifying
the Python code without significantly affecting the DB work
needed.  The SELECT * form in the EXISTS test is something DB
optimizers look for, so don't fret about wasted data movement.
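The unconditional update-then-guarded-insert shape can be tried with
sqlite3, which ships with Python; the table, columns, and values below are
stand-ins for illustration, not the OP's schema:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (id INTEGER, location TEXT, wl TEXT)')

UPDATE_SQL = 'UPDATE t SET wl = ? WHERE id = ? AND location = ?'
INSERT_SQL = '''INSERT INTO t (wl, id, location)
                SELECT ?, ?, ?
                WHERE NOT EXISTS (SELECT * FROM t
                                  WHERE id = ? AND location = ?)'''

def upsert(conn, wl, id_, location):
    # Run both unconditionally; exactly one of them changes a row.
    conn.execute(UPDATE_SQL, (wl, id_, location))
    conn.execute(INSERT_SQL, (wl, id_, location, id_, location))

upsert(conn, 'first', 1, 'here')    # no row yet: the INSERT fires
upsert(conn, 'second', 1, 'here')   # row exists: the UPDATE fires
rows = conn.execute('SELECT wl FROM t').fetchall()
assert rows == [('second',)]        # one row, updated in place
```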



--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Unrecognized escape sequences in string literals

2009-08-10 Thread Scott David Daniels

Douglas Alan wrote:

So, what's the one obvious right way to express "foo\zbar"? Is it
   "foo\zbar"
or
   "foo\\zbar"
And if it's the latter, what possible benefit is there in allowing the
former?  And if it's the former, why does Python echo the latter?


Actually, if we were designing from fresh (with no C behind us), I might
advocate for \s to be the escape sequence for a backslash.  I don't
particularly like that it is hard to see if the following string
contains a tab:   "abc\table".  The string rules reflect C's
rules, and I see little excuse for trying to change them now.
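A quick demonstration of the trap (a sketch added for illustration):

```python
s = "abc\table"              # \t is a recognized escape: a real tab
assert '\t' in s and len(s) == 8

raw = r"abc\table"           # a raw string literal keeps the backslash
assert '\\' in raw and len(raw) == 9
assert raw == "abc\\table"   # the doubled backslash spells the same string
```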

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Why all the __double_underscored_vars__?

2009-08-08 Thread Scott David Daniels

kj wrote:

... I find it quite difficult to explain to my
students (who are complete newcomers to programming) all the
__underscored__ stuff that even rank noobs like them have to deal
with.  (Trust me, to most of them your reply to my post would be
as clear as mud.)

Believe me, it's not me who's bringing this stuff up: *they*
specifically ask.  That's precisely my point: it is *they* who
somehow feel they can't avoid finding out about this stuff; they
must run into such __arcana__ often enough to cause them to wonder.
If at least some rank beginners (i.e. some of my students) feel
this way, I suggest that some of this alleged __arcana__ should be
demoted to a more mundane everyday status, without the scare-underscores.
E.g. maybe there should be a built-in is_main(), or some such, so
that beginners don't have to venture into the dark underworld of
__name__ and __main__.


Do you know about Kirby Urner's technique of calling such symbols
"ribs" -- the access to the stuff Python is built from?  One nice
thing about Python is that you can experiment with what these
__ribs__ do without having to learn yet another language.

It seems nice to me that you can use a rule that says, "stick to
normal names and you don't have to worry about mucking with the
way Python itself works, but if you are curious, look for those
things and fiddle with them."

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Problem in installing PyGreSQL

2009-08-07 Thread Scott David Daniels

Dennis Lee Bieber wrote:

On Thu, 6 Aug 2009 16:00:15 +0530, Thangappan.M
thangappan...@gmail.com declaimed the following in
gmane.comp.python.general:

  File ./setup.py, line 219, in finalize_options
except (Warning, w):
NameError: global name 'w' is not defined

What would be the solution?
Otherwise can you tell how to install DB-API in debian machine.

Sorry... 1) I run on WinXP; 2) I don't build packages, relying on
pre-built binaries; 3) I run MySQL.

However, based upon the examples in the Tutorial, that line should
not have the parentheses. A parenthesised (tuple) is supposed to contain a
list of exceptions, and the parameter to catch the exception specifics has
to be outside the list.

Best I can suggest is editing that particular line and removing the
parentheses -- then try rebuilding.

I'll also re-ask: All you are installing is the Python adapter to
the database. DO YOU HAVE A RUNNING PostgreSQL server that you can
connect to?


Just to be a bit more explicit:
Change file setup.py's line 219 from:
 except (Warning, w):
to either (OK in Python 2.6 and greater):
   except Warning as w:
or (works for Python 2.X):
   except Warning, w:


--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: M2Crypto: How to generate subjectKeyIdentifier / authorityKeyIdentifier

2009-08-07 Thread Scott David Daniels

Matthias Güntert wrote:

M2Crypto has a couple of bugs open related that, with potential
workarounds that I haven't yet deemed polished enough to checkin, but
which might help you out:

https://bugzilla.osafoundation.org/show_bug.cgi?id=7530
https://bugzilla.osafoundation.org/show_bug.cgi?id=12151


... Generating the 'subjectKeyIdentifier':

 ...

def get_public_key_fingerprint(self):
    h = hashlib.new('sha1')
    h.update(self.keypair.as_der())
    client_serial = h.hexdigest().upper()
    client_serial_hex = ''
    for byte in xrange(20):
        client_serial_hex += (client_serial[byte*2] +
                              client_serial[byte*2 + 1])
        if byte < 19:
            client_serial_hex += ':'
    return client_serial_hex
...


More tersely (code golf?):

def get_public_key_fingerprint(self):
digest = hashlib.sha1(self.keypair.as_der()).hexdigest().upper()
return ':'.join(digest[pos : pos+2] for pos in range(0, 40, 2))
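That the terse version matches the loop can be checked without M2Crypto by
hashing stand-in bytes (the fake DER input below is just for illustration):

```python
import hashlib

def fingerprint_loop(der_bytes):
    # The original loop, tightened only enough to run standalone.
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    out = ''
    for byte in range(20):
        out += digest[byte * 2] + digest[byte * 2 + 1]
        if byte < 19:
            out += ':'
    return out

def fingerprint_join(der_bytes):
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ':'.join(digest[pos:pos + 2] for pos in range(0, 40, 2))

fake_der = b'not really DER, just bytes to hash'
assert fingerprint_loop(fake_der) == fingerprint_join(fake_der)
assert fingerprint_join(fake_der).count(':') == 19
```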

--Scott David Daniels
scott.dani...@acm.org

--
http://mail.python.org/mailman/listinfo/python-list


Re: how to overload operator (a < x < b)?

2009-08-07 Thread Scott David Daniels

Benjamin Kaplan wrote:

Python does not support compound
comparisons like that. You have to do a < b and b < c.


Funny, my python does.  This has been around a long time.
I am not certain whether 1.5.2 did it, but chained comparisons
have been around for a long time.

>>> 'a' < 'd' < 'z'
True
>>> 'a' < 'D' < 'z'
False

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Bug or feature: double strings as one

2009-08-07 Thread Scott David Daniels

Grant Edwards wrote:

On 2009-08-07, durumdara durumd...@gmail.com wrote:

In other languages, like Delphi (Pascal), Javascript, SQL, etc., I
must concatenate the strings with some sign, like + or ||.


In other languages like Ruby, awk, C, C++, etc. adjacent string
constants are concatenated.


I must learn this etc. language, I hear it mentioned all the time :-)

--Scott David Daniels
scott.dani...@acm.org

--
http://mail.python.org/mailman/listinfo/python-list


Re: Overlap in python

2009-08-05 Thread Scott David Daniels

Jay Bird wrote:

Hi everyone,

I've been trying to figure out a simple algorithm on how to combine a
list of parts that have 1D locations that overlap into a non-
overlapping list.  For example, here would be my input:

part name   location
a  5-9
b  7-10
c  3-6
d  15-20
e  18-23

And here is what I need for an output:
part name   location
c.a.b3-10
d.e   15-23

I've tried various methods, which all fail.  Does anyone have an idea
how to do this?

Thank you very much!
Jay


I once had to do this for finding nested block structure.
The key for me was a sort order:  start, -final.

Having not seen it here (though I looked a bit), here's one:

class Entry(object):
    '''An entry is a name and range'''
    def __init__(self, line):
        self.name, startstop = line.split()
        start, stop = startstop.split('-')
        self.start, self.stop = int(start), int(stop)


def combined_ranges(lines):
    '''Create Entries in magic order, and produce ranges.

    The magic order makes least element with longest range first, so
    overlaps show up in head order, with final tail first among equals.
    '''
    # Fill in our table (ignoring blank lines), then sort by magic order
    elements = [Entry(line) for line in lines if line.strip()]
    elements.sort(key=lambda e: (e.start, -e.stop))

    # Now produce resolved ranges.  Grab the start
    gen = iter(elements)
    first = gen.next()

    # For the remainder, combine or produce
    for v in gen:
        if v.start <= first.stop:
            # on overlap, merge in new element (may update stop)
            first.name += '.' + v.name
            if first.stop < v.stop:
                first.stop = v.stop
        else:
            yield first
            first = v
    # And now produce the last element we covered
    yield first

# Demo:
sample = '''part name   location
a  5-9
b  7-10
c  3-6
d  15-20
e  18-23
'''
source = iter(sample.split('\n')) # source of lines, opened file?
ignored = source.next() # discard heading
for interval in combined_ranges(source):
    print '%s  %s-%s' % (interval.name, interval.start, interval.stop)

Prints:
c.a.b  3-10
d.e  15-23


--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: trouble with complex numbers

2009-08-05 Thread Scott David Daniels

alex23 wrote:

Piet van Oostrum p...@cs.uu.nl wrote:

That should be z += 0j


Pardon my ignorance, but could anyone explain the rationale behind
using 'j' to indicate the imaginary number (as opposed to the more
intuitive 'i')?

(Not that I've had much call to use complex numbers but I'm
curious)

I think it is explained in the complex math area, but basically EE types
use j, math types use i, for exactly the same thing.  Since i is so
frequently an index in CS, and there is another strong convention,
why not let the EE types win?
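For reference, the j suffix in action (a small added illustration):

```python
# Python follows the EE convention: the imaginary unit is spelled 1j.
z = 3 + 4j
assert z.real == 3.0 and z.imag == 4.0
assert abs(z) == 5.0           # |3 + 4i| is exactly 5
assert 1j * 1j == -1           # j squared is -1
```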

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Predefined Variables

2009-08-02 Thread Scott David Daniels

Piet van Oostrum wrote:

Scott David Daniels scott.dani...@acm.org (SDD) wrote:

SDD Stephen Cuppett (should have written in this order):

Fred Atkinson fatkin...@mishmash.com wrote ...

Is there a pre-defined variable that returns the GET line...

os.environment('QUERY_STRING')

SDD Maybe you mean:
SDD os.environ['USER']

Let's take the best of both:
os.environ['QUERY_STRING']


Sorry about that.  I was testing the expression before posting, and I
don't do that much cgi stuff.  I forgot to restore the variable name.
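To make the corrected spelling concrete, here is a runnable sketch; a real web server sets QUERY_STRING, so the value below is a made-up example, and parse_qs is just one way to split it further:

```python
import os
from urllib.parse import parse_qs

# A real CGI environment would set QUERY_STRING for us; we fake it
# here so the sketch runs anywhere (the value is invented).
os.environ['QUERY_STRING'] = 'page=index&lang=en'

raw = os.environ['QUERY_STRING']   # the whole GET line after the '?'
fields = parse_qs(raw)             # ...or split into individual variables
print(raw)      # page=index&lang=en
print(fields)   # {'page': ['index'], 'lang': ['en']}
```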

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Seeding the rand() Generator

2009-08-02 Thread Scott David Daniels

Fred Atkinson wrote:

How does one seed the rand() generator when retrieving random
recordings in MySQL?  


It is not entirely clear what you are asking.  If you are talking about
MySQL's random number generator, you are talking in the wrong newsgroup.
If you are talking about Python's, does this work?
import random
random.seed(123542552)
I'm not quite sure how you came to believe that Python controls MySQL,
as opposed to using its services.
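To show what seeding buys you on the Python side, a short sketch: the same seed replays the same "random" sequence.

```python
import random

random.seed(123542552)                       # seed once...
first = [random.random() for _ in range(3)]
random.seed(123542552)                       # ...reseed identically...
second = [random.random() for _ in range(3)]
print(first == second)                       # ...and the sequence repeats: True
```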

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: fast video encoding

2009-07-31 Thread Scott David Daniels

gregorth wrote:

for a scientific application I need to save a video stream to disc for
further post processing. My cam can deliver 8bit grayscale images with
resolution 640x480 with a framerate up to 100Hz, this is a data rate
of 30MB/s. Writing the data uncompressed to disc hits the data
transfer limits of my current system and creates huge files. Therefore
I would like to use video compression, preferably fast and high
quality to lossless encoding. Final file size is not that important.

Well, it sounds like it had better be enough to affect bandwidth.


I am a novice with video encoding. I found that few codecs support
gray scale images. Any hints to take advantage of the fact that I only
have gray scale images?


You might try to see if there is a primitive .MNG encoder around.
That could give you lossless with perhaps enough compression to make
you happy, and I'm sure it will handle the grayscale.

.MNG is pictures only, but that doesn't hurt you in the least.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: regex: multiple matching for one string

2009-07-24 Thread Scott David Daniels

ru...@yahoo.com wrote:

Nick Dumas wrote:

On 7/23/2009 9:23 AM, Mark Lawrence wrote:

scriptlear...@gmail.com wrote:

For example, I have a string #a=valuea;b=valueb;c=valuec;, and I
will like to take out the values (valuea, valueb, and valuec).  How do
I do that in Python?  The group method will only return the matched
part.  Thanks.

p = re.compile('#a=*;b=*;c=*;')
m = p.match(line)
if m:
 print m.group(),

IMHO a regex for this is overkill, a combination of string methods such
as split and find should suffice.


You're saying that something like the following
is better than the simple regex used by the OP?
[untested]
values = []
parts = line.split(';')
if len(parts) != 4: raise SomeError()
for p, expected in zip(parts[:-1], ('#a', 'b', 'c')):
    name, x, value = p.partition('=')
    if name != expected or x != '=':
        raise SomeError()
    values.append(value)
print values[0], values[1], values[2]

I call straw man: [tested]
line = "#a=valuea;b=valueb;c=valuec;"
d = dict(single.split('=', 1)
         for single in line.split(';') if single)
d['#a'], d['b'], d['c']
If you want checking code, add:
if len(d) != 3:
    raise ValueError('Too many keys: %s in %r' % (
                     sorted(d), line))


Blech, not in my book.  The regex checks the
format of the string, extracts the values, and
does so very clearly.  Further, it is easily
adapted to other similar formats, or evolutionary
changes in format.  It is also (once one is
familiar with regexes -- a useful skill outside
of Python too) easier to get right (at least in
a simple case like this.)

The posted regex doesn't work; this might be homework, so
I'll not fix the two problems.  The fact that you did not
see the failure weakens your claim of "does so very clearly".

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Predefined Variables

2009-07-24 Thread Scott David Daniels

Stephen Cuppett (should have written in this order):

Fred Atkinson fatkin...@mishmash.com wrote ...

Is there a pre-defined variable that returns the GET line

(http://www.php.net/index.php?everythingafterthequestionmark) as a
single variable (rather than individual variables)?


 os.environment('QUERY_STRING')

Maybe you mean:
os.environ['USER']

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Mechanize not recognized by py2exe

2009-07-22 Thread Scott David Daniels

OrcaSoul wrote:

...it's too late to name my first born after you, but
I did have a cat named Gabriel - she was a great cat!


Remember Gabriel in English uses a hard A, as in the horn
player, not Gabrielle.  I know because when I first read
his posts I played the same trick in my head, and hence
imagined a woman.  I suspect it would come to irk one
almost enough to become a Gabe.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Multiple versions of python

2009-07-22 Thread Scott David Daniels

CCW wrote:

On 21 July, 15:19, Dave Angel da...@dejaviewphoto.com wrote:

The other thing you may want to do in a batch file is to change the file
associations so that you can run the .py file directly, without typing
python or pythonw in front of it.

The relevant Windows commands are: assoc and ftype  And on a
related note, you may want to edit the PATHEXT environment variable, to
add .PY and .PYW


Thanks for this - this way made a bit more sense to me.  I've now got
C:\commands with the 4 .bat files in, and C:\commands in my path.  It
all seems to work :) I think I've missed the point of the @ though -
it doesn't seem to make any difference..

I'm also a bit confused with the associations - since I've got python
2.6 and 3.1, surely the command I type (python26 or python31) is the
only way to force a script to be run using a specific interpreter at
runtime without having to change the .bat file every time I want to
run a script using 3.1 instead of 2.6?


OK, for me currently:

C:\> assoc .py
.py=Python.File

C:\> assoc .pyw
.pyw=Python.NoConFile

C:\> ftype Python.File
Python.File=C:\Python31\python.exe %1 %*

C:\> ftype Python.NoConFile
Python.NoConFile=C:\Python31\pythonw.exe %1 %*

Now imagine instead that you've added:

C:\> ftype Python31.File=C:\Python31\python.exe %1 %*
C:\> ftype Python31.NoConFile=C:\Python31\pythonw.exe %1 %*
C:\> ftype Python26.File=C:\Python26\python.exe %1 %*
C:\> ftype Python26.NoConFile=C:\Python26\pythonw.exe %1 %*

Then you can do the following:
C:\> assoc .py=Python26.File
C:\> fumble.py
C:\> assoc .py=Python31.File
C:\> fumble.py

That is the basic idea but, at the moment, I don't see a simple demo
working for me.  So, if you want to pursue this, you can probably get it
to work.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Issue combining gzip and subprocess

2009-07-22 Thread Scott David Daniels

Piet van Oostrum wrote:

...
f = gzip.open(filename, 'w')
proc = subprocess.Popen(['ls','-la'], stdout=subprocess.PIPE)
while True:
    line = proc.stdout.readline()
    if not line: break
    f.write(line)
f.close()


Or even:
proc = subprocess.Popen(['ls','-la'], stdout=subprocess.PIPE)
with gzip.open(filename, 'w') as dest:
    for line in iter(proc.stdout.readline, ''):
        dest.write(line)

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Mutable Strings - Any libraries that offer this?

2009-07-21 Thread Scott David Daniels

Steven D'Aprano wrote:

On Mon, 20 Jul 2009 21:08:22 +1000, Ben Finney wrote:

What is it you're trying to do that makes you search for a mutable
string type? It's likely that a better approach can be found.


When dealing with very large strings, it is wasteful to have to duplicate 
the entire string just to mutate a single character.


However, when dealing with very large strings, it's arguably better to 
use the rope data structure instead.


The general problem is that whether strings are mutable or not is an
early language design decision, and few languages provide both.
Mutable strings need lots of data copying to be safe passing args to
unknown functions; immutable strings need lots of copying for incremental
changes.  The rope is a great idea for some cases.

I'd argue Python works better with immutable strings, because Python is
too slow at per-character operations to be running up and down strings
a character at a time, changing here and there.  So it becomes more
natural to deal with strings as chunks to pass around, and it is nice
not to have to copy the strings when doing that passing around.
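Worth noting: Python does ship one mutable string-like type for the byte-oriented cases, bytearray; a small Python 3 sketch:

```python
buf = bytearray(b'immutable?')
buf[-1] = ord('!')          # change a single byte in place -- no copy
del buf[0:2]                # splice out a slice, shrinking the buffer
print(buf.decode('ascii'))  # mutable!
```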

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Multiple versions of python

2009-07-21 Thread Scott David Daniels

ChrisW wrote:

Hi,

I have installed 2 versions of python on my Windows XP computer - I
originally had 3.0.1, but then found that the MySQL module only
supported 2.*, so I've now installed that.  I have found that if I
change the Windows Environment Variable path, then I can change the
version of python called when I type 'python' into a command line.
However, I'd like to be able to choose which version I use.  I know
that if I change C:\Python26\python.exe to
C:\Python26\python2.exe and C:\Python30\python.exe to C:
\Python26\python3.exe, then typing 'python2' or 'python3' will invoke
the correct interpreter.  However, is it safe just to rename the
executable files? Is there a more elegant way to achieve the same
task?

I wouldn't rename them.  You can, of course, copy them (so you have two
executables), or you can pick a somedir on your path (I made a directory
C:\cmds that I add to my path, but tastes vary).

C:\> copy con somedir\py25.cmd
@C:\Python25\python.exe %*
^Z
C:\> copy con somedir\py31.cmd
@C:\Python31\python.exe %*
^Z

I'd use the two-digit form, as that is where interface changes
happen; trying code with py24, py25, py26 can be convenient.
By the way, install Python 3.1 rather than 3.0; think of 3.0 as the
alpha of the 3.X branches (it will get no love at all now that 3.1
is out).

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-17 Thread Scott David Daniels

akhil1988 wrote:
mis-ordered reply, bits shown below

Nobody-38 wrote:

On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:

...

In Python 3 you can't decode strings because they are Unicode strings
and it doesn't make sense to decode a Unicode string. You can only
decode encoded things which are byte strings. So you are mixing up byte
strings and Unicode strings.

... I read a byte string from sys.stdin which needs to be converted to a
unicode string for further processing.

In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they
read and write Unicode strings, not byte strings.


I cannot just remove the decode statement and proceed?
This is what it looks like:
for line in sys.stdin:
    line = line.decode('utf-8').strip()
    if line == 'page': #do something here

If I remove the decode statement, line == 'page' never gets true. 

Did you inadvertently remove the strip() as well?

... unintentionally I removed strip()
I get this error now:
  File "./temp.py", line 488, in <module>
    main()
  File "./temp.py", line 475, in main
    for line in sys.stdin:
  File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid
data


(1) Do not top post.
(2) Try to fully understand the problem and proposed solution, rather
than trying to get people to tell you just enough to get your code
going.
(3) The only way sys.stdin can possibly return unicode is to do some
decoding of its own.  Your job is to make sure it uses the correct
decoding.  So, if you know your source is always utf-8, try
something like:

import sys
import io

sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')

for line in sys.stdin:
    line = line.strip()
    if line == 'page':
        #do something here


--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient binary search tree stored in a flat array?

2009-07-14 Thread Scott David Daniels

Piet van Oostrum wrote:

Douglas Alan darkwate...@gmail.com (DA) wrote:



DA On Jul 13, 3:57 pm, a...@pythoncraft.com (Aahz) wrote:

Still, unless your list is large (more than thousands of elements),
that's the way you should go.  See the bisect module.  Thing is, the
speed difference between C and Python means the constant for insertion
and deletion is very very small relative to bytecode speed.  Keep in
mind that Python's object/binding model means that you're shuffling
pointers in the list rather than items.



DA Thank you. My question wasn't intended to be Python specific, though.
DA I am just curious for purely academic reasons about whether there is
DA such an algorithm. All the sources I've skimmed only seem to the
DA answer the question via omission. Which is kind of strange, since it
DA seems to me like an obvious question to ask.


It may well be that there is no good simple solution, and people avoid
writing about non-existent algorithms.  I certainly cannot imagine
trying to write an article that carefully covered ideas which don't
have well-studied data structures available, and calling them out
only to say, we don't know how to do this well.  If such an algorithm
were simple and obvious, I dare say you'd be taught about it around the
time you learn binary search.


Of course you can take any BST algorithm and replace pointers by indices
in the array and allocate new elements in the array. But then you need
array elements to contain the indices for the children explicitly.


And you lower your locality of reference (cache-friendliness).
Note that list insert in Python, for example, is quite cache-friendly.
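For the special case of a complete tree, the flat-array layout needs no stored indices at all: children are found by arithmetic, as in the classic heap layout. A small Python 3 sketch (names my own):

```python
# Complete-binary-tree layout in a flat list: the node at index i has
# children at 2*i + 1 and 2*i + 2, and its parent at (i - 1) // 2.
heap = [1, 3, 2, 7, 5]          # a small min-heap

def children(i, n):
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]

# The heap property -- parent <= child -- holds with no stored pointers.
ok = all(heap[i] <= heap[c]
         for i in range(len(heap))
         for c in children(i, len(heap)))
print(ok)   # True
```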

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposal: Decimal literals in Python.

2009-07-14 Thread Scott David Daniels

Tim Roberts wrote:

My favorite notation for this comes from Ada, which allows arbitrary bases
from 2 to 16, and allows for underscores within numeric literals:

  x23_bin : constant :=  2#0001_0111#;
  x23_oct : constant :=  8#27#;
  x23_dec : constant := 10#23#;
  x23_hex : constant := 16#17#;

And mine is one w/o the base 10 bias:
.f.123 == 0x123
.7.123 == 0o123
.1.1101 == 0b1101
That is, .largest allowed digit.digits
-- show the base by showing base-1 in the base.
I actually built this into OZ, an interpreter.
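Python itself settled on prefixes rather than Ada's #-brackets or the base-digit notation above; for reference (Python 3 spellings):

```python
# Python marks the base with a prefix rather than a suffix or Ada's '#':
print(0x17 == 0o27 == 0b10111 == 23)                  # True
# And int() accepts an explicit base, 2 through 36:
print(int('17', 16), int('27', 8), int('10111', 2))   # 23 23 23
```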

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to check if any item from a list of strings is in a big string?

2009-07-14 Thread Scott David Daniels

Nobody wrote:

On Tue, 14 Jul 2009 02:06:04 -0300, Gabriel Genellina wrote:


Matt, how many words are you looking for, in how long a string?
Were you able to time any(substr in long_string) against
re.compile('|'.join(list_items))?
There is a known algorithm to solve specifically this problem  
(Aho-Corasick), a good implementation should perform better than R.E. (and  
better than the gen.expr. with the advantage of returning WHICH string  
matched)


Aho-Corasick has the advantage of being linear in the length of the
patterns, so the setup may be faster than re.compile(). The actual
searching won't necessarily be any faster (assuming optimal
implementations; I don't know how safe that assumption is).


Having done a fast Aho-Corasick implementation myself, I can assure you
that the actual searching can be incredibly fast.  RE conversion usually
goes to a slightly more general machine than the Aho-Corasick processing
requires.
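For the curious, here is a compact (and decidedly unoptimized) Python 3 sketch of the Aho-Corasick idea -- a trie plus failure links, scanned in one pass. It is an illustration of the algorithm, not the fast implementation mentioned above:

```python
from collections import deque

def aho_search(text, patterns):
    """Return (end_index, pattern) pairs for every occurrence of any pattern."""
    # Build a trie of the patterns.
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass fills in the failure links.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] += out[fail[t]]   # inherit matches ending here
    # Single left-to-right scan of the text.
    hits, s = [], 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i, pat))
    return hits

print(aho_search('ushers', ['he', 'she', 'his', 'hers']))
# [(3, 'she'), (3, 'he'), (5, 'hers')]
```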

--Scott David Daniels
scott.dani...@acm.org

--
http://mail.python.org/mailman/listinfo/python-list


Re: why did you choose the programming language(s)you currently use?

2009-07-14 Thread Scott David Daniels

Aahz wrote:

In article 4a5ccdd6$0$32679$9b4e6...@newsspool2.arcor-online.net,
Stefan Behnel  stefan...@behnel.de wrote:

Deep_Feelings wrote:

So you have chosen programming language x so shall you tell us why
you did so , and  what negatives or positives it has ?

*duck*


Where do you get the duck programming language?


It shares a type system with Python, of course.  :-)

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: missing 'xor' Boolean operator

2009-07-14 Thread Scott David Daniels

Ethan Furman wrote:

  and returns the last object that is true

This is a little suspect:
_and_ returns the first object that is not true, or the last object.

  or  returns the first object that is true

Similarly:
_or_ returns the first object that is true, or the last object.


so should xor return the only object that is true, else False/None?


Xor has the problem that in two cases it can return neither of its args.
Not has behavior similar in those cases, and we see it returns False or
True.  The Pythonic solution is therefore to use False.


def xor(a, b):
    if a and b:
        return None
    elif a:
        return a
    elif b:
        return b
    else:
        return None


def xor(a, b):
    if bool(a) == bool(b):
        return False
    else:
        return a or b

Side-effect counting in applications of bool(x) is ignored here.
If minimizing side-effects is needed:

def xor(a, b):
    if a:
        if not b:
            return a
    elif b:
        return b
    return False
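A quick sanity check of that last version (restated with Python 3 print syntax so the snippet stands alone):

```python
def xor(a, b):
    if a:
        if not b:
            return a
    elif b:
        return b
    return False

print(xor('spam', ''))      # spam   -- the single true operand comes back
print(xor('', [1, 2]))      # [1, 2]
print(xor('spam', [1]))     # False  -- both true
print(xor('', 0))           # False  -- both false
```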

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: PDF: finding a blank image

2009-07-13 Thread Scott David Daniels

DrLeif wrote:

I have about 6000 PDF files which have been produced using a scanner
with more being produced each day.  The PDF files contain old paper
records which have been taking up space.   The scanner is set to
detect when there is information on the backside of the page (duplex
scan).  The problem of course is it's not the always reliable and we
wind up with a number of PDF files containing blank pages.

What I would like to do is have python detect a blank pages in a PDF
file and remove it.  Any suggestions?


I'd check into ReportLab's commercial product, it may well be easily
capable of that.  If no success, you might contact PJ at Groklaw, she
has dealt with a _lot_ of PDFs (and knows people who deal with PDFs
in bulk).

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: tough-to-explain Python

2009-07-11 Thread Scott David Daniels

Steven D'Aprano wrote:
Even *soup stock* fits the same profile as what Hendrik claims is almost 
unique to programming. On its own, soup stock is totally useless. But you 
make it, now, so you can you feed it into something else later on.


Or instant coffee.


I think I'll avoid coming to your house for a cup of coffee. :-)

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: tough-to-explain Python

2009-07-10 Thread Scott David Daniels

Steven D'Aprano wrote:

On Fri, 10 Jul 2009 08:28:29 -0700, Scott David Daniels wrote:

Steven D'Aprano wrote:

Even *soup stock* fits the same profile as what Hendrik claims is
almost unique to programming. On its own, soup stock is totally
useless. But you make it, now, so you can you feed it into something
else later on.
Or instant coffee.

I think I'll avoid coming to your house for a cup of coffee. :-)
I meant the instant coffee powder is prepared in advance. It's useless on
its own, but later on you feed it into boiling water, add sugar and
milk, and it's slightly less useless.


I know, but the image of even a _great_ soup stock with instant
coffee poured in, both appalled me and made me giggle.  So, I
thought I'd share.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: finding most common elements between thousands of multiple arrays.

2009-07-10 Thread Scott David Daniels

Raymond Hettinger wrote:

[Scott David Daniels]

def most_frequent(arr, N): ...

In Py2.4 and later, see heapq.nlargest().

I should have remembered this one


In Py3.1, see collections.Counter(data).most_common(n)

This one is from Py3.2, I think.


--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Sorry about that, the Counter class is there.

2009-07-10 Thread Scott David Daniels

Scott David Daniels wrote:

Raymond Hettinger wrote:

[Scott David Daniels]

def most_frequent(arr, N): ...

In Py2.4 and later, see heapq.nlargest().

I should have remembered this one


In Py3.1, see collections.Counter(data).most_common(n)

This one is from Py3.2, I think.


Oops -- egg all over my face.  I thought I was checking with 3.1, and it
was 2.6.2.  I _did_ make an explicit check, just poorly.

Again, apologies.
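For the record, the two spellings side by side on a version that has both (2.7/3.1 and later):

```python
from collections import Counter
from heapq import nlargest

data = ['a', 'b', 'a', 'c', 'a', 'b']

# Counter counts and selects the top n in one step...
print(Counter(data).most_common(2))    # [('a', 3), ('b', 2)]

# ...which is roughly nlargest applied to the tallied counts:
counts = Counter(data)
print(nlargest(2, counts.items(), key=lambda kv: kv[1]))
```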

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: hoe to build a patched socketmodule.c

2009-07-10 Thread Scott David Daniels

jacopo mondi wrote:

Roger Binns wrote:

jacopo mondi wrote:

Hi all, I need to patch socketmodule.c (the _socket module) in order to
add support to an experimental socket family.

You may find it considerably easier to use ctypes since that will avoid
the need for any patching.  You'll also be able to control how read and
write are done (eg read vs recvfrom vs recvmsg vs readv).  You can use
os.fdopen to convert your raw file descriptor into a Python file object
if appropriate.


The typical Python way of dealing with this is an additional module, not
a modified module placed back in the library.  So, take the sources and
edit, but change the module name.  Even better is figure out how to
use _socket.pyd, to create a smaller _socketexpmodule.c and use that.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: property using a classmethod

2009-07-09 Thread Scott David Daniels

Emanuele D'Arrigo wrote:

class MyClass(object):
    @classmethod
    def myClassMethod(self):
        print "ham"
    myProperty = property(myClassMethod, None, None)

... doesn't work and returns a TypeError:  So, how do I do this?
Ultimately all I want is a non-callable class-level attribute
MyClass.myProperty that gives the result of MyClass.myClassMethod().


properties affect instances, and classes are instances of types.
What you want is a new metaclass:

class MyType(type):
    @property
    def demo(class_):
        return class_.a + 3

class MyClass(object):
    __metaclass__ = MyType
    a = 5

print MyClass.a, MyClass.demo
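On Python 3 the same trick is spelled with the metaclass keyword argument instead of __metaclass__; a sketch:

```python
class MyType(type):
    @property
    def demo(cls):
        # Seen when the attribute is looked up on the class itself.
        return cls.a + 3

class MyClass(metaclass=MyType):
    a = 5

print(MyClass.a, MyClass.demo)   # 5 8
```

Note that the property lives on the metaclass, so it is visible on MyClass but not on MyClass() instances.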

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


[issue6422] timeit called from within Python should allow autoranging

2009-07-08 Thread Scott David Daniels

Changes by Scott David Daniels scott.dani...@acm.org:


--
keywords: +patch
Added file: http://bugs.python.org/file14472/timeit.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6422
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



Re: ISO library ref in printed form

2009-07-07 Thread Scott David Daniels

kj wrote:

Does anyone know where I can buy the Python library reference in
printed form?  (I'd rather not print the whole 1200+-page tome
myself.)  I'm interested in both/either 2.6 and 3.0.


Personally, I'd get the new Beazley's Python Essential Reference,
which is due out real soon now, and then use the provided docs
as a addon.  Also consider grabbing Gruet's Python Quick Reference
page.  When I was working in a printer site I printed the color
version of Gruet's page two-sided; it was neither too bulky nor
too sketchy for my needs (and he uses color to distinguish
version-to-version changes).
http://rgruet.free.fr/
Sadly, I no longer work there, so my copy is gone. :-(

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


[issue6422] timeit called from within Python should allow autoranging

2009-07-07 Thread Scott David Daniels

Scott David Daniels scott.dani...@acm.org added the comment:

I've got the code working on trunk2 for my tests.
Should I port to py3K before checking in, and give diffs from there, or 
what?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6422
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



Re: Clarity vs. code reuse/generality

2009-07-06 Thread Scott David Daniels

Andre Engels wrote:

On Mon, Jul 6, 2009 at 9:44 AM, Martin Vilcansmar...@librador.com wrote:

On Fri, Jul 3, 2009 at 4:05 PM, kjno.em...@please.post wrote:

I'm will be teaching a programming class to novices, and I've run
into a clear conflict between two of the principles I'd like to
teach: code clarity vs. code reuse.  I'd love your opinion about
it.

In general, code clarity is more important than reusability.
Unfortunately, many novice programmers have the opposite impression. I
have seen too much convoluted code written by beginners who try to
make the code generic. Writing simple, clear, to-the-point code is
hard enough as it is, even when not aiming at making it reusable.

If in the future you see an opportunity to reuse the code, then and
only then is the time to make it generic.


Not just that, when you actually get to that point, making simple and
clear code generic is often easier than making
complicated-and-supposedly-generic code that little bit more generic
that you need.


First, a quote which took me a bit to find:
Thomas William Körner paraphrasing Pólya and Szegő
in A Companion to Analysis:
Recalling that 'once is a trick, twice is a method,
thrice is a theorem, and four times a theory,' we
seek to codify this insight.

Let us apply this insight:
Suppose in writing code, we pretty much go with that.
A method is something you notice, a theorem is a function, and
a theory is a generalized function.

Even though we like DRY (don't repeat yourself) as a maxim, let
it go the first time and wait until you see the pattern (a possible
function).  I'd go with a function first, a pair of functions, and
only then look to abstracting the function.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating alot of class instances?

2009-07-06 Thread Scott David Daniels

Steven D'Aprano wrote:

... That's the Wrong Way to do it --
you're using a screwdriver to hammer a nail


Don't knock tool abuse (though I agree with you here).
Sometimes tool abuse can produce good results.  For
example, using hammers to drive screws for temporary
strong holds led to making better nails.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: finding most common elements between thousands of multiple arrays.

2009-07-06 Thread Scott David Daniels

Peter Otten wrote:

Scott David Daniels wrote:


Scott David Daniels wrote:



 t = timeit.Timer('sum(part[:-1]==part[1:])',
  'from __main__ import part')


What happens if you calculate the sum in numpy? Try

t = timeit.Timer('(part[:-1]==part[1:]).sum()',
 'from __main__ import part')


Good idea, I hadn't thought of adding numpy bools.
(part[:-1]==part[1:]).sum()
is only a slight improvement over
len(part[part[:-1]==part[1:]])
when there are few elements, but it is almost twice
as fast when there are a lot (reflecting the work
of allocating and copying).

>>> import numpy
>>> import timeit
>>> original = numpy.random.normal(0, 100, (1000, 1000)).astype(int)
>>> data = original.flatten()
>>> data.sort()
>>> t = timeit.Timer('sum(part[:-1]==part[1:])',
...                  'from __main__ import part')
>>> u = timeit.Timer('len(part[part[:-1]==part[1:]])',
...                  'from __main__ import part')
>>> v = timeit.Timer('(part[:-1]==part[1:]).sum()',
...                  'from __main__ import part')

>>> part = data[::100]
>>> (part[:-1]==part[1:]).sum()
9390
>>> t.repeat(3, 10)
[0.56368281443587875, 0.55615057220961717, 0.55465764503594528]
>>> u.repeat(3, 1000)
[0.89576580263690175, 0.89276374511291579, 0.8937328626963108]
>>> v.repeat(3, 1000)
[0.24798598704592223, 0.24715431709898894, 0.24498979618920202]

>>> part = original.flatten()[::100]
>>> (part[:-1]==part[1:]).sum()
27
>>> t.repeat(3, 10)
[0.57576898739921489, 0.56410158274297828, 0.56988248506445416]
>>> u.repeat(3, 1000)
[0.27312186325366383, 0.27315007913011868, 0.27214492344683094]
>>> v.repeat(3, 1000)
[0.28410342655297427, 0.28374053126867693, 0.28318990262732768]


Net result: go back to former definition of candidates (a number,
not the actual entries), but calculate that number as matches.sum(),
not len(part[matches]).

Now the latest version of this (compressed) code:
 ...
 sampled = data[::stride]
 matches = sampled[:-1] == sampled[1:]
 candidates = sum(matches) # count identified matches
 while candidates > N * 10: # 10 -- heuristic
 stride *= 2 # # heuristic increase
 sampled = data[::stride]
 matches = sampled[:-1] == sampled[1:]
 candidates = sum(matches)
 while candidates < N * 3: # heuristic slop for long runs
 stride //= 2 # heuristic decrease
 sampled = data[::stride]
 matches = sampled[:-1] == sampled[1:]
 candidates = sum(matches)
 former = None
 past = 0
 for value in sampled[matches]:
 ...
is:
  ...
  sampled = data[::stride]
  matches = sampled[:-1] == sampled[1:]
  candidates = matches.sum() # count identified matches
  while candidates > N * 10: # 10 -- heuristic
  stride *= 2 # # heuristic increase
  sampled = data[::stride]
  matches = sampled[:-1] == sampled[1:]
  candidates = matches.sum()
  while candidates < N * 3: # heuristic slop for long runs
  stride //= 2 # heuristic decrease
  sampled = data[::stride]
  matches = sampled[:-1] == sampled[1:]
  candidates = matches.sum()
  former = None
  past = 0
  for value in sampled[matches]:
  ...

Now I think I can let this problem go, esp. since it was
mclovin's problem in the first place.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: Why is my code faster with append() in a loop than with a large list?

2009-07-06 Thread Scott David Daniels

Piet van Oostrum wrote:

Dave Angel da...@dejaviewphoto.com (DA) wrote:



DA It would probably save some time to not bother storing the zeroes in the
DA list at all.  And it should help if you were to step through a list of
DA primes, rather than trying every possible int.  Or at least constrain
DA yourself to odd numbers (after the initial case of 2).


...
# Based upon http://code.activestate.com/recipes/117119/

D = {9: 6} # contains composite numbers

XXX Dlist = [2, 3] # list of already generated primes
  Elist = [(2, 4), (3, 9)] # list of primes and their squares



XXX def sieve():
XXX   '''generator that yields all prime numbers'''
XXX   global D
XXX   global Dlist
 def sieve2():
 '''generator that yields all primes and their squares'''
 # No need for global declarations, we alter, not replace
XXX   for p in Dlist:
XXX   yield p
XXX   q = Dlist[-1]+2

  for pair in Elist:
  yield pair
  q = pair[0] + 2


    while True:
        if q in D:
            p = D[q]
            x = q + p
            while x in D: x += p
            D[x] = p
        else:

XXX   Dlist.append(q)
XXX   yield q
XXX   D[q*q] = 2*q
  square = q * q
  pair = q, square
  Elist.append(pair)
  yield pair
  D[square] = 2 * q

        q += 2

def factorise(num):
    """Returns a list of prime factor powers. For example:
    factorise(6) will return
    [2, 2] (the powers are returned one higher than the actual value)
    as in, 2^1 * 3^1 = 6."""
    powers = []
    power = 0

XXX   for factor in sieve():
  for factor, limit in sieve2():

power = 0
while num % factor == 0:
power += 1
num /= factor

XXX   if power > 0:
  if power: # good enough here, and faster

# if you really want the factors then append((factor, power))
powers.append(power+1)

XXX   if num == 1:
XXX   break
XXX   return powers
  if num < limit:
  if num > 1:
  # if you really want the factors then append((num, 1))
  powers.append(2)
  return powers

OK, that's a straightforward speedup, _but_:
 factorise(6) == [2, 2] == factorise(10) == factorise(15)
So I am not sure exactly what you are calculating.
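For readers who want the recipe's idea in one self-contained piece, here is a compact restatement of the incremental sieve as a (prime, square) generator; a sketch only, stepping by 1 rather than using the odds-only optimization above:

```python
import itertools

def sieve2():
    '''Yield (prime, prime * prime) pairs indefinitely (incremental sieve).'''
    D = {}              # maps a composite -> a prime that divides it
    q = 2
    while True:
        if q in D:      # q is composite; slide its prime to the next multiple
            p = D.pop(q)
            x = q + p
            while x in D:
                x += p
            D[x] = p
        else:           # q is prime; its first unmarked composite is q*q
            yield q, q * q
            D[q * q] = q
        q += 1

first_five = list(itertools.islice(sieve2(), 5))
```

Note the dictionary stays small because each prime is stored at only one composite at a time.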


--Scott David Daniels
scott.dani...@acm.org


Re: Cleaning up after failing to contructing objects

2009-07-06 Thread Scott David Daniels

brasse wrote:

I have been thinking about how write exception safe constructors in
Python. By exception safe I mean a constructor that does not leak
resources when an exception is raised within it. 

...
 As you can see this is less than straight forward. Is there some kind
 of best practice that I'm not aware of?

Not so tough.  Something like this tweaked version of your example:

class Foo(object):
def __init__(self, name, fail=False):
self.name = name
if not fail:
print '%s.__init__(%s)' % (type(self).__name__, name)
else:
print '%s.__init__(%s), FAIL' % (type(self).__name__, name)
raise ValueError('Asked to fail: %r' % fail)

def close(self):
print '%s.close(%s)' % (type(self).__name__, self.name)


class Bar(object):
def __init__(self):
unwind = []
try:
self.a = Foo('a')
unwind.append(self.a)
self.b = Foo('b', fail=True)
unwind.append(self.b)
...
except Exception, why:
while unwind:
unwind.pop().close()
raise

bar = Bar()
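In later Pythons (3.3 and up), contextlib.ExitStack packages exactly this unwind-list pattern; a sketch under that assumption, with a hypothetical Resource class whose close() just records that it was called:

```python
from contextlib import ExitStack

class Resource:
    closed = []                     # records close order, for demonstration
    def __init__(self, name, fail=False):
        self.name = name
        if fail:
            raise ValueError('Asked to fail: %r' % name)
    def close(self):
        Resource.closed.append(self.name)

class Bar:
    def __init__(self, fail_b=False):
        with ExitStack() as stack:
            self.a = Resource('a')
            stack.callback(self.a.close)       # register cleanup
            self.b = Resource('b', fail=fail_b)
            stack.callback(self.b.close)
            stack.pop_all()  # success: transfer (and so cancel) the cleanups
```

If construction fails partway, the stack runs the registered close() calls in LIFO order; on success, pop_all() disarms them.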

--Scott David Daniels
scott.dani...@acm.org


Re: finding most common elements between thousands of multiple arrays.

2009-07-05 Thread Scott David Daniels

Scott David Daniels wrote:

... Here's a heuristic replacement for my previous frequency code:
I've tried to mark where you could fudge numbers if the run time
is at all close.


Boy, I cannot let go.  I did a bit of a test checking for cost to
calculated number of discovered samples, and found after:
import timeit
import numpy
original = numpy.random.uniform(0, 100, (1000, 1000)).astype(int)
data = original.flatten()
data.sort()
part = data[::100]
t = timeit.Timer('sum(part[:-1]==part[1:])',
 'from __main__ import part')
v = timeit.Timer('len(part[part[:-1]==part[1:]])',
 'from __main__ import part')

I got:
>>> t.repeat(3, 10)
[0.58319842326318394, 0.57617574300638807, 0.57831819407238072]
>>> v.repeat(3, 1000)
[0.93933027801040225, 0.93704535073584339, 0.94096260837613954]

So, len(part[mask]) is almost 50X faster!  I checked:
>>> sum(part[:-1]==part[1:])
9393
>>> len(part[part[:-1]==part[1:]])
9393

That's an awful lot of matches, so I retried with high selectivity:
data = original.flatten()  # no sorting, so runs missing
part = data[::100]

>>> t.repeat(3, 10)
[0.58641335700485797, 0.58458854407490435, 0.58872594142576418]
>>> v.repeat(3, 1000)
[0.27352554584422251, 0.27375686015921019, 0.27433291102624935]

about 200X faster

>>> len(part[part[:-1]==part[1:]])
39
>>> sum(part[:-1]==part[1:])
39

So my new version of this (compressed) code:

...
sampled = data[::stride]
matches = sampled[:-1] == sampled[1:]
candidates = sum(matches) # count identified matches
while candidates > N * 10: # 10 -- heuristic
stride *= 2 # # heuristic increase
sampled = data[::stride]
matches = sampled[:-1] == sampled[1:]
candidates = sum(matches)
while candidates < N * 3: # heuristic slop for long runs
stride //= 2 # heuristic decrease
sampled = data[::stride]
matches = sampled[:-1] == sampled[1:]
candidates = sum(matches)
former = None
past = 0
for value in sampled[matches]:
...

is:
  ...
  sampled = data[::stride]
  candidates = sampled[sampled[:-1] == sampled[1:]]
  while len(candidates) > N * 10: # 10 -- heuristic
  stride *= 2 # # heuristic increase
  sampled = data[::stride]
  candidates = sampled[sampled[:-1] == sampled[1:]]
  while len(candidates) < N * 3: # heuristic slop for long runs
  stride //= 2 # heuristic decrease
  sampled = data[::stride]
  candidates = sampled[sampled[:-1] == sampled[1:]]
  former = None
  past = 0
  for value in candidates:
  ...
This change is important, for we try several strides before
settling on a choice, meaning the optimization can be valuable.
This also means we could be pickier at choosing strides (try
more values), since checking is cheaper than before.

Summary: when dealing with numpy, (or any bulk - individual values
transitions), try several ways that you think are equivalent and
_measure_.  In the OODB work I did we called this impedance mismatch,
and it is likely some boundary transitions are _much_ faster than
others.  The sum case is one of them; I am getting numpy booleans
back, rather than Python bools, so conversions aren't going fastpath.
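That "measure" advice in miniature, without numpy: two spellings of "count the True values" that look equivalent can time quite differently, and timeit settles it. (The data and repetition counts here are illustrative, not from the original run.)

```python
import timeit

# Two "equivalent" ways to count True values in a list of bools;
# they cross the Python/C boundary very differently.
setup = 'bools = [i % 7 == 0 for i in range(10000)]'
t_sum = timeit.timeit('sum(bools)', setup, number=200)
t_count = timeit.timeit('bools.count(True)', setup, number=200)
print('sum: %.4fs  count: %.4fs' % (t_sum, t_count))
```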

--Scott David Daniels
scott.dani...@acm.org


[issue6422] timeit called from within Python should allow autoranging

2009-07-05 Thread Scott David Daniels

New submission from Scott David Daniels scott.dani...@acm.org:

timeit.main has a _very_ handy autoranging facility to pick an
appropriate number of repetitions when not specified.  The autoranging
code should be lifted to a method on Timer instances (so non-main code
can use it).  If number is specified as 0 or None, I would like to use
the results of that autoranging code in Timer.repeat and Timer.timeit.

Patch to follow.
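This feature did land eventually: modern Python (3.6+) exposes it as Timer.autorange(), returning a (number, time_taken) pair where number was raised until time_taken reached at least 0.2 seconds. A minimal use:

```python
import timeit

t = timeit.Timer('sum(range(100))')
# number is chosen automatically so that elapsed >= 0.2 seconds
number, elapsed = t.autorange()
print(number, elapsed)
```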

--
components: Library (Lib)
messages: 90157
nosy: scott_daniels
severity: normal
status: open
title: timeit called from within Python should allow autoranging
type: feature request
versions: Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6422
___



Re: question of style

2009-07-04 Thread Scott David Daniels

upwestdon wrote:

if not (self.higher and self.lower):
return self.higher or self.lower


self.lower = 0
self.higher = 123
???
More than just None is False


Re: Reversible Debugging

2009-07-04 Thread Scott David Daniels

Patrick Sabin wrote:

Horace Blegg schrieb:
You might consider using a VM with 'save-points'. You run the program 
(in a debugger/ida/what have you) to a certain point (logical point 
would be if/ifelse/else statements, etc) and save the VM state. Once 
you've saved, you continue. If you find the path you've taken isn't 
what you are after, you can reload a previous save point and start 
over, trying a different path the next time.
That was my idea to implement it. I thought of taking snapshots of the 
current state every time a unredoable instruction, e.g random number 
generation, is done. 

Remember, storing into a location is destruction.
Go over a list of VM instructions and see how many of them are undoable.


Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Scott David Daniels

Vilya Harvey wrote:

2009/7/4 Andre Engels andreeng...@gmail.com:

On Sat, Jul 4, 2009 at 9:33 AM, mclovinhanoo...@gmail.com wrote:

Currently I need to find the most common elements in thousands of
arrays within one large array (arround 2 million instances with ~70k
unique elements)...

Try flattening the arrays into a single large array  sorting it. Then
you can just iterate over the large array counting as you go; you only
ever have to insert into the dict once for each value and there's no
lookups in the dict


Actually the next step is to maintain a min-heap as you run down the
sorted array.  Something like:

import numpy as np
import heapq


def frequency(arr):
'''Generate frequency-value pairs from a numpy array'''
clustered = arr.flatten() # copy (so can safely sort)
clustered.sort() # Bring identical values together
scanner = iter(clustered)
last = scanner.next()
count = 1
for el in scanner:
if el == last:
count += 1
else:
yield count, last
last = el
count = 1
yield count, last


def most_frequent(arr, N):
'''Return the top N (freq, val) elements in arr'''
counted = frequency(arr) # get an iterator for freq-val pairs
heap = []
# First, just fill up the array with the first N distinct
for i in range(N):
try:
heap.append(counted.next())
except StopIteration:
break # If we run out here, no need for a heap
else:
# more to go, switch to a min-heap, and replace the least
# element every time we find something better
heapq.heapify(heap)
for pair in counted:
if pair > heap[0]:
heapq.heapreplace(heap, pair)
return sorted(heap, reverse=True) # put most frequent first.
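If the whole dataset hashes cheaply and fits in memory, the standard library gets the same top-N answer more directly; a sketch using collections.Counter (ties may order differently than in the heap version, and most_frequent_counter is a name invented here):

```python
from collections import Counter

def most_frequent_counter(values, N):
    '''Return up to N (count, value) pairs, most frequent first.'''
    # Counter.most_common sorts by count descending for us
    return [(count, value)
            for value, count in Counter(values).most_common(N)]

data = [1] * 5 + [2] * 3 + [3] * 2 + [4]
top_two = most_frequent_counter(data, 2)
```

The heap version above still wins when the frequency pairs arrive as a stream too large to hold as a Counter.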


--Scott David Daniels
scott.dani...@acm.org


Re: Reversible Debugging

2009-07-04 Thread Scott David Daniels

Dave Angel wrote:

Scott David Daniels wrote:

Patrick Sabin wrote:

Horace Blegg schrieb:
You might consider using a VM with 'save-points'. You run the 
program (in a debugger/ida/what have you) to a certain point 
(logical point would be if/ifelse/else statements, etc) and save the 
VM state. Once you've saved, you continue. If you find the path 
you've taken isn't what you are after, you can reload a previous 
save point and start over, trying a different path the next time.
That was my idea to implement it. I thought of taking snapshots of 
the current state every time a unredoable instruction, e.g random 
number generation, is done. 

Remember, storing into a location is destruction.
Go over a list of VM instructions and see how many of them are undoable.
Read his suggested approach more carefully.  He's not undoing 
anything.  He's rolling back to the save-point, and then stepping 
forward to the desired spot.  

Right, I did misread unredoable as undoable.  However, I suspect a
surprising amount of stuff is unredoable -- iff the random number
generator counts as one of those things.  The random number seeder is
unredoable with empty args, but running the generator once seeded is
predictable (by design).  If you don't capture the random number state
as part of your snapshot, _lots_ of C space storage will be in the
same class, and you are stuck finding the exceptional safe to use
cases, rather than the exceptional unsafe to use.  Similarly, system
calls about time or _any_ callback (when and where executed) create
snapshot points, and I suspect roll forwards will be relatively short.
In fact, in some sense the _lack_ of a callback is unredoable.

--Scott David Daniels
scott.dani...@acm.org


Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Scott David Daniels

mclovin wrote:

OK then. I will try some of the strategies here but I guess things
arent looking too good. I need to run this over a dataset that someone
pickled. I need to run this 480,000 times so you can see my
frustration. So it doesn't need to be real time but it would be nice
it was done sorting this month.

Is there a bet guess strategy where it is not 100% accurate but much
faster?


Well, I timed a run of a version of mine, and the scan is approx 5X
longer than the copy-and-sort.  Time arr_of_arr.flatten().sort() to
see how quickly the copy and sort happens.  So you could try a variant
exploiting the following property:
If you know the minimum length of a run that will be in the top 25,
then the value for each of the most-frequent run entries must show up at
positions n * stride and (n + 1) * stride (for some n).  That should
drastically reduce the scan cost, as long as stride is reasonably large.

For my uniformly distributed 0..1024 values in 5M x 5M array,
About 2.5 sec to flatten and sort.
About 15 sec to run one of my heapish thingies.
the least frequency encountered: 24716
so, with stride at

sum(flattened[:-stride:stride] == flattened[stride::stride]) == 1000
So there are only 1000 points to investigate.
With any distribution other than uniform, that should go _way_ down.
So just pull out those points, use bisect to get their frequencies, and 
feed those results into the heap accumulation.


--Scott David Daniels


Re: Clarity vs. code reuse/generality

2009-07-04 Thread Scott David Daniels

Paul Rubin wrote:

Invalid input data is not considered impossible and doesn't imply a
broken program, so assert statements are not the appropriate way to
check for it.  I like to use a function like

  def check(condition, msg="data error"):
     if not condition: raise ValueError, msg

  ...
  check(x >= 0, "invalid x")  # raises ValueError if x is negative

  y = sqrt(x)


And I curse such uses, since I don't get to see the troublesome value,
or why it is troublesome.  In the above case, y = sqrt(x) at least
raises ValueError('math domain error'), which is more information than
you are providing.

How about:

 ...
 if x < 0: raise ValueError('x = %r not allowed (negative)?' % x)
 ...

--Scott David Daniels
scott.dani...@acm.org


Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Scott David Daniels

mclovin wrote:

On Jul 4, 12:51 pm, Scott David Daniels scott.dani...@acm.org wrote:

mclovin wrote:

OK then. I will try some of the strategies here but I guess things
arent looking too good. I need to run this over a dataset that someone
pickled. I need to run this 480,000 times so you can see my
frustration. So it doesn't need to be real time but it would be nice
it was done sorting this month.
Is there a bet guess strategy where it is not 100% accurate but much
faster?


Well, I timed a run of a version of mine, and the scan is approx 5X
longer than the copy-and-sort.  Time arr_of_arr.flatten().sort() to
see how quickly the copy and sort happens.  So you could try a variant
exploiting the following property:
 If you know the minimum length of a run that will be in the top 25,
then the value for each of the most-frequent run entries must show up at
positions n * stride and (n + 1) * stride (for some n).  That should
drastically reduce the scan cost, as long as stride is reasonably large

sum(flattened[:-stride:stride] == flattened[stride::stride]) == 1000
So there are only 1000 points to investigate.
With any distribution other than uniform, that should go _way_ down.
So just pull out those points, use bisect to get their frequencies, and
feed those results into the heap accumulation.

--Scott David Daniels


I dont quite understand what you are saying but I know this: the times
the most common element appears varies greatly. Sometimes it appears
over 1000 times, and some times it appears less than 50. It all
depends on the density of the arrays I am analyzing.


Here's a heuristic replacement for my previous frequency code:
I've tried to mark where you could fudge numbers if the run time
is at all close.

def frequency(arr_of_arr, N, stride=100):
'''produce (freq, value) pairs for data in arr_of_arr.

Tries to produce > N pairs.  stride is a guess at half
the length of the shortest run in the top N runs.
'''
# if the next two lines are too slow, this whole approach is toast
data = arr_of_arr.flatten()  # big allocation
data.sort() # a couple of seconds for 25 million ints

# stride is a length forcing examination of a run.
sampled = data[::stride]
# Note this is a view into data, and is still sorted.
# We know that any run of length 2 * stride - 1 in data _must_ have
# consecutive entries in sampled.  Compare them in parallel
matches = sampled[:-1] == sampled[1:]
# matches is True or False for stride-separated values from sampled
candidates = sum(matches) # count identified matches

# while candidates is huge, keep trying with a larger stride
while candidates > N * 10: # 10 -- heuristic
stride *= 2 # # heuristic increase
sampled = data[::stride]
matches = sampled[:-1] == sampled[1:]
candidates = sum(matches)

# if we got too few, move stride down:
while candidates < N * 3: # heuristic slop for long runs
stride //= 2 # heuristic decrease
sampled = data[::stride]
matches = sampled[:-1] == sampled[1:]
candidates = sum(matches)

# Here we have a nice list of candidates that is likely
# to include every run we actually want. sampled[matches] is
# the sorted list of candidate values.  It may have duplicates
former = None
past = 0
# In the loop here we only use sampled to the pick values we
# then go find in data.  We avoid checking for same value twice
for value in sampled[matches]:
if value == former:
continue # A long run: multiple matches in sampled
former = value # Make sure we only try this one once
# find the beginning of the run
start = bisect.bisect_left(data, value, past)
# find the end of the run (we know it is at least stride long)
past = bisect.bisect_right(data, value, start + stride)
yield past - start, value # produce frequency, value data
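The bisect bracketing used above, in isolation: on any sorted sequence, bisect_left and bisect_right delimit a run, so its length is the value's frequency. A stdlib-only sketch (run_frequency is a hypothetical helper name):

```python
import bisect

def run_frequency(sorted_data, value, lo=0):
    '''Return (count, end_index) for value within sorted_data[lo:].'''
    start = bisect.bisect_left(sorted_data, value, lo)   # first occurrence
    past = bisect.bisect_right(sorted_data, value, start)  # one past the last
    return past - start, past

data = sorted([5, 1, 5, 2, 5, 2])   # -> [1, 2, 2, 5, 5, 5]
freq5 = run_frequency(data, 5)
freq2 = run_frequency(data, 2)
```

Passing the returned end_index back in as lo keeps successive searches from rescanning earlier runs, exactly as the generator above does with `past`.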

--Scott David Daniels
scott.dani...@acm.org


Re: Direct interaction with subprocess - the curse of blocking I/O

2009-07-03 Thread Scott David Daniels
:
        text_read = f_read.readline() # get a line
        DPRINT("after read/readline text_read:%r, len=%s",
               text_read, len(text_read))
        if text_read:   # there were some bytes
            text_lines += text_read
            DPRINT("text_lines:%r", text_lines)
            continue # Got some chars, keep going.
        break  # Nothing new found, let's get out.
    return text_lines or None


--Scott David Daniels
scott.dani...@acm.org


Re: Sequence splitting

2009-07-03 Thread Scott David Daniels

Steven D'Aprano wrote:
I've never needed such a split function, and I don't like the name, and 
the functionality isn't general enough. I'd prefer something which splits 
the input sequence into as many sublists as necessary, according to the 
output of the key function. Something like itertools.groupby(), except it 
runs through the entire sequence and collates all the elements with 
identical keys.


splitby(range(10), lambda n: n%3)
=> [ (0, [0, 3, 6, 9]),
     (1, [1, 4, 7]),
     (2, [2, 5, 8]) ]


Your split() would be nearly equivalent to this with a key function that 
returns a Boolean.


Well, here is my go at doing the original with iterators:

import itertools

def splitter(source, test=bool):
    a, b = itertools.tee((x, test(x)) for x in source)
    return (data for data, decision in a if decision), (
        data for data, decision in b if not decision)

This has the advantage that it can operate on infinite lists.  For
something like splitby for grouping, I seem to need to know the cases
up front:

def _make_gen(particular, src):
 return (x for x, c in src if c == particular)

def splitby(source, cases, case):
'''Produce a dict of generators for case(el) for el in source'''
decided = itertools.tee(((x, case(x)) for x in source), len(cases))
return dict((c, _make_gen(c, src))
for c, src in zip(cases, decided))

example:

def classify(n):
'''Least prime factor of a few'''
for prime in [2, 3, 5, 7]:
if n % prime == 0:
return prime
return 0

for k,g in splitby(range(50), (2, 3, 5, 7, 0), classify).items():
print('%s: %s' % (k, list(g)))

0: [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
2: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
3: [3, 9, 15, 21, 27, 33, 39, 45]
5: [5, 25, 35]
7: [7, 49]
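And the lazy two-way splitter in action, its definition repeated here so the sketch stands alone:

```python
import itertools

def splitter(source, test=bool):
    '''Lazily split source into (passing, failing) iterators.'''
    # tee shares one buffered pass over source between the two outputs
    a, b = itertools.tee((x, test(x)) for x in source)
    return (data for data, decision in a if decision), (
        data for data, decision in b if not decision)

odds, evens = splitter(range(10), lambda n: n % 2)
```

Consuming one side makes tee buffer the pending items for the other, so memory use is bounded by how far the two sides drift apart.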

--Scott David Daniels
scott.dani...@acm.org


Re: question of style

2009-07-02 Thread Scott David Daniels

Duncan Booth wrote:

Simon Forman sajmik...@gmail.com wrote:

...
if self.higher is self.lower is None: return
...


As a matter of style however I wouldn't use the shorthand to run two 'is' 
comparisons together, I'd write that out in full if it was actually needed 
here.


Speaking only to the style issue, when I've wanted to do something like
that, I find:

  if self.higher is None is self.lower:
  ...

more readable, by making clear they are both being compared to a
constant, rather than compared to each other.

More often, I've used code like:

  if expr1 is not None is not expr2:
  ...

since I am usually working on non-defaulting cases in the body.
I find the form above simpler to read than:

  if expr1 is not None and expr2 is not None:
  ...

I do draw the line at two, though, and with three or more I'll
paren-up a list of parallel comparisons:

  if (expr1 is not None
and expr2 is not None
and expr3 is not None):
  ...
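Since a chained comparison expands pairwise, `expr1 is not None is not expr2` really is the two-test form; a quick check (both_set is just a demonstration name):

```python
def both_set(a, b):
    # chained form: equivalent to (a is not None) and (b is not None)
    return a is not None is not b
```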

--Scott David Daniels
scott.dani...@acm.org


Re: invoking a method from two superclasses

2009-07-01 Thread Scott David Daniels

 Mitchell L Model wrote:

Sorry, after looking over some other responses, I went back and re-read
your reply.  I'm just making sure here, but:

 Scott David Daniels wrote:
Below compressed for readability in comparison:

   class A:
   def __init__(self): super().__init__(); print('A')
   class B:
   def __init__(self): super().__init__(); print('B')
   class C(A, B):
   def __init__(self): super().__init__(); print('C')
   C()
And, if you are doing it with a message not available in object:


Renamed to disambiguate later discussion

   class root:
   def prints(self): print('root') # or pass if you prefer
   class D(root):
   def prints(self): super().prints(); print('D')
   class E(root):
   def prints(self): super().prints(); print('E')
   class F(D, E):
   def prints(self): super().prints(); print('F')
   F().prints()



What I was missing is that each path up to and including the top of the diamond
must include a definition of the method, along with super() calls to move the 
method
calling on its way up.


Actually, not really true.  In the F through root example, any of the
prints methods except that on root may be deleted and the whole thing
works fine.  The rootward (closer to object) end must contain the method
in question, possibly only doing a pass as the action, and _not_ calling
super.  The other methods (those in D, E, and F above are all optional
(you can freely comment out those methods where you like), but each
should call super() in their bodies.

Note that you can also add a class:
class G(E, D):
def prints(self): super().prints(); print('G')
G().prints()
Also note that the inheritances above can be eventually inherits from
as well as direct inheritance.


Is this what the documentation means by cooperative multiple inheritance?


Yes, the phrase is meant to convey No magic (other than super itelf) is
involved in causing the other methods to be invoked.  If you want all
prints methods called, make sure all but the last method do super calls.
Of course, any method that doesn't do the super call will be the last by
definition (no flow control then), but by putting the non-forwarding
version at or below the lower point of the diamond, the mro order
guarantees that you will have a good place to stop.

Think of the mro order as a solution to the partial order constraints
that a class must appear before any of its direct superclasses, and (by
implication) after any of its subclasses.
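That solution to the constraints is directly inspectable as __mro__. Restating the root/D/E/F diamond with prints returning lists instead of printing, so the cooperative call order is checkable:

```python
class root:
    def prints(self): return ['root']   # bottom of the chain: no super call
class D(root):
    def prints(self): return super().prints() + ['D']
class E(root):
    def prints(self): return super().prints() + ['E']
class F(D, E):
    def prints(self): return super().prints() + ['F']

mro_names = [c.__name__ for c in F.__mro__]
order = F().prints()
```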


If your correction of my example, if you remove super().__init__ from B.__init__
the results aren't affected, because object.__init__ doesn't do anything and
B comes after A in C's mro. However, if you remove super().__init__ from
A.__init__, it stops the supering process dead in its tracks.


Removing the super from B.__init__ means that you don't execute
object.__init__.  It turns out that object does nothing in its __init__,
but without knowing that, removing the super from B.__init__ is also a
mistake.

So, you may well already have it all right, but as long as I'm putting
in the effort to get the operational rules about using super out, I
thought I'd fill in this last little bit.

--Scott David Daniels
scott.dani...@acm.org


Re: Multi thread reading a file

2009-07-01 Thread Scott David Daniels

Gabriel Genellina wrote:

...
def convert(in_queue, out_queue):
  while True:
row = in_queue.get()
if row is None: break
# ... convert row
out_queue.put(converted_line)


These loops work well with the two-argument version of iter,
which is easy to forget, but quite useful to have in your bag
of tricks:

def convert(in_queue, out_queue):
for row in iter(in_queue.get, None):
# ... convert row
out_queue.put(converted_line)
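The same sentinel pattern sketched against a standard-library queue (Python 3 module names assumed; drain is a name invented here):

```python
import queue

def drain(q):
    '''Collect items from q until the None sentinel appears.'''
    # iter(callable, sentinel) calls q.get() until it returns None
    return [item for item in iter(q.get, None)]

q = queue.Queue()
for item in [1, 2, 3, None]:
    q.put(item)
items = drain(q)
```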

--Scott David Daniels
scott.dani...@acm.org


Re: Basic question from pure beginner

2009-07-01 Thread Scott David Daniels

Charles Yeomans wrote:

Let me offer a bit of editing
Finally, I'd remove correct_password_given from the loop test, and 
replace it with a break statement when the correct password is entered.


password = "qwerty"
correct_password_given = False
attemptcount = 0
MaxAttempts = 3
while attemptcount < MaxAttempts:
  guess = raw_input("Enter your password: ")
  guess = str(guess)
  if guess != password:
      print "Access Denied"
      attemptcount = attemptcount + 1
  else:
      print "Password Confirmed"
      correct_password_given = True
      break



And even simpler:
PASSWORD = "qwerty"
MAXRETRY = 3
for attempt in range(MAXRETRY):
if raw_input('Enter your password: ') == PASSWORD:
print 'Password confirmed'
break # this exits the for loop
print 'Access denied: attempt %s of %s' % (attempt+1, MAXRETRY)
else:
# The else for a for statement is not executed for breaks,
# So indicates the end of testing without a match
raise SystemExit # Or whatever you'd rather do.

--Scott David Daniels
scott.dani...@acm.org


Re: identify checksum type?

2009-06-30 Thread Scott David Daniels

Christian Heimes wrote:

PK schrieb:

Given a checksum value, whats the best way to find out what type it is?

meaning. I can use hashlib module and compute a md5 or sha1 for a given data
etc..but given a checksum value say d2bda52ee39249acc55a75a0f3566105 whats
the best way for me to identify if its a sha1 or md5 or anyother sum type
for that matter?

is there a nice way to do this in python?


As far as I know there is no way to identify a checksum by its value. A
checksum is just a number. You can try an educated guess based on the
length of the checksum. Or you can try all hash algorithms until you get
a hit but that may lead to security issues.

Some applications prefix the hash value with an identifier like {MD5}
or {SHA1}.

Christian


Fortunately, the hashlib checksums can be distinguished by their length.
On the newly minted 3.1:
import hashlib
text = b'BDFL forever; FLUFL for frequently'
for name in 'md5 sha1 sha224 sha256 sha384 sha512'.split():
result = getattr(hashlib, name)(text).hexdigest()
print('%6s:%3d %s' % (name, len(result), result))

   md5: 32 457484d2817fbe475ab582bff2014e82
  sha1: 40 242076dffbd432062b439335438f08ba53387897
sha224: 56 89c0439b1cf3ec7489364a4b8e50b3ba196706eecdb5e5aec6d6290f
sha256: 64 e10938435e4b5b54c9276c05d5f5d7c4401997fbd7f27f4d4...807d
sha384: 96 3fe7c7bf3e83d70dba7d59c3b79f619cf821a798040be2177...edb7
sha512:128 fe50d9f0c5780edb8a8a41e317a6936ec6305d856c78ccb8e...1fa0

You'll have to guess for adler32 vs. crc32 vs. seeded crc32, ...
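Going the other way, a guesser keyed on hex-digest length is easy to sketch; note the table is an assumption covering only these six hashlib algorithms, and equal-length digests remain inherently ambiguous:

```python
# Hex-digest length -> likely algorithm (hashlib's common six only)
LENGTH_TO_ALGORITHM = {
    32: 'md5', 40: 'sha1', 56: 'sha224',
    64: 'sha256', 96: 'sha384', 128: 'sha512',
}

def guess_hash(hex_digest):
    '''Best guess at the algorithm behind a hex digest, by length alone.'''
    return LENGTH_TO_ALGORITHM.get(len(hex_digest), 'unknown')
```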

--Scott David Daniels
scott.dani...@acm.org

