[Imdbpy-devel] fixes, bugs and The Creepy Case Of The Strange Chars Haunted File

Davide Alberani Wed, 03 Jan 2007 14:31:56 -0800

Hi all, and happy new year!

In the CVS there's (hopefully) a bugfix for a problem searching
titles/names with Strange Chars(tm) (i.e. anything but ASCII).
The problem was noticeable only for the 'http' and 'mobile' data
access systems, using Python 2.4.3 or later, 2.5 included.
Try doing some funny-names searches and let me know if everything
blows up. :-)


Another thing that I noticed only today: in the biographies.list
file some Strange Chars(tm) are replaced with their XML numeric
character references (e.g.: &#263; for an acute accented c).
Some are replaced, and some are not.  Quite silly.

Actually my local mirror of the plain text data files is a bit old,
so if someone can run a couple of tests against an up-to-date
version, it would help to fully understand the situation:

See how many lines are affected in the biographies.list file:
  $ zgrep -c '&#[0-9]\{3,5\};' biographies.list.gz

In my local copy there are about 50 lines with &#...; references,
in a file with over 4.2 million lines.

See if other files are affected by the same situation:
  $ zgrep -l '&#[0-9]\{3,5\};' *.gz
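
For anyone without zgrep at hand, roughly the same check can be
sketched in Python.  This is only an illustration, not code from
IMDbPY: the function name count_reference_lines is made up here, the
snippet assumes Python 3, and it assumes the data files are encoded
in latin-1.

```python
import glob
import gzip
import re

# Same pattern as the zgrep commands: "&#", 3 to 5 digits, then ";".
REF_RE = re.compile(r'&#[0-9]{3,5};')


def count_reference_lines(path, encoding='latin-1'):
    """Count the lines of a gzipped text file containing a &#...; reference.

    The latin-1 default is an assumption about the data files.
    """
    count = 0
    with gzip.open(path, 'rt', encoding=encoding) as fd:
        for line in fd:
            if REF_RE.search(line):
                count += 1
    return count


if __name__ == '__main__':
    # Report every .gz file in the current directory with affected lines.
    for name in sorted(glob.glob('*.gz')):
        affected = count_reference_lines(name)
        if affected:
            print('%s: %d' % (name, affected))
```

Run in the directory holding the *.list.gz files, it prints one line
per affected file, covering both of the commands above.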

If only a few lines in a single file are affected, I think it's
better to ignore them, rather than pay the overhead of replacing
them with the matching Unicode characters.
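
Just to show what the replacement would involve, here is a minimal
sketch.  It is not what IMDbPY does: the name replace_references is
hypothetical, it assumes Python 3, it handles only decimal references
of 3 to 5 digits (the same ones matched by the zgrep pattern), and it
does not validate the code points.

```python
import re

# Capture the decimal code point of references such as "&#263;".
REF_RE = re.compile(r'&#([0-9]{3,5});')


def replace_references(text):
    """Replace decimal &#...; references with the matching character."""
    def _to_char(match):
        # The captured digits are the decimal Unicode code point.
        return chr(int(match.group(1)))
    return REF_RE.sub(_to_char, text)


if __name__ == '__main__':
    print(replace_references('Petrovi&#263;'))
```

The overhead is the regular expression being run on every line of
every file; a cheap test such as "if '&#' in line" before calling the
function would probably keep most of the cost away, if the
replacement turns out to be needed after all.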


-- 
Davide Alberani <[EMAIL PROTECTED]> [PGP KeyID: 0x465BFD47]
http://erlug.linux.it/~da/

