Hi David,

Davide Alberani wrote:
On Apr 03, Mark Armendariz <[EMAIL PROTECTED]> wrote:

I've just joined this list. I found this app this morning and downloaded the imdb database last night. While converting the data, I was getting an error stating that some data was too long for the person_info.info field in the biographies list.

Whoah! :-)
I'll be unable to download an updated version of the plain text
data files for another week or two, so I can't check this right now.
fortunately, you'll only need the person info file to test. =)
This patch changes the person_info.info field from a text to a mediumtext field.

I'm a bit reluctant to change the type of a field: I have to
consider how BLOBCol works with other (non-MySQL) databases and
its behavior regarding non-ascii chars (usually stored as utf8 -
but it's not a requirement - and retrieved/internally managed by
IMDbPY as unicode strings).
I was trying to figure out the best way to make a minimal change to the code. All I really wanted to do was use a MEDIUMTEXT instead of a plain TEXT field so it could fit the data properly. I would have been fine finding a way to suppress the error - but thought it might be better to resolve it instead.

I tried the sqlobject documentation, but didn't find what I needed. So, I cracked open the sqlobject code and found that the "UnicodeCol" type can only make TEXT and VARCHAR fields. I dug further to find the 'proper' way to make a MEDIUMTEXT.

In SOBlobCol (which is the meat of BlobCol), I found this:

----- Snip -----
   def _mysqlType(self):
       length = self.length
       varchar = self.varchar
       if length >= 2**24:
           return varchar and "LONGTEXT" or "LONGBLOB"
       if length >= 2**16:
           return varchar and "MEDIUMTEXT" or "MEDIUMBLOB"
       if length >= 2**8:
           return varchar and "TEXT" or "BLOB"
       return varchar and "TINYTEXT" or "TINYBLOB"
----- /Snip -----

Which essentially meant to me that if I set it to varchar=True it would use a TEXT field, and if I set the length anywhere between 2**16 and 2**24 - 1, it would give me a MEDIUMTEXT field, which, fortunately, is not a BLOB column - seems to me that this should probably be in UnicodeCol instead, but I digress.

After making this change, I was able to successfully convert the text files into a mysql database.

That's a bit strange - can you help me on this?
What's your db engine (InnoDB, MyISAM, etc)?
The 'person_info.info' column created by an _unpatched_ version of
IMDbPY, what kind of field (TEXT, VARCHAR, ...) creates?
Can you insert a text longer than 2**16-1 bytes in a TEXT field,
on your system?  On mine it's possible - and the value is silently
truncated.

Can you scan the database you've populated with your patched IMDbPY?
I'm interested in the longest strings:
  SELECT LENGTH(info), info, id FROM person_info ORDER BY LENGTH(info) DESC 
LIMIT 0, 10


I'm using Mysql 5.0.45-community with InnoDB tables. Here's the response from your query:

LENGTH(info)     info     id
77754      Susan Atkins, a.k.a. Sadie Mae Glutz, is one of th...      68871
55448     One of seven children, Frank Capra was born on May...     259006
40908     Maurice Tourneur, the film director and screenwrit...     1675285
36290     Anna May Wong, the first Chinese-American movie st...     1814292
35348     Christopher Dusendschön is a Filmmaker and a Holly...     472793
34707     Maurice Rapf, the Hollywood screenwriter who becam...     1370052
31799     John Hay "Jock" Whitney, the multi-millionaire spo...     1785987
30605     Emilio "El Indio" Fernández Romo, who was born on ...     529248
30539     Marlon Brando is widely considered the greatest mo...     195182
29863     What do the classic and near-classic films _Dawn P...     714084


It seems to be only that first column that stretched beyond the limit of a TEXT field.

+    info = BLOBCol(notNone=True, varchar=True, length=2**16)

I'm not a database expert, but I think a TEXT field can contains
2**16-1 bytes; here you're creating a MEDIUMTEXT (or MEDIUMBLOB?)
just a byte longer... how can it be enough?

Well, you can't set a length on a TEXT field (including TINYTEXT, MEDIUMTEXT, LONGTEXT, etc). The length limit is inherent to mysql.

MEDIUMTEXT will allow 2**24 - 1

Thank you very much for your support!


No, Thank You!!

Mark
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel

Reply via email to