(i forgot to add the list address to this reply - Jesper, thanks for
inspiring me to make all these queries - i hope you don't think I was
trying to shut your comment down by any means as you hold a solid point
about the enormity of the field) - Mark
Jesper Noehr wrote:
Hi Mark,
It makes sense to resize the column to fit the data, although I don't
think 65536 bytes of binary data is appropriate. This would put a more
serious impact on performance. I think as of MySQL 4.1, varchars are
not limited to 255 any longer, do you know if this is correct? If so,
I think the column could do with being resized to ~500 or so.
j
Hi Jesper,
According to the mysql manual, it seems that a VARCHAR can be over 255
chars (from my understanding of the description). But it doesn't seem
to specify the actual limit.
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
Under the heading "Storage Requirements for String Types"
But the code I proposed, though it does specify varchar=True, is not
actually creating a VARCHAR column (as described in my last reply), but
rather a MEDIUMTEXT.
As for the length of the field for the application, I've little say. I
was using this specifically for the imdb -> sql conversion and plan to
use the data moreso than the imdbpy library (for now at least). But, in
the interest of the thread, I ran a few queries to get a better
understanding of what's in the person_info table.
SELECT AVG(LENGTH(info)) FROM person_info
returns 88.4113, so you definitely have a point about truncating, but
SELECT COUNT(id) FROM person_info WHERE LENGTH(info) > 500
returns 40781, which is about 2% of the data
SELECT COUNT(id) FROM person_info WHERE LENGTH(info) > 1000
returns 20224, which is about 1% of the data.
1500 is about 1/2 a percent (.006)
Here's some general statistics about the table
SELECT t.info, AVG(LENGTH(p.info)) AS AVGLength, MAX(LENGTH(p.info)) AS
MaxLength, COUNT(p.id) AS Records FROM person_info p LEFT JOIN info_type
t ON p.info_type_id = t.id GROUP BY p.info_type_id ORDER BY
AVG(LENGTH(p.info)) DESC
info AVGLength MaxLength Records
mini biography 1328.7544 77754 41996
quotes 199.1172 10402 42137
articles 105.4514 994 71346
trivia 104.6574 6197 403739
interviews 102.0143 390 33992
books 98.4247 423 14018
pictorials 93.6796 659 42350
other works 90.6187 6658 199107
where now 87.1480 1008 21929
trademarks 53.4940 1121 7892
spouse 47.0573 124 82573
biographical movies 45.4442 199 4268
salary history 43.9208 159 4469
magazine covers 41.3896 151 46944
portrayed 36.2904 123 8496
death notes 29.9600 150 65228
birth notes 22.7949 112 236797
birth name 21.4084 218 83942
death date 13.4271 95 77152
birth date 12.5768 39 258693
nick names 7.9800 129 34466
height 5.5751 13 71154
Here's what's being cut off over 1000 chars
SELECT t.info, COUNT(p.id) AS Records FROM person_info p LEFT JOIN
info_type t ON p.info_type_id = t.id WHERE LENGTH(p.info) > 1000 GROUP
BY p.info_type_id ORDER BY Records DESC
info Records
mini biography 19238
trivia 438
other works 344
quotes 202
trademarks 1
where now 1
And over 2000 chars
info Records
mini biography 7616
other works 50
trivia 48
quotes 12
Hope this helps...
Have a great weekend!
Mark
On Apr 3, 2008, at 7:48 PM, Mark Armendariz wrote:
Hello,
I've just joined this list. I found this app this morning and
downloaded the imdb database last night. While converting the data,
I was getting an error stating that some data was too long for the
person_info.info field in the biographies list.
This patch changes the person_info.info field from a text to a
mediumtext field.
After making this change, I was able to successfully convert the text
files into a mysql database.
Thanks!!
Mark Armendariz
=== modified file 'imdbpy/imdb/parser/sql/dbschema.py'
--- imdbpy/imdb/parser/sql/dbschema.py 2008-04-02 06:48:00 +0000
+++ imdbpy/imdb/parser/sql/dbschema.py 2008-04-02 06:49:41 +0000
@@ -128,7 +128,7 @@
class PersonInfo(SQLObject):
personID = IntCol(notNone=True)
infoTypeID = IntCol(notNone=True)
- info = UnicodeCol(notNone=True)
+ info = BLOBCol(notNone=True, varchar=True, length=2**16)
note = UnicodeCol(default=None)
class RoleType(SQLObject):
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel