-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hi everybody, I have been doing some experiments with ddl-utils recently, and found several tricky things... :-). For most of them, I found workarounds, but of course it would be even better to have other workarounds embedded in ddl-utils themselves. Thanks in advance and sorry for the long mail :-). First, I tried to compile ddlutils from SVN. The problem here is that I failed to compile only one part of the toolkit. So after installing some of the usual packages, the build process told me that I would need forrest.apache.org - apparently to create the documentation in HTML and PDF. Forrest is a really big package... Next it told me that it would depend on two DTD thingies, too, luckily always mentioning where I can download them. Then it complained that I did not have the DTD or similar for some part of the data, which was when I gave up. Next attempt was to download the 1.0 binary. It turned out that the binary already contains three things: The docs, the main jar, and most of the supplemental jars. Very nice. Of course I only found out after having downloaded the separate doc download and jar download... ;-). The binary zip seemed to be very easy to install, but I had troubles with the classpath. In the end, I just used the following: ant -d -v -lib ddlutils/lib/ -f my-ant-tasks.xml ... (I also copied the main ddlutils jar into .../lib/ ) Now finally I was able to dump the mysql database that I had received as proprietary mysql dump file and then loaded into a mysql that I have installed especially for that purpose :-) Next, I tried to import that file into my normal postgresql database, again using ddlutils. This failed, and I had to change 2 things to work around the problem: Somehow mysql uses 0000-00-00 00:00:00 and null as synonyms for null for timestamps. However, you cannot insert such invalid (month/day are 0) date timestamps into postgresql databases. As workaround, either of the following did work: Search and replace all 0000-... timestamps by 1980-01-01 timestamps in the XML file, issue SQL commands for each column of type timestamp to update values which are null or earlier than 1970-01-01 into 1980-... timestamps, or search and replace all 0000-... timestamps into null timestamps... I did not test the latter as it would have been hard to explain to my text editor. Second problem is that bugzilla uses indexes on some varchar columns. Postgresql uses btree indexes in a way where each value must be at most 8/3 kB. This caused errors for rows with long values. The solution was to comment out (in the XML file for the database schema) the part which tells that an index should be made for the longdescs.thetext column. I can live without that index ;-). As connectors, I used: mysql-connector-java-5.0.8-bin.jar postgresql-8.2-507.jdbc4.jar ... although I do not know if JDBC4 is of any added value for DDL-UTILS :-) Next, I tried to export some data back from the postgresql database to XML files. It turned out that ddl-utils failed to read the database schema from the live database because of a missing schema function. I was unable to get ddl-utils to proceed even if I told it only to write the contents... Luckily, others have has similar problems before: ERROR: function information_schema._pg_keypositions() does not exist is mentioned on http://forum.hibernate.org/viewtopic.php?t=957909&view=next as is some workaround: It might make a difference which JDBC connector you use, but it always helps to: CREATE FUNCTION information_schema._pg_keypositions() RETURNS SETOF int4 AS $BODY$select g.s from generate_series(1,current_setting('max_index_keys')::int,1) as g(s)$BODY$ LANGUAGE 'sql' IMMUTABLE; ALTER FUNCTION information_schema._pg_keypositions() OWNER TO postgres; Note that you have to do this for the database in question, not for the global information schema. I think the latter only affects databases created in the future. The problem only seems to occur in postgresql 8.x ... An older postgres 7.4 of mine already does have a function of this name, but in a "flat" and less portable implementation: myolddb=> \df+ information_schema._pg_keypositions List of functions Result data type | Schema | Name | Argument data types| Owner | Language | Source code | Description - ------------------+--------------------+------------------+---------------------+----------+----------+----------------------------------------------------------+------------- setof integer | information_schema | _pg_keypositions || postgres | sql | select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9 union all select 10 union all select 11 union all select 12 union all select 13 union all select 14 union all select 15 union all select 16 union all select 17 union all select 18 union all select 19 union all select 20 union all select 21 union all select 22 union all select 23 union all select 24 union all select 25 union all select 26 union all select 27 union all select 28 union all select 29 union all select 30 union all select 31 union all select 32 | (1 row) Now things worked quite okay, but I still could not find any workaround for my last problem: The files always contain ALL data. By using schemapatterns of "public" or "p%" I can select between files with and without the indexes as part of the schema (no idea why it is "p%", just trial and error here), but I cannot use any TABLE NAME PATTERN. Would be great if ddl-utils could implement that filter! I guess a sort of workaround would be to copy the whole database to another inside postgresql and then drop all tables that I do not want in the XML files and then use ddl-utils to export the new DB into XML files. Or use some good XML file editor to remove the unneeded tables, but that will be a problem because the data XML files can be 100s of megabytes for some of the DB which I want to use ddl-utils on... For the latter, it can be useful to say ANT_OPTS=-Xmx1536M in the shell first ;-). Last but not least, I was unable to dump some databases as "mortal" user because for unknown reason, pga_diagrams is only readable for the superuser on one postgresql installation while it is readily accessible even for low priv users on other installations...? It is interesting how much metadata is floating around in modern 8.x versions of postgresql :-). It would be cool if future ddl-utils could have built-in workarounds for the 0000-... timestamp issue and some of the other metadata / filtering magic described above. Thanks :-). And of course it would be good if ddl-utils were easier to compile from SVN... Eric PS: troubles in the precompiled docs I downloaded later: The PDF uses very dark blue compared to the HTML and the UML diagrams are not shown in the PDF. And there is a content error where the docs say that some DB cannot do tinyint in -255..255 range but only in (signed byte) range. Some other place in the docs says that tinyint can be signed or unsigned byte - it does not mention anything like signed 9 bit value ;-). By the way, VARCHAR can be up to 10*1024*1024 Unicode chars, not 254 ASCII, at least in 8.x versions of PostgreSQL :-). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFHmI+A99dkROyhRRsRAhUUAJ9vEjkjnYMR62vqBro517sksZH76ACeNQSV fEcFf2xpWmmbODMzI2zsLq4= =o8C+ -----END PGP SIGNATURE-----
