ArielGlenn has uploaded a new change for review. https://gerrit.wikimedia.org/r/64955
Change subject: bugfixes: handle deleted text; workaround dupl text ids
......................................................................

bugfixes: handle deleted text; workaround dupl text ids

Revisions with deleted text will get text id of 0 and len of 0

The rest will get a text id identical to the rev id to avoid
collisions: in live mediawiki projects it is possible for two revs
to have the same text id (example: revision created by protecting
a page), but rev ids are guaranteed to be unique.

If the 'nodrop' option is supplied, which means that INSERT IGNORE
statements will be generated for these tables, the original text
ids from the xml will be used.

That last workaround needs to be documented better. Way better.

Change-Id: I8ca20d7fb7c65713c7b62b74c963d2f298953932
---
M xmlfileutils/mwxmlelts.c
1 file changed, 17 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/operations/dumps refs/changes/55/64955/1

diff --git a/xmlfileutils/mwxmlelts.c b/xmlfileutils/mwxmlelts.c
index 0bd1b1a..df8ac36 100644
--- a/xmlfileutils/mwxmlelts.c
+++ b/xmlfileutils/mwxmlelts.c
@@ -358,8 +358,8 @@
   }
   /* text: old_text old_flags */
   /* write the beginning piece */
-  snprintf(buf, sizeof(buf), \
-           "(%s, '",r->text_id);
+  snprintf(buf, sizeof(buf),"(%s, '",r->text_id);
+
   put_line_all(sqlt, buf);
   if (verbose > 1)
     fprintf(stderr,"text info: insert start of line written\n");
@@ -656,9 +656,23 @@
       }
       else if (! result) break;
       if (!strcmp(name, "id"))
-        strcpy(r.text_id, value);
+        if (insert_ignore)
+          /* separate revisions can have the same text id if only the metadata was changed,
+             e.g. the page was protected. use the original text ids therefore only
+             if we are going to write INSERT IGNORE to ignore dup entries */
+          strcpy(r.text_id, value);
+        else
+          /* fallback since the rev id is guaranteed to be unique, is use this as the text
+             id, this means the resulting db won't be quite identical to the one from
+             which data was dumped */
+          strcpy(r.text_id, r.id);
       else if (!strcmp(name, "bytes"))
         strcpy(r.text_len, value);
+      else if (!strcmp(name, "deleted")) {
+        strcpy(r.text_id, "0");
+        strcpy(r.text_len, "0");
+        break;
+      }
       else {
         whine("unknown attribute in text tag");
         break;

-- 
To view, visit https://gerrit.wikimedia.org/r/64955
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ca20d7fb7c65713c7b62b74c963d2f298953932
Gerrit-PatchSet: 1
Gerrit-Project: operations/dumps
Gerrit-Branch: ariel
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
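
Since the commit message asks for the text id workaround to be documented better, here is a minimal standalone sketch of the selection rule described in the change. It is not the code from mwxmlelts.c: the struct rev_sketch, the function apply_text_attr, the FIELD_LEN size and the return codes are all hypothetical names introduced for illustration. Only the attribute names ("id", "bytes", "deleted"), the insert_ignore flag and the resulting values come from the diff above.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_LEN 32  /* hypothetical buffer size, not taken from mwxmlelts.c */

/* Hypothetical stand-in for the revision fields touched by the change. */
typedef struct {
    char id[FIELD_LEN];       /* revision id, unique per revision */
    char text_id[FIELD_LEN];  /* id that will be written to the text table */
    char text_len[FIELD_LEN]; /* byte length of the revision text */
} rev_sketch;

/*
 * Apply one attribute of the <text> tag to the revision.
 * Returns  0 if the attribute was handled,
 *          1 if it was handled and the caller should stop reading
 *            further attributes (deleted text),
 *         -1 if the attribute name is unknown.
 */
int apply_text_attr(rev_sketch *r, const char *name, const char *value,
                    int insert_ignore)
{
    if (!strcmp(name, "id")) {
        /* Two revisions may share a text id (e.g. a page protection),
           so the id from the xml is kept only when INSERT IGNORE will
           skip the duplicate rows; otherwise the rev id is used. */
        const char *src = insert_ignore ? value : r->id;
        snprintf(r->text_id, sizeof(r->text_id), "%s", src);
        return 0;
    }
    if (!strcmp(name, "bytes")) {
        snprintf(r->text_len, sizeof(r->text_len), "%s", value);
        return 0;
    }
    if (!strcmp(name, "deleted")) {
        /* Deleted text: both the text id and the length become 0. */
        strcpy(r->text_id, "0");
        strcpy(r->text_len, "0");
        return 1;
    }
    return -1;
}
```

With r.id set to "500", calling apply_text_attr(&r, "id", "42", 0) leaves text_id as "500", while the same call with insert_ignore set to 1 leaves it as "42". The sketch uses snprintf rather than strcpy for values read from the xml so that an overlong attribute cannot overrun the buffer; that is a choice made here, not something the change itself does.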