ArielGlenn has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/64955


Change subject: bugfixes: handle deleted text; workaround dupl text ids
......................................................................

bugfixes: handle deleted text; workaround dupl text ids

Revisions with deleted text will get a text id of 0 and a len of 0.
The rest will get a text id identical to the rev id to avoid collisions:
in live mediawiki projects it is possible for two revs to
have the same text id (example: revision created by protecting
a page), but rev ids are guaranteed to be unique.

If the 'nodrop' option is supplied, which means that
INSERT IGNORE statements will be generated for these
tables, the original text ids from the xml will be used.
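
The rules above can be sketched as a small standalone function. This is
only an illustration, not the code in mwxmlelts.c: the names
choose_text_fields and text_fields, the buffer sizes, and the argument
layout are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two values written to the text table. */
typedef struct {
    char text_id[32];
    char text_len[32];
} text_fields;

/* Hypothetical helper: pick the text id and length for one revision.
 *   deleted        nonzero if the <text> tag carried a "deleted" attribute
 *   insert_ignore  nonzero if INSERT IGNORE statements will be written
 *                  (the 'nodrop' case described above)
 */
static void choose_text_fields(const char *rev_id, const char *xml_text_id,
                               const char *xml_bytes, int deleted,
                               int insert_ignore, text_fields *out)
{
    assert(rev_id && xml_text_id && xml_bytes && out);

    if (deleted) {
        /* deleted text: id 0, length 0 */
        snprintf(out->text_id, sizeof(out->text_id), "0");
        snprintf(out->text_len, sizeof(out->text_len), "0");
        return;
    }
    /* with INSERT IGNORE duplicates are skipped, so the original text id
       is safe; otherwise fall back to the rev id, which is unique */
    snprintf(out->text_id, sizeof(out->text_id), "%s",
             insert_ignore ? xml_text_id : rev_id);
    snprintf(out->text_len, sizeof(out->text_len), "%s", xml_bytes);
}
```

In this sketch the deleted check comes first, so a deleted revision gets
0/0 whether or not INSERT IGNORE is in effect.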

That last workaround needs to be documented better. Way better.

Change-Id: I8ca20d7fb7c65713c7b62b74c963d2f298953932
---
M xmlfileutils/mwxmlelts.c
1 file changed, 17 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/dumps refs/changes/55/64955/1

diff --git a/xmlfileutils/mwxmlelts.c b/xmlfileutils/mwxmlelts.c
index 0bd1b1a..df8ac36 100644
--- a/xmlfileutils/mwxmlelts.c
+++ b/xmlfileutils/mwxmlelts.c
@@ -358,8 +358,8 @@
   }
   /* text: old_text old_flags */
   /* write the beginning piece */
-  snprintf(buf, sizeof(buf),                                           \
-          "(%s, '",r->text_id);
+  snprintf(buf, sizeof(buf),"(%s, '",r->text_id);
+
   put_line_all(sqlt, buf);
 
   if (verbose > 1) fprintf(stderr,"text info: insert start of line written\n");
@@ -656,9 +656,23 @@
     }
     else if (! result) break;
     if (!strcmp(name, "id"))
-      strcpy(r.text_id, value);
+      if (insert_ignore)
+       /* separate revisions can have the same text id if only the metadata was changed,
+          e.g. the page was protected.  use the original text ids therefore only
+          if we are going to write INSERT IGNORE to ignore dup entries */
+       strcpy(r.text_id, value);
+      else
+       /* fallback since the rev id is guaranteed to be unique, is use this as the text
+          id, this means the resulting db won't be quite identical to the one from
+          which data was dumped */
+       strcpy(r.text_id, r.id);
     else if (!strcmp(name, "bytes"))
       strcpy(r.text_len, value);
+    else if (!strcmp(name, "deleted")) {
+      strcpy(r.text_id, "0");
+      strcpy(r.text_len, "0");
+      break;
+    }
     else {
       whine("unknown attribute in text tag");
       break;

-- 
To view, visit https://gerrit.wikimedia.org/r/64955
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ca20d7fb7c65713c7b62b74c963d2f298953932
Gerrit-PatchSet: 1
Gerrit-Project: operations/dumps
Gerrit-Branch: ariel
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
