Hi Karel, 

On 24 Sep 2014 at 11:49:02, Karel Gardas wrote:

>  
> Hello,
>  
> I'm trying to import Wikipedia xml (English dumps w/o history) into the
> Xwiki 6.0.1 running using PostgreSQL as a DB. I'm using mediawiki/1.0
> syntax to easy job on my side especially when the task is to test if
> xwiki is able to hold just this amount of data and nothing more.

Interesting experiment :)

> So far probably the most critical found issues are:
>  
> 1) wikipedia's links are a little bit longer than expected. I'm afraid
> this is usually whole citation going into the link hence after
> installing xwiki and initialization of hibernate I needed to switch it
> off and alter PostgreSQL table by:
>  
> alter table xwikilinks alter column xwl_link type varchar(4096);
>  
> that ensures that much more pages may be imported.

The xwikilinks table contains all the backlinks for a given document.

Indeed, the default is 255 chars for the "link" field, which contains a 
serialized reference to the linked page (without the wiki part if the wiki is 
the same as the wiki of the document containing the link).

The "fullName" field is also 255 chars by default and contains a serialized 
reference to the document containing the link (without the wiki part).

So 255 chars can quickly become too short when space names and page names are 
a bit long.
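
To gauge how many references would hit the default limit before importing, a pre-check along these lines could help. This is only a sketch: the class and method names are hypothetical and not part of XWiki, and the only fact taken from above is the 255-char default.

```java
import java.util.List;
import java.util.stream.Collectors;

public class LinkLengthCheck {
    // Default width of the "link" and "fullName" columns, as described above.
    static final int DEFAULT_MAX = 255;

    // Returns the serialized references that would not fit in the default column.
    // Note: String.length() counts UTF-16 code units, so characters outside the
    // BMP are counted twice; treat the result as a conservative estimate.
    static List<String> tooLong(List<String> serializedRefs) {
        return serializedRefs.stream()
            .filter(ref -> ref.length() > DEFAULT_MAX)
            .collect(Collectors.toList());
    }
}
```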

> 2) while importing I hit issue on duplication of the xwikircs_pkey key.
> It shows as:
>  
> STATEMENT: insert into xwikircs (XWR_DATE, XWR_COMMENT, XWR_AUTHOR,
> XWR_DOCID, XWR_VERSION1, XWR_VERSION2) values ($1, $2, $3, $4, $5, $6)
> ERROR: duplicate key value violates unique constraint "xwikircs_pkey"
> DETAIL: Key (xwr_docid, xwr_version1,
> xwr_version2)=(3170339397610733377, 1, 1) already exists.
>  
> in PostgreSQL console and as:
>  
> 2014-09-22 00:53:51,601
> [http://localhost:8080/xwiki/rest/wikis/xwiki/spaces/Wikipedia/pages/Brecon_&_Radnorshire]
> WARN o.h.u.JDBCExceptionReporter - SQL Error: 0, SQLState: 23505
> 2014-09-22 00:53:51,601
> [http://localhost:8080/xwiki/rest/wikis/xwiki/spaces/Wikipedia/pages/Brecon_&_Radnorshire]
> ERROR o.h.u.JDBCExceptionReporter - Batch entry 0 insert into
> xwikircs (XWR_DATE, XWR_COMMENT, XWR_AUTHOR, XWR_DOCID, XWR_VERSION1,
> XWR_VERSION2) values ('2014-09-22 00:53:51.000000 +02:00:00', '',
> 'XWiki.Admin', 3170339397610733377, 1, 1) was aborted. Call
> getNextException to see the cause.
> 2014-09-22 00:53:51,601
> [http://localhost:8080/xwiki/rest/wikis/xwiki/spaces/Wikipedia/pages/Brecon_&_Radnorshire]
> WARN o.h.u.JDBCExceptionReporter - SQL Error: 0, SQLState: 23505
> 2014-09-22 00:53:51,601
> [http://localhost:8080/xwiki/rest/wikis/xwiki/spaces/Wikipedia/pages/Brecon_&_Radnorshire]
> ERROR o.h.u.JDBCExceptionReporter - ERROR: duplicate key value
> violates unique constraint "xwikircs_pkey"
> Detail: Key (xwr_docid, xwr_version1,
> xwr_version2)=(3170339397610733377, 1, 1) already exists.
>  
> in xwiki/tomcat console.
>  
> This issue I'm not able to solve so far as it looks like the key value
> itself is somehow generated by xwiki probably from some other data and
> I'm not able to find so far related code.

The code is in XWikiDocument.getId().

There’s this caveat in the code:

        // TODO: Ensure uniqueness of the generated id
        // The implementation doesn't guarantee a unique id since it uses a hashing method which never guarantee
        // uniqueness. However, the hash algorithm is really unlikely to collide in a given wiki. This needs to be
        // fixed to produce a real unique id since otherwise we can have clashes in the database.

I don’t have many ideas other than renaming the pages causing the problem, 
since the unique id is computed from the space and page names.

Here’s the algorithm FYI:

/**
 * <p>
 * Serialize a reference into a unique identifier string within a wiki. Its similar to the
 * {@link UidStringEntityReferenceSerializer}, but is made appropriate for a wiki independent storage.
 * </p>
 * <p>
 * The string created looks like {@code 5:space3:doc} for the {@code wiki:space.doc} document reference.
 * and {@code 5:space3:doc15:xspace.class[0]} for the wiki:space.doc^wiki:xspace.class[0] object.
 * (with {@code 5} being the length of the space name, i.e the length of {@code space} and {@code 3} being the length of
 * the page name, i.e. the length of {@code doc}).
 * </p>

Denis might know better since he improved the uniqueness some time back.
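
For illustration, the length-prefixed format from the Javadoc above can be sketched as follows. The class and method names are hypothetical, and the hashing step (folding the first 8 bytes of an MD5 digest into a long) is an assumption made for the example, not necessarily what XWikiDocument.getId() does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LocalUidSketch {
    // Builds the length-prefixed form from the Javadoc, e.g. "5:space3:doc".
    static String serialize(String space, String page) {
        return space.length() + ":" + space + page.length() + ":" + page;
    }

    // ASSUMPTION: reduce the key to a 64-bit id by taking the first 8 bytes
    // of its MD5 digest. Any such reduction can collide in principle.
    static long hashToId(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            long id = 0;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (digest[i] & 0xFF);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The length prefixes keep the serialized strings unambiguous (space "a" with page "bc" differs from space "ab" with page "c"), but hashing down to 64 bits cannot guarantee distinct ids. In this sketch the id depends only on the name, so the same space and page name always yields the same id.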

> Also the question is if this is kind of hash function if I did not break
> that by making links longer by hack in (1).

No, it’s unrelated.

Thanks
-Vincent

> Any comment on (1) and its correctness and idea to fix (2) is highly
> appreciated here.
>  
> Thanks!
> Karel
