Re: [xwiki-devs] XWiki on Cassandra

Caleb James DeLisle Wed, 03 Aug 2011 05:30:15 -0700


On 08/03/2011 04:53 AM, Ludovic Dubost wrote:
> Hi Caleb,
> 
> Exciting news indeed. This looks great and there seem to be
> quite a few things already working.
> 
> I have one question concerning the mention of the multiple XWiki nodes
> connecting in different locations to multiple Cassandra nodes. This
> would also mean that there is some tweaking in the XWiki Cache or a
> new "cluster" mode which allows WAN communication between instances.


Indeed, the XWiki cluster code would need to be able to invalidate cache entries 
over a WAN connection.
The Cassandra code already supports operating over a WAN and it is "eventually 
consistent" which in practice means all nodes are up to date within a few 
seconds.

> Otherwise you could be editing or viewing an older version than what's
> really in the cassandra store.
> 
> Have you looked at this already? If you have touched the XWiki Cache
> maybe that's why you have performance issues. It is important to
> cache the XWikiPreferences document, as it is highly requested. One of
> the things I did in the Google Store work a while ago was to
> have a special additional cache in the XWikiContext which would make
> sure we don't repeatedly check the MemCache that contained the most
> recent version number of XWiki documents. This allowed decent
> performance. Only the first access to a document in a given HTTP
> request would trigger a version number verification.

I have not needed to make any changes to the cache. Caching in the context is an 
interesting idea.
What I would most like to do is follow the line of execution, find the 
biggest costs and mitigate them.
This would mean patches which could be merged back into master.
The system is quite fast when run on my desktop (which has a lot of RAM), so the 
performance issues seem to be associated with the resource constraints of the 
system it is running on.
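
The per-request cache Ludovic describes can be sketched roughly as follows. This is only an illustration in Python, not XWiki code: `SharedVersionStore`, `RequestContext`, `load_document` and the `fetch` callback are hypothetical names, and the shared store stands in for whatever holds the latest version numbers (the MemCache in the Google Store work). The point is that only the first access to a document within one request consults the shared store.

```python
class SharedVersionStore:
    """Hypothetical stand-in for the shared cache of latest version numbers."""

    def __init__(self):
        self.versions = {}
        self.lookups = 0  # counts how often the shared store is consulted

    def latest_version(self, doc_name):
        self.lookups += 1
        return self.versions.get(doc_name, 0)


class RequestContext:
    """Hypothetical per-request context; thrown away when the request ends."""

    def __init__(self):
        self.verified = {}  # doc_name -> version verified during this request


def load_document(doc_name, context, shared, local_cache, fetch):
    """Return document content, verifying the version at most once per request.

    fetch(doc_name) is an assumed callback returning (version, content)
    from the underlying store.
    """
    if doc_name not in context.verified:
        # First access in this request: ask the shared store once.
        context.verified[doc_name] = shared.latest_version(doc_name)
    wanted = context.verified[doc_name]

    cached = local_cache.get(doc_name)
    if cached is None or cached[0] != wanted:
        # Local copy is missing or stale, so reload it from the store.
        cached = fetch(doc_name)
        local_cache[doc_name] = cached
    return cached[1]
```

A new `RequestContext` would be created for each HTTP request, so a document changed on another node is picked up on the next request rather than in the middle of the current one.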

> 
> I was wondering how you handle the queries used by the XWiki core and
> the default XAR application? In the end I believe we need to move
> core queries to XWQL to have compatibility across stores.

What I have been doing so far is using named queries which I can implement both 
in HQL and JDOQL. Whether or not getting the stores to match up enough to be 
able to drag and drop complex queries from one to another is feasible, I can't 
say.
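
The named-query approach could look something like the sketch below. It is written in Python for brevity and is not the actual XWiki code: the query name `listSpaces`, the `NAMED_QUERIES` table, the `resolve_named_query` helper and both query strings are made up for illustration and have not been checked against either store. They are only meant to show that each store gets its own implementation under a shared name.

```python
# Illustrative only: one logical query name, one implementation per query language.
NAMED_QUERIES = {
    "listSpaces": {
        "hql": "select distinct doc.space from XWikiDocument doc",
        "jdoql": "SELECT DISTINCT this.space FROM com.xpn.xwiki.doc.XWikiDocument",
    },
}


def resolve_named_query(name, language):
    """Return the query text for the given language, or raise if it is missing."""
    implementations = NAMED_QUERIES.get(name)
    if implementations is None:
        raise KeyError("unknown named query: " + name)
    if language not in implementations:
        raise NotImplementedError(name + " has no " + language + " implementation")
    return implementations[language]
```

Callers only ever refer to the name, so a query which cannot be expressed on one store fails loudly at that point instead of being silently translated.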

> 
> I saw that wiki macros don't seem to work. This must be because of
> missing objects queries.

The macros do seem to be working (I did some work on the queries used) but the 
list of spaces was rather lonely with a broken tag cloud and activity stream so 
I removed it.

> 
> In terms of priorities I believe the following are important:
> 
> - assessment of which default XE feature is not working and what it
> requires to make it work (this would allow to "define" what a
> Cassandra XE version would be)

I think the general answer to this is: things which do not rely heavily on 
complex queries, such as history, activity stream and permissions, should 
be easy, while things which do, such as some of the applications, will be 
difficult.

> - basic XWQL querying with queries on objects

The next big step is going to be patching the store code so that it takes 
advantage of NoSQL's flexibility in adding columns to single rows as a means of 
storing and querying structured data without first knowing what the structure 
will be.
This is obviously not supported currently since DataNucleus attempts to support 
all different data stores and this would simply not be possible with a 
relational store.
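
To make the idea concrete, here is a rough Python sketch contrasting the two layouts for a user object with a name and an age. Plain dictionaries stand in for tables and rows. The column naming scheme (`XWiki.XWikiUsers[0].name`), the helper names and the sample values are assumptions for illustration, not what the store actually writes.

```python
def to_relational_rows(doc_name, class_name, number, properties):
    """Split one object across per-type tables, in the spirit of the Hibernate store."""
    object_id = (doc_name, class_name, number)
    tables = {"xwikiobjects": [], "xwikistrings": [], "xwikiintegers": []}
    tables["xwikiobjects"].append({"id": object_id, "class": class_name})
    for name, value in properties.items():
        # This sketch only distinguishes integers from strings.
        table = "xwikiintegers" if isinstance(value, int) else "xwikistrings"
        tables[table].append({"id": object_id, "name": name, "value": value})
    return tables


def to_document_columns(row, class_name, number, properties):
    """Add one column per property to the document's own row (a dict here)."""
    for name, value in properties.items():
        row["%s[%d].%s" % (class_name, number, name)] = value
    return row


user = {"name": "Alice", "age": 30}
relational = to_relational_rows("XWiki.Alice", "XWiki.XWikiUsers", 0, user)
columns = to_document_columns({"fullName": "XWiki.Alice"}, "XWiki.XWikiUsers", 0, user)
```

In the first layout loading the object means reading three tables and joining on the object id. In the second the whole object comes back with the document row, and a class with different properties needs no schema change.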

Caleb

> - history
> - permissions
> Also at some point
> - performance comparison with a very large number of documents / very high 
> load
> 
> Great stuff in any case..
> 
> Ludovic
> 
> 2011/8/2 Caleb James DeLisle <[email protected]>:
>> I have an instance of XWiki finally running on Cassandra.
>> http://kk.l.to:8080/xwikiOnCassandra/
>>
>> Cassandra is a "NoSQL" database. Unlike a traditional SQL database, it cannot 
>> do advanced queries, but it can store data in a more flexible way, e.g. each 
>> row is like a hashtable where additional "columns" can be added at will.
>> The most important feature of Cassandra is that multiple Cassandra nodes can 
>> be connected together into potentially very large "swarms" of nodes which 
>> reside in different racks or even data centers continents apart, yet all of 
>> them represent the same database.
>> Cassandra was developed by Facebook and their swarm was said to be over 200 
>> nodes strong.
>> In its application with XWiki, each node can have an XWiki engine sitting 
>> on top of it and users can be directed to the geographically closest node or 
>> to the node which is most likely to have a cache of the page which they are 
>> looking for.
>> Where a traditional cluster is a group of XWiki engines sitting atop a 
>> single MySQL engine, this allows for a group of XWiki engines to sit atop a 
>> group of Cassandra engines in a potentially very scalable way.
>> In a cloud setting, one would either buy access to a provided NoSQL store 
>> such as Google's BigTable or set up a number of XWiki/Cassandra 
>> stacks in a less managed cloud such as Rackspace's or Amazon's.
>>
>> How it works:
>> XWiki objects in the traditional Hibernate based storage engine are 
>> persisted by breaking them up into properties which are then joined again 
>> when the object is loaded.
>> A user object which has a name and an age will occupy a row in each of three 
>> tables, the xwikiobjects table, the xwikistrings table, and the 
>> xwikiintegers table.
>> The object's metadata will be in the xwikiobjects table while the name will 
>> be in a row in the xwikistrings table and the age, a number, will go in the 
>> xwikiintegers table.
>> The NoSQL/DataNucleus based storage engine does this differently: the same 
>> object only occupies space in the XWikiDocument table, where it takes 
>> advantage of Cassandra's flexibility by simply adding a new column for each 
>> property.
>> NOTE: this is not fully implemented yet, objects are still stored serialized.
>>
>> What works
>>
>> * Document storage
>> * Classes and Objects
>> * Attachments
>> * Links and Locks
>> * Basic querying with JDOQL
>>
>> What doesn't work
>>
>> * Querying inside of objects
>> * JPQL/XWQL queries
>> * Document history
>> * Permissions (requires unimplemented queries)
>> * The feature you want
>>
>>
>> I am interested in what the community thinks is the first priority. I can 
>> work on performance, which will likely lead to patches being merged into 
>> master which will benefit everyone,
>> or I can work on more features, which will benefit people who want to use 
>> XWiki as a traditional application wiki but use it on top of Cassandra.
>> You can reply here or add comments to the wiki ;)
>>
>> Caleb
>>
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
> 
> 
> 

