Hello Danilo, against GoogleBot trying all these fancy (linked) actions, I'd suggest you make use of robots.txt. We've hidden almost all actions from robots using it: http://www.curriki.org/robots.txt Of course, you can also write Apache rewrite rules… these are finer grained (they can even check the identity of the client).
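A minimal robots.txt along the lines Paul describes might look like this; the action paths are illustrative, not XWiki defaults, so adapt them to your own URL layout, and note the wildcard lines rely on Googlebot's extended pattern support:

```
User-agent: *
# Keep robots out of state-changing or expensive XWiki actions.
Disallow: /bin/edit/
Disallow: /bin/delete/
# Googlebot honours * wildcards, so actions reached via query
# parameters (e.g. /bin/view/Main/Tags?do=viewTag&tag=...) can be
# blocked as well:
Disallow: /*?do=
```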
On the Solr random queries, I am a bit surprised your scenario works… what random value would you take? Is the randomness only for the sorting, while you use the default query (*:*)? I guess that would work (it's not a random query then, it's a random ordering, something you don't want users to formulate intentionally, I think).

paul

On 26 August 2014, at 23:33, Danilo Oliveira <[email protected]> wrote:

> Hello Clemens,
>
> I have checked the xwikilists table and I noticed that the genre, country
> and language lists of the movies, defined in my movieClass, are recorded
> in this table. Do you think that is the cause of the slowness?
>
> But I discovered who is generating these queries: GoogleBot. See:
> 66.249.69.197 - - [26/Aug/2014:18:07:09 -0300] "GET
> /bin/view/Main/Tags?do=viewTag&tag=tang-breakfast-drink HTTP/
> There are GoogleBot requests trying to delete my tags too...
> Well, I am blocking them according to this doc [0].
>
> Rodrigues,
> I accessed the Neo4j site. This DB looks very interesting and I think it
> is applicable to my application. However, my app is in the
> proof-of-concept phase, so for now XWiki meets my needs. I will
> absolutely consider it if my application grows. Thanks for the tip!
>
> Well, I changed my queries to Solr and now my application is working
> perfectly, even better than at the beginning.
>
> But I have just one more need: a random query.
>
> I checked how to make a random query in Solr and I found this article [2].
>
> In the "Additional Configuration" section of the article, you can read
> that we need the two definitions below in the schema configuration.
> However, in XWiki's schema.xml we just have the first [1]:
>
> <fieldType name="random" class="solr.RandomSortField" indexed="true" />
> <dynamicField name="random_*" type="random" />
>
> I am no expert on Solr, but if I just add the second definition, will it
> work, or do I need to worry about other things?
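For reference, the setup from the article Danilo quotes would, as a sketch, look like this in schema.xml; the second line is the one missing from XWiki's schema, and whether the XWiki Solr core needs anything further is exactly the open question here:

```xml
<!-- Already present in XWiki's schema.xml: -->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<!-- The missing dynamic field; any field named random_* gets that type: -->
<dynamicField name="random_*" type="random" />
```

A query then sorts on a dynamic field whose suffix acts as the seed, e.g. `q=*:*&sort=random_1398 asc&rows=10`; changing the suffix (say, per request) changes the ordering. This matches Paul's point: it is a random ordering of an ordinary query, not a random query.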
>
> [0] http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Performances
> [1] https://github.com/xwiki-contrib/xwiki-platform-solr/blob/master/solr/conf/schema.xml
> [2] http://solr.pl/en/2013/04/02/random-documents-from-result-set-giveaway-results/
>
> Thanks everyone for the attention!
>
> Danilo
>
>
> 2014-08-26 4:33 GMT-03:00 Clemens Klein-Robbenhaar <[email protected]>:
>
>>
>> This query looks very much like it is generated by the tag service when
>> searching for documents with a given tag (the code is in the class
>> TagQueryUtils, method getDocumentsWithTag, in the
>> xwiki-platform-core/xwiki-platform-tag/xwiki-platform-tag-api module).
>>
>> This query might be triggered by any kind of UI element (panel, macro, etc.).
>> I do not think it is used to update any search index or the like.
>> Instead it is used on some pages, e.g. Main.Tags, when clicking on a tag
>> to see its list of documents.
>>
>> I wonder why this query takes so long. Even 100K docs should not be that
>> much (I mean, 5 minutes query time, huh?). Is there any chance some
>> binary data of the movie objects or the like ended up in the
>> xwikilistitems table or any other table used in the query?
>>
>> Clemens
>>
>>> Hello,
>>>
>>> As I mentioned, I discovered that the queries that are hogging my DB are
>>> similar to:
>>> '102', 'xwiki', 'localhost:52614', 'xwiki', 'Query', '372', 'Creating sort
>>> index', 'select xwikidocum0_.XWD_FULLNAME as col_0_0_ from xwikidoc
>>> xwikidocum0_ cross join xwikiobjects baseobject1_ cross join xwikilists
>>> dbstringli2_ inner join xwikiproperties dbstringli2_1_ on
>>> dbstringli2_.XWL_ID=dbstringli2_1_.XWP_ID and
>>> dbstringli2_.XWL_NAME=dbstringli2_1_.XWP_NAME inner join xwikilistitems
>>> list3_ on dbstringli2_.XWL_ID=list3_.XWL_ID and
>>> dbstringli2_.XWL_NAME=list3_.XWL_NAME where (xwikidocum0_.XWD_HIDDEN<>1 or
>>> xwikidocum0_.XWD_HIDDEN is null) and
>>> baseobject1_.XWO_CLASSNAME=\'XWiki.TagClass\' and
>>> baseobject1_.XWO_NAME=xwikidocum0_.XWD_FULLNAME and
>>> baseobject1_.XWO_ID=dbstringli2_.XWL_ID and dbstringli2_.XWL_NAME=\'tags\'
>>> and lower(list3_.XWL_VALUE)=lower(\'shock-rock\') order by
>>> xwikidocum0_.XWD_FULLNAME'
>>>
>>> Does anyone know which component is responsible for this query? Is this
>>> kind of query executed for each new tag, to create a sort index?
>>>
>>> Thanks
>>>
>>>
>>> 2014-08-23 3:46 GMT-03:00 O.J. Sousa Rodrigues <[email protected]>:
>>>
>>>> Wouldn't this be a perfect case for a NoSQL DB like Neo4j?
>>>> On 22.08.2014 at 23:13, "Paul Libbrecht" <[email protected]> wrote:
>>>>
>>>>> Danilo,
>>>>>
>>>>> have you checked the MySQL process list?
>>>>> I'd suspect something is hogging it.
>>>>> For search, I'd recommend leveraging Solr… but with a number of
>>>>> customizations. There are some hooks in the solr-plugin, I believe.
>>>>>
>>>>> hope it helps.
>>>>>
>>>>> paul
>>>>>
>>>>>
>>>>> On 22 August 2014, at 22:54, Danilo Oliveira <[email protected]> wrote:
>>>>>
>>>>>> Hello Devs,
>>>>>>
>>>>>> I am developing an application based on XWiki that is mapping,
>>>>>> connecting, relating and graphically arranging movie information in
>>>>>> order to make it possible for users to explore their trailers.
>>>>>>
>>>>>> In the beginning, with a light data set (<5k movies), the application
>>>>>> was running well, but today I started to populate my database (MySQL)
>>>>>> and the application became unusable; the queries are taking more than
>>>>>> 5 minutes to complete. Currently it has more than 15k movies
>>>>>> (1 movie = 1 doc) and I need to upload 100k more.
>>>>>>
>>>>>> I have already checked the cache and performance pages [1][2] but I
>>>>>> don't know if they solve my problem; I think this is an architecture
>>>>>> challenge.
>>>>>>
>>>>>> My AS IS process is:
>>>>>> - the user inserts a movie;
>>>>>> - the application searches for the movie and its related films based
>>>>>>   on its characteristics (a lot of joins and other algorithms)
>>>>>>   (bottleneck);
>>>>>> - the application returns the results as a map.
>>>>>>
>>>>>> I am wondering if I could use custom mapping [3] to solve my problem,
>>>>>> given that the relationship information for each movie, at least for
>>>>>> now, doesn't need to change often. Each movie has X related movies,
>>>>>> sorted by similarity. So I could create some relationship algorithm
>>>>>> that runs on a schedule (once a week) and populates this new table.
>>>>>> I am thinking of using Python pandas DataFrames to talk directly with
>>>>>> MySQL and do the data analysis; any other suggestion?
>>>>>>
>>>>>> So I would create a custom mapping for my movie relationship class,
>>>>>> run the algorithm, and populate the new table. My TO BE process would
>>>>>> be:
>>>>>>
>>>>>> TO BE
>>>>>> - the user inserts movie info;
>>>>>> - a simple select on the custom table "MoviesRelated";
>>>>>> - the application returns the results.
>>>>>>
>>>>>> I would appreciate some opinions. Thank you very much.
>>>>>>
>>>>>> [1] http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Performances
>>>>>> [2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Cache+Module
>>>>>> [3] http://platform.xwiki.org/xwiki/bin/view/DevGuide/CustomMapping
>>>>>>
>>>>>> Danilo
>>>>>> --
>>>>>> Danilo Amaral de Oliveira
>>>>>> Computer Engineer
>>>>>> mobile (32) 9111 - 6867
>>>>>> _______________________________________________
>>>>>> devs mailing list
>>>>>> [email protected]
>>>>>> http://lists.xwiki.org/mailman/listinfo/devs
>>
>> Kind regards,
>> Clemens Klein-Robbenhaar
>>
>> --
>> Clemens Klein-Robbenhaar
>> Software Development
>> EsPresto AG
>> Breite Str. 30-31
>> 10178 Berlin/Germany
>> Tel: +49.(0)30.90 226.763
>> Fax: +49.(0)30.90 226.760
>> [email protected]
>>
>> HRB 77554 B - Berlin-Charlottenburg
>> Management board: Maya Biersack, Peter Biersack
>> Chairman of the supervisory board: Dipl.-Wirtsch.-Ing.
Winfried Weber
>> Certified according to ISO 9001:2008
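As a footnote to Danilo's TO BE design: the weekly precomputation job he sketches (filling a "MoviesRelated" table offline) could start from something like the pandas sketch below. The table and column names and the shared-genre similarity measure are made up for illustration; the real relatedness algorithm and the write-back to MySQL (e.g. via `DataFrame.to_sql`) would replace them.

```python
# Offline precomputation sketch for a "MoviesRelated" table, run on a
# schedule rather than per request.  Column names (movie_id, genres,
# related_id, score) are hypothetical, not XWiki's schema.
import pandas as pd

def build_related(movies: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Return (movie_id, related_id, score) rows, best matches first.

    Similarity here is just the number of shared genres -- a stand-in
    for whatever algorithm really computes movie relatedness.
    """
    rows = []
    genre_sets = {m.movie_id: set(m.genres.split("|"))
                  for m in movies.itertuples()}
    for mid, genres in genre_sets.items():
        scores = [(other, len(genres & other_genres))
                  for other, other_genres in genre_sets.items()
                  if other != mid]
        scores.sort(key=lambda t: t[1], reverse=True)
        for other, score in scores[:top_n]:
            rows.append({"movie_id": mid, "related_id": other, "score": score})
    return pd.DataFrame(rows)

movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "genres": ["horror|thriller", "horror|comedy", "comedy|romance"],
})
related = build_related(movies, top_n=1)
# The result could then be written back with related.to_sql(...), and the
# web request becomes the simple select Danilo describes.
```

The point of the design is exactly what the TO BE list says: all the expensive joins move into this batch step, and the interactive path only reads the precomputed table.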

