Hi there Christian

Thank you for the extensive answer! 

In the meantime, I have solved the issue that kept simultaneous queries 
from firing asynchronously. The problem was a locked PHP $_SESSION variable, 
which is unrelated to BaseX. My bad!

I am going to test the caching difference again soon, on a larger subcorpus. 
I'm curious to find out the results! I will incorporate results in a thesis. If 
you're interested, I can definitely share the results with you when I have 
finished writing them down!

Finally, I'd like to thank you for providing these answers. Never expected such 
good feedback and response. It really means a lot! Thank you!


Kind regards

Bram

-----Original message-----
From: Christian Grün [mailto:christian.gr...@gmail.com] 
Sent: Sunday, 22 May 2016 13:29
To: Bram Vanroy | KU Leuven <bram.vanr...@student.kuleuven.be>
CC: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Querying Basex from a web interface

Hi Bram,

Thanks for your reply. It’s long indeed, so sorry in advance if I didn’t 
capture all relevant info…

> The approach explained above also implies that we had to create a lot of 
> BaseX databases. A lot. Around 10 million of them.

Impressive :)


> • Would a query that returns single results really be faster 
> than one that returns 10 results?
> Yes. In a search space of 500 million tokens, you can imagine that a rare 
> pattern may take a lot of time to query – even in the GrInded version.

I see. So I assume there won’t be many chances to speed up this scenario by 
working on index structures, as most of the time is spent sequentially 
browsing all the databases, right?

> • Do you sort your search results? If yes, returning 100 
> results instead of 10 should not be much slower.
> As I am not entirely sure what you mean by that, I don’t think we do. By 
> sorting, do you mean the XQuery order by function?

Exactly. I also assume it shouldn’t play a role in your scenario.


> wouldn’t that mean that BaseX’ cache is cleared more often? I could imagine 
> that the garbage collector passes by after a query, or at least a session, is 
> closed? Have you any idea how this is possible?

Phew, a difficult one… I would need to spend some real time with your framework 
to give a solid answer.

> My two questions are: is count() actually faster than getting all results?

Yes, it will always be faster; but “faster” can mean 1% or 1000%… It will be 
much faster if the database statistics can be utilized to answer your query 
(which is probably not the case in your scenario), or if the step of retrieving 
the data, and/or returning it via the network consumes too much time. If you 
only count nodes, there is no need to retrieve all database contents from disk 
(node properties, textual data) that will be returned in the XML representation.

> Or does count() get all the hits any way, and should I count and get all 
> results in one step?

As you already indicated that the last result may occur much later than the 
first result in your database(s), I assume you won’t win that much. But for 
testing, you can wrap your query with count() to see what would be the minimum 
time to find all hits.
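
As a rough illustration of the count() test described above, the following 
Python sketch wraps an XQuery expression in count() and times it. The helper 
names (wrap_in_count, time_query) and the run callable are hypothetical; run 
stands for whatever function sends a query to BaseX in your setup. It assumes 
the query is a single expression without a prolog, since declarations cannot 
be placed inside a function call.

```python
import time
from typing import Callable, Tuple


def wrap_in_count(xquery: str) -> str:
    """Wrap a single XQuery expression in count(...).

    Assumption: the query has no prolog (no 'declare ...' statements).
    """
    return f"count({xquery.strip()})"


def time_query(run: Callable[[str], str], xquery: str) -> Tuple[str, float]:
    """Run a query through the caller-supplied 'run' function and time it.

    'run' is a hypothetical placeholder for your own client call.
    Returns the result and the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    result = run(xquery)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in runner for demonstration only; it does not contact BaseX.
    def fake_run(query: str) -> str:
        return "0"

    counted = wrap_in_count("//node[@cat='np']")
    result, seconds = time_query(fake_run, counted)
    print(counted, result, f"{seconds:.6f}s")
```

Comparing the time of the wrapped query with the time of the original one 
gives an idea of how much is spent on retrieving and transferring the results 
rather than on finding them.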

> Secondly, it seems that when the last step is initialised, the other 
> processes hang – leaving the user without any feedback. The processes 
> literally seem to stop running. My question then is: does this happen because 
> BaseX does not handle different sessions asynchronously, and new queries 
> block others?

By default, 8 queries can be run in parallel [1]. If your other queries are 
delayed a lot, it may be that the random disk access pattern caused by parallel 
queries outweighs the advantage of allowing parallel requests. But I would 
assume that it’s worth checking your PHP environment first.

> Finally, I simply want to ask what the best flow is for opening and closing 
> BaseX sessions, and when one should open a new session.

With the light-weight PHP client, it’s usually best to open a new session for 
each request, and close it directly after your command or query has been 
evaluated. As usual, you should ensure that every session will be closed, even 
if an error occurs.
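
The pattern of one session per request with a guaranteed close can be sketched 
as follows. The sketch is in Python rather than PHP, and FakeSession is a 
hypothetical stand-in, not the real BaseX client API; only the try/finally 
structure is the point. The names open_session and handle_request are likewise 
made up for illustration.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a client session (not the BaseX API)."""

    def __init__(self):
        self.closed = False

    def execute(self, command):
        # Simulate a failing query so the error path can be exercised.
        if command == "fail":
            raise RuntimeError("query failed")
        return f"result of {command}"

    def close(self):
        self.closed = True


@contextmanager
def open_session(factory):
    """Open a session and close it again, even if an error occurs."""
    session = factory()
    try:
        yield session
    finally:
        session.close()


def handle_request(factory, command):
    """One request: open a fresh session, run one command, close it."""
    with open_session(factory) as session:
        return session.execute(command)


if __name__ == "__main__":
    print(handle_request(FakeSession, "xquery 1 to 3"))
```

In PHP, the same effect is usually achieved with a try/finally block around 
the query and the close call.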

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Options#PARALLEL
