Hello all

 

I have been reading through many emails on this list and I've learnt a lot
about how BaseX works and how others use it. A month or so ago I sent an
email to this list myself, concerning caching. Even though I have some more
questions about that, I will leave them for another time. Today I am
concerned with retrieving results from BaseX in chunks.

 

(Question also found on StackOverflow, with a nice bounty! :)
http://stackoverflow.com/questions/36675388/efficient-and-user-friendly-way-to-present-slow-loading-results)

 

Case at hand: we use BaseX to query a corpus of 50 million tokens. We also
make this available to other users through a website. The problem is that
querying is slow. For our own projects that's no problem: we dive straight
into the back-end, run a search command from the terminal and let the query
run for as long as it needs. For users, however, it is paramount that they
get a quick response, and at the moment it is taking too long. I don't blame
BaseX. We love BaseX and are astounded by its efficiency and optimisations!
However, we want to deliver the best possible experience to our users.

 

We open a new session from PHP, wait to receive the results, do some
post-processing and then load the result page. As said, this takes too much
time. We've been looking into some solutions. The best one, which I think
should be possible, is returning the results in chunks. You know those
websites that show you results but only, say, 20 per page? I think
something similar is appropriate. When a user has searched for a pattern, we
show only the first 20 or so results, just so they can get an idea of what
they'd find. Then, when they click a button, we query for the next twenty
results, which are appended to the list (a JavaScript solution, I guess),
and so on, until all results have been found. Additionally, I will provide a
button with which users can download all results as a text file. This is
allowed to take longer. The main thing is that users get early feedback and
results for their query.
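
To make the flow concrete, here is a rough sketch of the "load more" loop I
have in mind. It is written in Python purely for illustration (our site is
PHP), and fetch_chunk and load_all_in_chunks are hypothetical names: the
list slice stands in for one request to the back-end, it is not a BaseX
call.

```python
def fetch_chunk(hits, page_index, page_size=20):
    """Stand-in for one back-end request: return the hits of a single page.

    `hits` is a plain list here; in the real site this would be a query
    sent to the database (assumption, not shown).
    """
    start = page_index * page_size
    return hits[start:start + page_size]


def load_all_in_chunks(hits, page_size=20):
    """Simulate a user clicking "more" until no results are left."""
    shown = []        # what the result page currently displays
    page_index = 0    # number of non-empty chunks fetched so far
    while True:
        chunk = fetch_chunk(hits, page_index, page_size)
        if not chunk:             # an empty chunk means we reached the end
            break
        shown.extend(chunk)       # append the new results to the list
        page_index += 1
    return shown, page_index


if __name__ == "__main__":
    all_hits = [f"hit {i}" for i in range(1, 46)]   # 45 made-up hits
    shown, clicks = load_all_in_chunks(all_hits)
    print(len(shown), clicks)
```

With 45 hits and a page size of 20 the user would see chunks of 20, 20 and 5
results. The download button would simply skip the chunking and fetch
everything in one go.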

 

Now the question is whether something like this is possible in an efficient
manner in BaseX. Can you write a query that finds only the first 20 results,
then the following 20, and so on - and is this faster than searching for
everything at once? In other words, when I am searching for results 121-140
(after having pushed the button a couple of times), is BaseX smart enough to
skip the part of the search space it has already covered to find the
previous 120 hits? If so, that would be great. Could you help me on my way
with some suitable PHP/XQuery code?
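
For what it's worth, this is roughly how I imagine building the query for
one chunk, using the standard XQuery function subsequence($seq, $start,
$length), which counts from 1. Again the sketch is in Python for
illustration only; build_paged_query and the example path are hypothetical,
and whether BaseX can evaluate such a window without redoing the earlier
work is exactly what I am asking.

```python
def build_paged_query(xpath, page_index, page_size=20):
    """Wrap an XPath/XQuery expression so that it returns one page of hits.

    page_index is zero-based; XQuery positions are one-based, so page 0
    starts at position 1, page 1 at position 21, and so on.
    """
    if page_index < 0 or page_size <= 0:
        raise ValueError("page_index must be >= 0 and page_size > 0")
    first = page_index * page_size + 1
    # The extra parentheses keep the expression a single argument even
    # if it contains a top-level comma.
    return f"subsequence(({xpath}), {first}, {page_size})"


if __name__ == "__main__":
    # Hypothetical search pattern; page 6 covers hits 121-140.
    print(build_paged_query('//node[@cat="np"]', 6))
```

The resulting string would then be sent to the database the same way we send
the full query today.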

 

I also highly encourage you to participate on StackOverflow. As I said, I am
offering a 200-point bounty, for the people who are interested in Internet
fame. :)

 

 

Thank you for your time

 

Kind regards

 

Bram Vanroy

http://bramvanroy.be
