Re: [basex-talk] Full-Text

Ветошкин Владимир Thu, 19 Jul 2018 00:12:01 -0700

Hi, Alexander!

I have implemented this approach.

I have about 50 databases. If I search each database separately, it takes about 5-10 ms per database.

But if I search across all of them with db:list() and db:open(), it takes about 12-15 seconds.

This example takes ~12-15 s:

let $db := for $i in db:list()[starts-with(.,'000999~')] return try {db:open($i)} catch * {}
for $doc in $db/.//*[text() contains text { 'TEN-9258' } any]
return $doc

This example takes ~180 ms (returns 2 rows):

let $db := for $i in db:list()[starts-with(.,'000999~201807')] return db:open($i)
for $doc in $db/.//*[text() contains text { 'TEN-9258' } any]
return $doc

This example takes ~10 ms (returns 2 rows):

for $doc in db:open('000999~201807')/.//*[text() contains text { 'TEN-9258' } any]
return $doc

Why do the last two examples take such different times (~180 ms vs. ~10 ms)?

How can I improve this?

This example takes ~2 s (returns 0 rows):

let $db := for $i in db:list()[starts-with(.,'000999~201806')] return db:open($i)
for $doc in $db/.//*[text() contains text { 'TEN-9258' } any]
return $doc

This example takes ~12 ms (returns 0 rows):

for $doc in db:open('000999~201806')/.//*[text() contains text { 'TEN-9258' } any]
return $doc
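A likely explanation, not confirmed in this thread: when the database is named by a string literal, as in db:open('000999~201807'), the query compiler can see which database is being queried and rewrite the contains text predicate into a full-text index lookup. When the databases come from a variable bound to a sequence of db:open results, it presumably cannot, and falls back to a sequential scan. One commonly used workaround is to address each database's full-text index explicitly with ft:search, which takes the database name as a string. A minimal sketch, assuming the '000999~' prefix from the examples above and that each of those databases has a full-text index (mode 'any' is meant to mirror the any option in the original queries):

```xquery
(: Query the full-text index of every database whose name starts with the prefix. :)
for $name in db:list()[starts-with(., '000999~')]
return try {
  (: ft:search returns the matching text nodes; /.. yields their parent elements,
     deduplicated and in document order. :)
  ft:search($name, 'TEN-9258', map { 'mode': 'any' })/..
} catch * {
  (: Skip databases that cannot be searched, e.g. ones without a full-text index. :)
  ()
}
```

The try/catch mirrors the first example and silently skips failing databases; drop it if you would rather see those errors.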

25.06.2018, 13:07, "Alexander Shpack" <shadow...@gmail.com>:

Hi, Vladimir,

If you give the database names a common prefix, for example "db_", you can use the following code:

let $docs := for $i in db:list()[starts-with(., "db_")] return db:open($i)
return $docs/*

On Mon, Jun 25, 2018 at 12:32 PM Ветошкин Владимир <en-tra...@yandex.ru> wrote:
Hi, Alexander,

Some questions:
After that, how can I search across all of these databases?
Can I search for a substring without full-text, using only the text index?

25.06.2018, 11:56, "Alexander Shpack" <shadow...@gmail.com>:
Hey Vladimir,

You can use a sharding approach for your data import and split the data into separate databases, even one per month.
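For illustration, a monthly naming scheme like the one used later in this thread ('000999~201807') can be derived from a date with format-date. The helper local:shard-name is hypothetical, not part of BaseX:

```xquery
(: Hypothetical helper: appends the year and month (YYYYMM) of $date to $prefix,
   e.g. '000999~' and 2018-07-19 give '000999~201807'. :)
declare function local:shard-name($prefix as xs:string, $date as xs:date) as xs:string {
  $prefix || format-date($date, '[Y0001][M01]')
};

(: Name of the database for the current month :)
local:shard-name('000999~', current-date())
```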

On Mon, Jun 25, 2018 at 11:50 AM Ветошкин Владимир <en-tra...@yandex.ru> wrote:
Hi, Alexander!
Thank you!

In my previous message I described the process briefly.
I'll think about a separate database, but I'm afraid that it will also become very big in the future.
Although I could try to split the data into several databases, one per year. Hmm...

25.06.2018, 11:25, "Alexander Shpack" <shadow...@gmail.com>:
Hey, Vladimir!

Just put these specific files into a separate database and then index it.
You can do this automatically: BaseX lets you create and index a database directly from XQuery.

I hope this helps. In any case, if you provide more details about your task, we can figure out the best solution for you.
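As a sketch of creating and indexing a database directly from XQuery, db:create accepts index options in its options map. The database name 'import-201807' and the directory path below are placeholders, not taken from the thread:

```xquery
(: Create (or overwrite) a database from the XML files in a directory
   and build its full-text index at creation time.
   The name and path are hypothetical placeholders. :)
db:create(
  'import-201807',
  '/path/to/import/201807/',
  (),
  map { 'ftindex': true() }
)
```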

On Mon, Jun 25, 2018 at 10:42 AM Ветошкин Владимир <en-tra...@yandex.ru> wrote:
Hi, Fabrice!
Thank you.

All databases change constantly. That is why there is no way to single out "a big read-only collection" :(
Is it perhaps possible to use some other kind of incremental index?
I need to index specific XML files, not all files in the database.

21.06.2018, 17:16, "Fabrice ETANCHAUD" <fetanch...@pch.cerfrance.fr>:
Hi Vladimir,

I don't think there is anything like an incremental full-text index at the moment [1].
As the index is built per collection, the recommended approach would be to split your data into two collections:
- A big read-only collection of all past updates, indexed once
- A small/medium-sized collection whose full-text index can be recreated in an acceptable time after each update.
At the end of a predefined time period, add the live collection to the read-only one, reindex it, and truncate the live one.
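The periodic rollover described above could look roughly like the following, assuming two hypothetical databases named 'live' and 'archive'. Because these are updating expressions, it is safest to run each step as its own query rather than combining them. The step shown here copies every document from the live database into the archive under the same path:

```xquery
(: Rollover copy step: add all documents of the hypothetical 'live' database
   to 'archive', keeping their database paths. :)
for $doc in db:open('live')
return db:add('archive', $doc, db:path($doc))
```

Afterwards, a separate query such as db:optimize('archive', false(), map { 'ftindex': true() }) would rebuild the archive's full-text index, and db:create('live', (), (), map { 'ftindex': true() }) would replace the live database with an empty one. Between rollovers, the same db:optimize call on 'live' only has to rebuild the small live index.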

Best regards from France,
Fabrice Etanchaud

[1] http://docs.basex.org/wiki/Indexes#Updates

From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Ветошкин Владимир
Sent: Thursday, 21 June 2018 16:02
To: BaseX
Subject: [basex-talk] Full-Text

Hi, everyone!

Is there any way to index only the newly imported XML files?
Currently, when I import XML files, the full-text index is deleted.
After importing, I recreate the whole full-text index, and it takes too much time :(

--
Best regards,
Ветошкин Владимир Владимирович


--
s0rr0w

