Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Andrey Mashenkov Thu, 29 Aug 2019 13:29:13 -0700

Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
The motivation is that text indexes are not persistent, not transactional,
and can't be used together with SQL or inside SQL. There is also a lack of
interest from the community side.
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set.
Query results are returned from a data node to the client-side cursor page
by page, and this parameter is designed to control the page size. The query
is supposed to execute lazily on the server side, so the full result set is
not expected to be loaded into memory on the server at once, only page by
page.
Do you mean you found that Lucene loads the entire result set into memory
before the first page is sent to the client?
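
To illustrate the intended paging behaviour, here is a minimal sketch in
plain Java. It is not Ignite code: PagedCursorSketch, fetchPage and cursor
are hypothetical names, and the "data node" is simulated by a counter. It
only shows that the page size changes the number of round trips, not the
number of rows the cursor eventually returns.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a client-side cursor; not Ignite's actual classes. */
public class PagedCursorSketch {
    /** Counts how many pages were requested from the simulated data node. */
    public static int pagesFetched = 0;

    /** Simulates a data node returning one page of matches starting at 'from'. */
    public static List<Integer> fetchPage(int totalMatches, int from, int pageSize) {
        pagesFetched++;
        List<Integer> page = new ArrayList<>();
        for (int i = from; i < Math.min(from + pageSize, totalMatches); i++)
            page.add(i);
        return page;
    }

    /** Lazily iterates over all matches, pulling one page at a time. */
    public static Iterator<Integer> cursor(final int totalMatches, final int pageSize) {
        return new Iterator<Integer>() {
            private List<Integer> page = new ArrayList<>();
            private int posInPage = 0;
            private int nextFrom = 0;

            @Override public boolean hasNext() {
                if (posInPage < page.size())
                    return true;
                if (nextFrom >= totalMatches)
                    return false;
                // The next page is requested only when the current one is exhausted.
                page = fetchPage(totalMatches, nextFrom, pageSize);
                nextFrom += page.size();
                posInPage = 0;
                return !page.isEmpty();
            }

            @Override public Integer next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                return page.get(posInPage++);
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> it = cursor(10, 4);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        // All rows are still delivered; page size only affects the page count.
        System.out.println(seen + " rows in " + pagesFetched + " pages");
    }
}
```

With 10 matches and a page size of 4, the cursor still yields all 10 rows;
a smaller page size just means more pages.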

I think a new parameter should be added to limit the result. The best
solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in
SQL.

This task doesn't look trivial. A query is a distributed operation: the same
user query is executed on all data nodes, and the results from all nodes
must then be correctly merged before being returned via the client cursor.
So LIMIT should be applied on every node and then again in the merge phase.

Also, this may be non-obvious, but limiting results makes no sense without
sorting, as there is no guarantee that every subsequent query run will
return the same data, because of page reordering.
Basically, the merge phase receives results from data nodes asynchronously,
and messages from different nodes can't be ordered.
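
A minimal sketch of the point above, in plain Java. It is not Ignite or
Lucene code: the class and method names are hypothetical, each "node" is
just a list of scores, and ordering by score (highest first) is an
assumption made for illustration. It applies the limit on every node and
again at merge, and compares that with an unsorted merge that takes rows
in arrival order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration only; these are not Ignite or Lucene classes. */
public class TwoPhaseLimitSketch {
    /** Phase 1, on a data node: order local scores (highest first), keep 'limit'. */
    public static List<Double> nodeTopN(List<Double> localScores, int limit) {
        return localScores.stream()
            .sorted(Comparator.reverseOrder())
            .limit(limit)
            .collect(Collectors.toList());
    }

    /** Phase 2, on the reducer: merge partial results, re-sort, apply limit again. */
    public static List<Double> mergeSorted(List<List<Double>> arrived, int limit) {
        return arrived.stream()
            .flatMap(List::stream)
            .sorted(Comparator.reverseOrder())
            .limit(limit)
            .collect(Collectors.toList());
    }

    /** Unsorted variant: take the first 'limit' rows in whatever order they arrive. */
    public static List<Double> mergeUnsorted(List<List<Double>> arrived, int limit) {
        return arrived.stream()
            .flatMap(List::stream)
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> node1 = Arrays.asList(0.9, 0.2, 0.7);
        List<Double> node2 = Arrays.asList(0.8, 0.95, 0.1);
        int limit = 2;

        // Same query, two runs: node responses arrive in a different order.
        System.out.println("unsorted run 1: "
            + mergeUnsorted(Arrays.asList(node1, node2), limit));
        System.out.println("unsorted run 2: "
            + mergeUnsorted(Arrays.asList(node2, node1), limit));

        // With sorting on every node and at merge, arrival order is irrelevant.
        List<Double> p1 = nodeTopN(node1, limit);
        List<Double> p2 = nodeTopN(node2, limit);
        System.out.println("sorted run 1:   " + mergeSorted(Arrays.asList(p1, p2), limit));
        System.out.println("sorted run 2:   " + mergeSorted(Arrays.asList(p2, p1), limit));
    }
}
```

The two unsorted runs return different rows for the same query, while the
sorted runs agree regardless of which node answers first.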

2.
a. "tokenize" looks like a more descriptive param name (for
@QueryTextField), doesn't it?
b, c. What about distributed queries? How will partial results from the
nodes be merged?
Does Lucene allow configuring a comparator for data sorting?
Which comparator should Ignite choose to sort results in the merge phase?

3. For now the Lucene engine is not configurable at all. E.g. it is
impossible to configure the Tokenizer.
I'd think about possible ways to configure the engine first, and only then
go further to discuss/implement complex features
that may depend on the engine config.

On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[email protected]> wrote:

> Dear community,
>
> With this thread I'd like to open a discussion that leads to
> contributions in the subject area.
>
> Ignite has indexing capabilities, backed up by different mechanisms,
> including Lucene.
>
> Currently, Lucene 7.5.0 is used (released last year).
> This is a widespread and mature technology that covers the text search
> area and beyond (e.g. spatial data indexing).
>
> My goal is to *expose more Lucene functionality to Ignite indexing and
> query mechanisms for text data*.
>
> It's quite a simple request at the current stage. It comes from our
> project's needs but, I believe, will be useful to a lot more people.
> Let's walk through the items and vote on or discuss Jira tickets for them.
>
> 1.[trivial] Use  dataQuery.getPageSize()  to limit search response items
> inside GridLuceneIndex.query(). Currently it is calling
> IndexSearcher.search(query, *Integer.MAX_VALUE*) - so basically all scored
> matches will be returned, which we do not need in most cases.
>
> 2.[simple] Add sorting. Then a more capable search call can be
> executed: *IndexSearcher.search(query, count, sort)*
> Implementation steps:
> a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> annotation. If *true*, the field will be indexed but not tokenized. Number
> types are preferred here.
> b) Add *sort* collection to *TextQuery* constructor. It should define
> desired sort fields used for querying.
> c) Implement Lucene sort usage in GridLuceneIndex.query().
>
> 3.[moderate] Build complex queries with *TextQuery*, including
> terms/queries boosting.
> *This section is for voting only, as it requires more detailed work. It
> should be extended if the community is interested in it.*
>
> Looking forward to your comments!
>
> BR,
> Yuriy Shuliha
>

-- 
Best regards,
Andrey V. Mashenkov
