Re: full text search improvements

Jukka Zitting Mon, 26 Mar 2012 05:12:50 -0700

Hi,

On Sat, Mar 24, 2012 at 3:12 PM, Lukas Kahwe Smith <[email protected]> wrote:
> Moreover (optionally) leveraging these has several other advantages:


Agreed, I think it would be great to have first-class integration from
Oak to *both* Solr and ElasticSearch. As soon as we have a first draft
of the indexing extension points (as described in the other thread)
I'd love to see some prototypes on how they'd work in terms of
external search indexes. Volunteers for that?

> Now I mentioned faceting [4] above. Right now Jackrabbit does not even
> support COUNT() [5], which I find very painful and a major oversight. But
> really what people have come to expect from full text search is faceting.

Totally agreed. We need to make sure that the Oak query API supports
faceting and other related query features. The actual implementation
can (should?) be left to individual index components.
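
A minimal sketch of that split, assuming a hypothetical extension point:
the query layer only asks for facet counts, and each index component
decides how to compute them. None of the names below (FacetingIndex,
InMemoryIndex, facetCounts) come from the Oak API; the in-memory class
merely stands in for a Solr or ElasticSearch backed component.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension point; not part of any existing Oak or JCR API.
interface FacetingIndex {
    /** Counts indexed documents per distinct value of the given property. */
    Map<String, Integer> facetCounts(String property);
}

// Toy in-memory implementation standing in for an external search index.
final class InMemoryIndex implements FacetingIndex {
    private final List<Map<String, String>> documents = new ArrayList<>();

    void add(Map<String, String> document) {
        documents.add(document);
    }

    @Override
    public Map<String, Integer> facetCounts(String property) {
        // LinkedHashMap keeps facet values in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> document : documents) {
            String value = document.get(property);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

An external index would presumably answer the same call with its own
native faceting support rather than by scanning documents.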

> 3) "cleaner" data in results

This goes into the discussion of what the query result abstraction in
the Oak API should look like. Breaking the requirement that all query
results be directly linked to nodes in the repository should go a long
way here, but it also opens up the issue of how such results relate
to access controls. We need to put some thought into this...

> 4) cover more SQL2 functions
>
> This is a comparatively minor topic and might just be beyond the scope
> of this mailing list which seems to be more about designing the future
> architecture than "minor" feature requests. But it would be great to also
> support PATH(), DEPTH() etc. [8].

Agreed. That's one of the main reasons why I think we shouldn't just
reuse the JQOM from JCR 2.0 as the internal query model. Having an
easy way for custom functions to be added, ideally as pluggable
extensions, is IMHO a big part of future-proofing the architecture.
Examples of where this would come in handy are features like querying
by geographical location, image similarity, or graph distance (think
social networks).
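
One way to picture such pluggable functions is a small registry keyed by
function name. This is only a sketch under stated assumptions:
QueryFunctionRegistry and its methods are hypothetical names, functions
here take nothing but a node path, and PATH() and DEPTH() are reduced to
simple string operations on that path.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry; the class and method names are illustrative only.
final class QueryFunctionRegistry {
    private final Map<String, Function<String, Object>> functions = new HashMap<>();

    /** Registers a function under a case-insensitive name. */
    void register(String name, Function<String, Object> function) {
        functions.put(name.toUpperCase(Locale.ROOT), function);
    }

    /** Evaluates the named function against the path of a node. */
    Object evaluate(String name, String nodePath) {
        Function<String, Object> function = functions.get(name.toUpperCase(Locale.ROOT));
        if (function == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return function.apply(nodePath);
    }

    /** Builds a registry with simplified PATH() and DEPTH() functions. */
    static QueryFunctionRegistry withDefaults() {
        QueryFunctionRegistry registry = new QueryFunctionRegistry();
        registry.register("PATH", path -> path);
        registry.register("DEPTH", path -> Integer.valueOf(depthOf(path)));
        return registry;
    }

    // The root node "/" has depth 0; "/a/b" has depth 2.
    static int depthOf(String path) {
        if (path.equals("/")) {
            return 0;
        }
        return path.split("/").length - 1;
    }
}
```

A geo-distance or image-similarity extension would then be one more
register() call from whichever component provides it.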

> My point being here, when thinking about Oak, please also think about
> the performance of users talking to Jackrabbit via HTTP.

+1 I think we should start something like oak-jsop or oak-webdav (or
oak-atom) that provides a native mapping of the Oak API to an
HTTP-based access protocol. The current WebDAV(ex) mapping in
Jackrabbit 2.x is (as you've seen) a bit limited by all the JCR and
SPI layering in between.

> The PHPCR team has done its best in trying to solve quite a few
> performance issues with the current HTTP API, but it would be great
> if this was really in everyone's head.

Agreed. It would be great also to get your feedback on the protocol
bits as soon as we have something runnable. The rough roadmap I came
up with earlier [1] suggests that we should have basic HTTP-based CRUD
operations working in the 0.2 release scheduled for April.

[1] http://markmail.org/message/7dhxklytr2xaoe24

BR,

Jukka Zitting
