Re: About the missing COUNT() feature in jackrabbit

Bart van der Schans Wed, 04 Apr 2012 02:08:58 -0700

Hi,

The problem with providing a COUNT feature is authorization. The get
to a correct count you need to fetch all the nodes from the PM and
authorize them one-by-one. This works fine for a few hundred nodes,
but will take forever when you scale up.For example "SELECT COUNT(*)
FROM nt:base" would directly translate to fetch every single bundle
from the persistence layer (database) and do that with a single query
for every node and then authorize each of those nodes.


The only way I can see a well performing count feature is when the
authorization tokens are indexed as well, aka running authorized
queries. For JR 2 this is probably not an option.

IMO providing a count feature in the query language would give the
wrong impression to developers. It would look like something you can
"just use", but in fact it will not work once you start putting
content in your production repository.

Regard,
Bart

On Wed, Apr 4, 2012 at 8:35 AM, Christian Stocker
<[email protected]> wrote:
> Hi
>
> Any opinions on this? I'm mainly wondering if there's a chance that
> something like this goes into jackrabbit or if there's opposition to
> that. Not the actual implementation, just the idea of it. I'll work more
> on it, if you think that could be something for jackrabbit-jcr-server
> and propose something more decent.
>
> If not, I'll pursue other challenges ;)
>
> Greetings
>
> chregu
>
> On 31.03.12 08:18, Christian Stocker wrote:
>> Hi
>>
>> As we all know, there's no COUNT feature in SQL2. This is not that bad,
>> if you're not trying to get a count via remoting, since you can get it
>> with .getSize() locally.  But if you're working via the webdav layer,
>> it's not really efficient right now, especially if your resultsets are
>> larger.
>>
>> There are in my opinion 2 opinions to solve this.
>>
>> 1) add a count() feature to the SQL2 parser, there's already a ticket
>> for that (among other things)
>> https://issues.apache.org/jira/browse/JCR-2605
>> That of course would be a great solutions, it solves the problem for all
>> cases who need that
>>
>> 2) add an option to the SEARCH webdav interface to get the count if
>> results instead of the whole result set.
>>
>> For 2) I made a little prototype (which is really just that ;)) to see
>> if that would be feasible. That patch can be found here
>>
>> https://github.com/chregu/jackrabbit/compare/count-in-webdav
>>
>> It adds a D:count option to a SEARCH query like this
>>
>> <D:searchrequest xmlns:D="DAV:">
>>       <JCR-SQL2>
>>               <![CDATA[SELECT * FROM [nt:unstructured] ]]>
>>       </JCR-SQL2>
>>       <D:count>1</D:count>
>> </D:searchrequest>
>>
>> and if that is set it returns only the count of results. Currently like
>> below, but that format has certainly to be changed (it was just the
>> shortest way to get something back):
>>
>> <D:multistatus xmlns:D="DAV:">
>>       <D:response>
>>               <D:href>#count</D:href>
>>               <D:responsedescription>433</D:responsedescription>
>>       </D:response>
>> <D:responsedescription>nt:unstructured.jcr:primaryType</D:responsedescription>
>> </D:multistatus>
>>
>> My questions:
>>
>> - Is there any chance that something like this goes into jackrabbit and
>> we can build on that feature?
>> - Or is there a better way to just get the count of a query?
>> - How should the response look like for something like this? The above
>> is certainly not right ;)
>> - And last but not least, are there any plans to implement COUNT() some
>> time in the future directly in SQL2?
>>
>> Thanks for any feedback.
>>
>> chregu
>>
>>
>
> --
> Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
> Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
> www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE
>



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Re: About the missing COUNT() feature in jackrabbit

Reply via email to