On 9 Nov 2010, at 13:11, Jos Snellings wrote:
> You are right, Ian,
>
> This question deserves a new thread.
> Currently I am drawing up an architecture for a file handling system for
> e-government:
> permissions are scattered up to:
> - the citizen : one active file for a citizen (= folder, infoholder in xml,
> attachments)
> - the community : visibility and handling for the citizens of one community
> - the regional authority : regional indicators
>
> This worries me for it is a typical case where you would run into scalability
> problems.
> Think of 50 000 open applications via that system. With 10 documents per
> application
> you would have 500 000.
If one user only has access to 10 applications, then doing a search that finds
500,000 applications only to return the 10 readable ones would not scale, just as
a table scan on an RDBMS table containing 0.5M rows with no index would not
scale.
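
To put rough numbers on that, here is a minimal sketch in Python. It is a toy model only, not Jackrabbit code: the function name search_then_filter and the set of readable ids are made up for illustration. It just counts how many access checks a filter-after-search approach performs.

```python
# Toy model (assumption): 500,000 hits, of which the user may read only 10.
TOTAL_DOCS = 500_000
readable = set(range(0, TOTAL_DOCS, TOTAL_DOCS // 10))  # 10 readable ids


def search_then_filter(hits, can_read):
    """Check every hit against the access rule and count the checks made."""
    checks = 0
    visible = []
    for doc_id in hits:
        checks += 1
        if can_read(doc_id):
            visible.append(doc_id)
    return visible, checks


visible, checks = search_then_filter(range(TOTAL_DOCS), readable.__contains__)
print(len(visible), "readable results after", checks, "access checks")
```

The work grows with the number of hits, not with the number of items the user can actually read.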
>
> Is that a nogo for Sling? Would be a pity. I wanted to come up with an
> elegant solution :-)
Sling is not the issue here, it's Jackrabbit, and knowing that the above
situation does not scale, you would do two things:
1. Never use that type of search.
2. Access all data via pointers and paths into the data, based on something that
was not a search. E.g. if the application was 2919100291
you might find the application and all its information in
/applications/29/19/10/2919100291
and if the user had an ID of e31231231432
they might have a folder
/users/e3/12/31/23/1432
with a subfolder
2919100291
containing a property
egov:application-path : /applications/29/19/10/2919100291
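
The layout above could be sketched roughly like this in Python. This is for illustration only: the helper shard_path and the dictionary standing in for the repository are assumptions, not Sling or Jackrabbit APIs. The point is that both paths are computed from the IDs, so no search is involved.

```python
def shard_path(root, identifier, levels, tail_from_remainder=False):
    """Build a sharded path from two-character prefixes of the identifier.

    The last segment is either the full identifier (as in the application
    example) or whatever is left after the prefixes (as in the user example).
    """
    parts = [identifier[i:i + 2] for i in range(0, levels * 2, 2)]
    tail = identifier[levels * 2:] if tail_from_remainder else identifier
    return "/".join([root] + parts + [tail])


application_path = shard_path("/applications", "2919100291", levels=3)
user_folder = shard_path("/users", "e31231231432", levels=4,
                         tail_from_remainder=True)
pointer_node = user_folder + "/2919100291"

# A plain dict stands in for the repository (assumption, not a JCR API).
repository = {
    application_path: {"title": "example application"},
    pointer_node: {"egov:application-path": application_path},
}

# Two direct lookups by path: first the pointer, then the application itself.
target = repository[pointer_node]["egov:application-path"]
application = repository[target]
print(pointer_node, "->", target)
```

Listing a user's applications then means reading the children of the user's folder, which costs in proportion to what that user owns rather than to the size of the whole repository.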
i.e. you have to model your data to avoid searches and indirect access paths.
But please ask on [email protected], as the committers there will be able to
give you a complete and honest answer as to whether Jackrabbit is a no-go,
and do some tests to prove to yourself that it will work at the scale that you
want (bash + curl + sling is a good way of doing this sort of test).
>
> Jos
>
>
>
>
> On 11/09/2010 09:22 AM, Ian Boston wrote:
>> Jos,
>> If by result you mean a search result, then that's a separate issue from the
>> dynamic ACL itself, and not the direct subject of this thread. When I said
>> performance I was referring to the atomic act of determining if the ACE was
>> active for any attempt to access an item, not just search results.
>>
>>
>> However,
>> that's the way Jackrabbit works.
>> JCR searches are "compiled" into Lucene queries that generate Lucene hits
>> where the Lucene document contains a node ID, which is extracted in the
>> normal manner from JCR (IIRC). If the current user can't read the item, it's
>> discarded.
>>
>> This is fine for dense searches where most items can be read by the user,
>> but problematic for sparse searches.
>> It's also problematic for sorts that can't be performed inside Lucene, as
>> this results in all the items being loaded into memory before searching.
>> One way to avoid sorts of this form is to ban "order by" clauses that
>> reference any items other than properties of the node found.
>>
>>
>> BTW, problematic == non-scalable, vertically or horizontally.
>> Ian
>>
>