On 9 Nov 2010, at 13:11, Jos Snellings wrote:

> You are right, Ian,
> 
> This question deserves a new thread.
> Currently I am drawing up an architecture for a file handling system for 
> e-government:
> permissions are scattered up to:
> - the citizen : one active file for a citizen (= folder, infoholder in xml, 
> attachments)
> - the community :  visibility and handling for the citizens of one community
> - the regional authority : regional indicators
> 
> This worries me for it is a typical case where you would run into scalability 
> problems.
> Think of 50 000 open applications via that system. With 10 documents per 
> application
> you would have 500 000.

If one user only has access to 10 applications (about 100 documents), then a 
search that matches all 500,000 documents only to return the readable ones will 
not scale, just as a table scan on an RDBMS table containing 0.5M rows with no 
index will not scale.



> 
> Is that a no-go for Sling? That would be a pity. I wanted to come up with an 
> elegant solution :-)


Sling is not the issue here; it's Jackrabbit. Knowing that the above 
situation does not scale, you would do 2 things:

1. Never use that type of search.

2. Access all data via pointers and paths into the data that are derived from 
something other than a search. E.g. if the application ID was 2919100291, 
you might find the application and all its information at 
/applications/29/19/10/2919100291

and if the user had an ID of e31231231432
they might have a folder 
/users/e3/12/31/23/1432
     with a sub folder 
         2919100291 

containing a property
              egov:application-path : /applications/29/19/10/2919100291
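
As a rough sketch of how those paths could be derived from the IDs alone, 
with no search involved: the helper names below (app_path, user_path, 
link_path) are made up for illustration, and the shard widths are simply 
read off the two example paths above, so adjust them to your own ID scheme.

```shell
# Hypothetical helpers; shard widths are taken from the example paths only.

# /applications/<chars 1-2>/<chars 3-4>/<chars 5-6>/<full id>
app_path() {
    id=$1
    a=$(printf '%s' "$id" | cut -c1-2)
    b=$(printf '%s' "$id" | cut -c3-4)
    c=$(printf '%s' "$id" | cut -c5-6)
    printf '/applications/%s/%s/%s/%s\n' "$a" "$b" "$c" "$id"
}

# /users/<chars 1-2>/<chars 3-4>/<chars 5-6>/<chars 7-8>/<rest of id>
user_path() {
    id=$1
    a=$(printf '%s' "$id" | cut -c1-2)
    b=$(printf '%s' "$id" | cut -c3-4)
    c=$(printf '%s' "$id" | cut -c5-6)
    d=$(printf '%s' "$id" | cut -c7-8)
    rest=$(printf '%s' "$id" | cut -c9-)
    printf '/users/%s/%s/%s/%s/%s\n' "$a" "$b" "$c" "$d" "$rest"
}

# Sub folder under the user that holds the pointer to the application.
link_path() {
    printf '%s/%s\n' "$(user_path "$1")" "$2"
}

app_path 2919100291                  # /applications/29/19/10/2919100291
user_path e31231231432               # /users/e3/12/31/23/1432
link_path e31231231432 2919100291    # /users/e3/12/31/23/1432/2919100291
```

Listing a user's applications then becomes a matter of reading the children 
of the user's folder and following each egov:application-path property, 
which touches only the nodes that user can actually see.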



i.e. you have to model your data to avoid searches and indirect access paths,

but......

Please 
ask on [email protected], as the committers there will be able to give you a 
complete and honest answer on whether Jackrabbit is a no-go,
and
do some tests to prove to yourself that it will work at the scale that you want.

(bash + curl + Sling is a good way of doing this sort of test)
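
Something along these lines might do as a starting point. It is only a 
sketch: the URL, the admin:admin credentials, the "title" property, the 
zero-padded numeric IDs and the function names are all assumptions for 
illustration, not anything from your setup. It relies on the Sling POST 
servlet creating a node at the posted path, and on the default GET servlet 
rendering a node as JSON when ".json" is appended.

```shell
# Assumed defaults for a local test instance; override via the environment.
SLING_URL="${SLING_URL:-http://localhost:8080}"
SLING_AUTH="${SLING_AUTH:-admin:admin}"
CURL="${CURL:-curl}"    # set CURL=echo to print the commands instead of running them

# Sharded path for an application ID (widths are an assumption).
app_path() {
    id=$1
    a=$(printf '%s' "$id" | cut -c1-2)
    b=$(printf '%s' "$id" | cut -c3-4)
    c=$(printf '%s' "$id" | cut -c5-6)
    printf '/applications/%s/%s/%s/%s\n' "$a" "$b" "$c" "$id"
}

# Create N application nodes with made-up, zero-padded 10-digit IDs.
create_applications() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        app_id=$(printf '%010d' "$i")
        "$CURL" -s -f -o /dev/null -u "$SLING_AUTH" \
            -F "title=Application $app_id" \
            "$SLING_URL$(app_path "$app_id")"
        i=$((i + 1))
    done
}

# Fetch one application by its path: a direct read, no search involved.
read_application() {
    "$CURL" -s -f -u "$SLING_AUTH" "$SLING_URL$(app_path "$1").json"
}
```

From bash you could then run "time create_applications 50000" followed by 
"time read_application 0000049999" and watch how the timings change as the 
repository fills up.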



> 
> Jos
> 
> 
> 
> 
> On 11/09/2010 09:22 AM, Ian Boston wrote:
>> Jos,
>> If by result you mean a search result, then that's a separate issue from the 
>> dynamic ACL itself, and not the direct subject of this thread. When I said 
>> performance I was referring to the atomic act of determining if the ACE was 
>> active for any attempt to access an item, not just search results.
>> 
>> 
>> However,
>> that's the way Jackrabbit works.
>> JCR searches are "compiled" into Lucene Queries that generate Lucene Hits 
>> where the Lucene document contains a node ID, which is extracted in the 
>> normal manner from JCR (IIRC). If the current user can't read the item, it's 
>> discarded.
>> 
>> This is fine for dense searches where most items can be read by the user, 
>> but problematic for sparse searches.
>> It's also problematic for sorts that can't be performed inside Lucene, as 
>> this results in all the items being loaded into memory before searching.
>> One way to avoid sorts of this form is to ban "order by" clauses that 
>> reference any items other than properties of the node found.
>> 
>> 
>> BTW, problematic == non scalable, vertically or horizontally.
>> Ian
>>   
> 
