Thank you, Ian!
I will write up the proposal under a separate subject.

Jos

On 11/10/2010 10:05 AM, Ian Boston wrote:
On 10 Nov 2010, at 00:09, Jos Snellings wrote:


Thank you for your prompt answer, Ian.
You mean "the natural way".
That would be true for a citizen.
That would be true for a community, so a path could be Stockholm/234987488.
But what about extracting a regional indicator, like 'how many applications
were handled on time during the first half of 2014'? This is not requested in
the first place, but I know it *will* come up. The user performing this query
would then have read access to all files. Would the query scale better?


'how many applications were handled on time during the first half of 2014'

implies a date range.
IIRC date ranges are problematic in Lucene, and although the query might be OK
from a sparse-search point of view, the date range might cause a problem. Again,
experimenting before committing to an implementation is going to remove much of
the risk.
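As a rough illustration of why the resolution of a date property can matter: in older Lucene versions a plain `RangeQuery` was rewritten into a `BooleanQuery` with one clause per distinct indexed term inside the range, so the worst case grows with how finely dates are indexed. Whether Jackrabbit's own range handling avoids this is exactly the kind of thing to confirm by experiment. The helper below is hypothetical and only counts the upper bound of distinct terms for the example query.

```python
from datetime import datetime, timedelta

# Hypothetical resolutions at which a date property might be indexed as terms.
RESOLUTIONS = {
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "minute": timedelta(minutes=1),
    "second": timedelta(seconds=1),
}


def max_terms_in_range(start: datetime, end: datetime, step: timedelta) -> int:
    """Upper bound on distinct indexed date terms in [start, end) at one resolution.

    A range query rewritten into one clause per matching term never needs
    more clauses than this, but may need this many if every value occurs.
    """
    return (end - start) // step


# The example query from the thread: the first half of 2014.
FIRST_HALF_2014 = (datetime(2014, 1, 1), datetime(2014, 7, 1))

if __name__ == "__main__":
    for name, step in RESOLUTIONS.items():
        print(f"{name}: {max_terms_in_range(*FIRST_HALF_2014, step):,}")
```

For the first half of 2014 this gives 181 terms at day resolution, 4,344 at hour, 260,640 at minute and 15,638,400 at second resolution, which is why a coarse date property may be cheaper to range over than a full timestamp.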
Ian



Thanks,
Jos

On 11/09/2010 07:56 PM, Ian Boston wrote:

On 9 Nov 2010, at 13:11, Jos Snellings wrote:



You are right, Ian,

This question deserves a new thread.
I am currently drawing up an architecture for a file-handling system for
e-government. Permissions are spread over three levels:
- the citizen: one active file per citizen (= folder, infoholder in XML,
attachments)
- the community: visibility and handling for the citizens of one community
- the regional authority: regional indicators
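If the three levels map onto the repository tree, one grant per level can cover everything below it, since (as far as I know) access control entries in Jackrabbit's default implementation apply to the node they are set on and to its descendants. The sketch below only models that inheritance with path prefixes; the layout `/egov/<region>/<community>/<file id>`, the principal names and `can_read` are hypothetical, and this is not the Jackrabbit API.

```python
from typing import Dict, List

# Hypothetical layout: /egov/<region>/<community>/<citizen file id>/...
GRANTS: Dict[str, List[str]] = {
    "citizen:234987488": ["/egov/stockholm-county/stockholm/234987488"],
    "community:stockholm": ["/egov/stockholm-county/stockholm"],
    "region:stockholm-county": ["/egov/stockholm-county"],
}


def can_read(principal: str, path: str) -> bool:
    """True if a grant covers `path`: the granted node itself or any descendant."""
    for granted in GRANTS.get(principal, []):
        prefix = granted.rstrip("/")
        # Compare whole path segments so /stockholm does not also match /stockholmx.
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False
```

Note that the regional principal can read every file, so a search run as that principal is dense, while the same search run as a single citizen is very sparse.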

This worries me, because it is a typical case where you would run into
scalability problems.
Think of 50,000 open applications in that system. With 10 documents per
application you would have 500,000 documents.


If one user only has access to 10 applications, then a search that finds
500,000 applications only to return 10 readable ones will not scale, just as a
full scan of an RDBMS table containing 0.5M rows with no index would not scale.





Is that a no-go for Sling? That would be a pity. I wanted to come up with an
elegant solution :-)


Sling is not the issue here, it's Jackrabbit, and knowing that the above
situation does not scale, you would do two things:
- Never use that type of search.
- Access all data via pointers and paths into the data that are derived from
something other than a search. E.g. if the application was 2919100291,
you might find the application and all its information in
/applications/29/19/10/2919100291

and if the user had an ID of e31231231432
they might have a folder
/users/e3/12/31/23/1432
      with a subfolder
          2919100291

containing a property
               egov:application-path : /applications/29/19/10/2919100291



i.e. you have to model your data to avoid searches and indirect access pathways.
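The path scheme above can be sketched with a small helper. Everything here is hypothetical: `shard_path`, `ContentTree` and the two-character split are illustrative, and `ContentTree` is an in-memory stand-in for direct path reads (in JCR these would be calls such as `Session.getNode`), not a repository API. The `full_leaf` flag reproduces both variants in the example: the application path ends with the full ID, the user path with the remaining characters.

```python
from collections import defaultdict
from typing import Dict, List


def shard_path(root: str, node_id: str, levels: int, width: int = 2, full_leaf: bool = True) -> str:
    """Spread ids over `levels` nested folders of `width` characters each."""
    if len(node_id) <= levels * width:
        raise ValueError("id is too short for the requested number of levels")
    segments = [node_id[i * width:(i + 1) * width] for i in range(levels)]
    # Leaf is either the whole id or just the characters not used for folders.
    leaf = node_id if full_leaf else node_id[levels * width:]
    return "/".join([root.rstrip("/"), *segments, leaf])


def application_path(app_id: str) -> str:
    return shard_path("/applications", app_id, levels=3)


def user_path(user_id: str) -> str:
    return shard_path("/users", user_id, levels=4, full_leaf=False)


class ContentTree:
    """Hypothetical in-memory stand-in: parent path -> {child name: properties}."""

    def __init__(self) -> None:
        self._children: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(dict)

    def add_child(self, parent: str, name: str, props: Dict[str, str]) -> None:
        self._children[parent][name] = dict(props)

    def children(self, parent: str) -> Dict[str, Dict[str, str]]:
        return self._children.get(parent, {})


def link_application(tree: ContentTree, user_id: str, app_id: str) -> None:
    """Create the user's pointer subfolder for one application."""
    tree.add_child(user_path(user_id), app_id, {"egov:application-path": application_path(app_id)})


def applications_for_user(tree: ContentTree, user_id: str) -> List[str]:
    """Follow the user's pointers: reads one folder, no search over all applications."""
    pointers = tree.children(user_path(user_id))
    return sorted(props["egov:application-path"] for props in pointers.values())
```

After `link_application(tree, "e31231231432", "2919100291")`, looking up that user reads only `/users/e3/12/31/23/1432` and returns `['/applications/29/19/10/2919100291']`; the cost depends on how many applications this user has, not on how many are stored.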

But...

Please
- ask on [email protected], as the committers there will be able to give you a
complete and honest answer as to whether Jackrabbit is a no-go, and
- do some tests to prove to yourself that it will work at the scale that you want.

(bash + curl + Sling is a good way of doing this sort of test.)
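The same idea can be sketched with Python's standard library instead of bash + curl. It assumes a local Sling instance at `http://localhost:8080` with the default `admin`/`admin` account, and relies on the Sling POST servlet creating a node (and, as far as I know, any missing parent folders) when form data is posted to a path that does not exist yet. The property name `status`, the id range and the function names are made up for the test.

```python
import base64
import time
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

SLING_URL = "http://localhost:8080"  # assumption: local Sling launchpad
USER, PASSWORD = "admin", "admin"     # assumption: default launchpad credentials


def application_url(app_id: str) -> str:
    # Same sharded layout as /applications/29/19/10/2919100291.
    shards = "/".join(app_id[i:i + 2] for i in range(0, 6, 2))
    return f"{SLING_URL}/applications/{shards}/{app_id}"


def create_request(app_id: str, props: Dict[str, str]) -> urllib.request.Request:
    """Build a form POST asking the Sling POST servlet to create/update the node."""
    body = urllib.parse.urlencode(props).encode("ascii")
    req = urllib.request.Request(application_url(app_id), data=body, method="POST")
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode("ascii")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


def timed_send(req: urllib.request.Request) -> Tuple[int, float]:
    """Send one request and return (HTTP status, seconds taken)."""
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
        status = resp.status
    return status, time.perf_counter() - start


def load(count: int, first_id: int = 2919100291) -> List[Tuple[int, float]]:
    """Create `count` applications with consecutive ids; needs a running Sling."""
    return [timed_send(create_request(str(first_id + i), {"status": "open"})) for i in range(count)]


if __name__ == "__main__":
    timings = load(1000)
    mean_ms = 1000 * sum(seconds for _, seconds in timings) / len(timings)
    print(f"{len(timings)} requests, mean {mean_ms:.1f} ms")
```

Run it only against a throwaway instance; `urlopen` raises `HTTPError` on 4xx/5xx responses. Once the content is loaded, timing the real queries you plan to use in the same way shows whether they hold up at that size.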





Jos




On 11/09/2010 09:22 AM, Ian Boston wrote:


Jos,
If by result you mean a search result, then that's a separate issue from the
dynamic ACL itself, and not the direct subject of this thread. When I said
performance, I was referring to the atomic act of determining whether the ACE was
active for any attempt to access an item, not just for search results.


However,
that's the way Jackrabbit works.
JCR searches are "compiled" into Lucene queries that generate Lucene hits, where
each Lucene document contains a node ID; the item is then fetched from JCR in the
normal manner (IIRC). If the current user can't read the item, it's discarded.
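The filtering step just described can be modelled roughly as follows. This is a simplified sketch, not Jackrabbit's code: it assumes hits arrive in ranking order, each hit costs one access check, and a page of results is filled lazily. `first_readable` and the two scenarios are hypothetical; the 500,000 figure comes from earlier in the thread.

```python
from typing import Callable, Iterable, List, Tuple

TOTAL_HITS = 500_000  # figure from the thread: 50,000 applications x 10 documents


def first_readable(
    hits: Iterable[int], can_read: Callable[[int], bool], page_size: int
) -> Tuple[List[int], int]:
    """Walk hits in ranking order, keep the readable ones, stop once the page is full.

    Returns (page, number of hits that had to be access-checked).
    """
    page: List[int] = []
    checked = 0
    for hit in hits:
        checked += 1  # one access check per hit, readable or not
        if can_read(hit):
            page.append(hit)
            if len(page) == page_size:
                break
    return page, checked


if __name__ == "__main__":
    hits = range(TOTAL_HITS)
    # Dense: a user (e.g. the regional authority) who may read every hit.
    _, dense_checks = first_readable(hits, lambda hit: True, page_size=10)
    # Sparse: hypothetical user who may read only 10 hits spread across the index.
    mine = {i * 50_000 for i in range(10)}
    _, sparse_checks = first_readable(hits, mine.__contains__, page_size=10)
    print(f"dense: {dense_checks} checks, sparse: {sparse_checks} checks")
```

In the dense case the page is full after 10 checks; in the sparse case 450,001 hits must be checked to return the same 10 results, so the cost follows the number of hits rather than the number of results.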

This is fine for dense searches, where the user can read most items, but
problematic for sparse searches.
It's also problematic for sorts that can't be performed inside Lucene, as these
result in all the items being loaded into memory before sorting.
One way to avoid sorts of this form is to ban "order by" clauses that reference
any items other than properties of the node found.
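A crude version of such a ban could be a check applied to user-supplied queries before they reach the repository. This sketch assumes JCR XPath syntax (`order by @prop descending`, `jcr:score()`), uses regular expressions rather than a real query parser, and `order_by_allowed` is a hypothetical name. It accepts sort keys that are properties of the found node or the relevance score, and rejects keys that reach other items, such as `documents/@jcr:created` or `../@name`.

```python
import re

# Everything after "order by" in a JCR XPath query (simplistic: not a real parser).
ORDER_BY = re.compile(r"\border\s+by\s+(.+)$", re.IGNORECASE | re.DOTALL)
# A sort key on the found node itself: an own property or jcr:score(), optional direction.
ALLOWED_KEY = re.compile(
    r"^(@[A-Za-z_][\w.:-]*|jcr:score\(\))(\s+(ascending|descending))?$",
    re.IGNORECASE,
)


def order_by_allowed(xpath: str) -> bool:
    """Reject sort keys that reach other items, e.g. child/@prop or ../@prop."""
    match = ORDER_BY.search(xpath)
    if match is None:
        return True  # no sort clause at all
    keys = [key.strip() for key in match.group(1).split(",")]
    return all(ALLOWED_KEY.match(key) for key in keys)
```

A real gateway would want a proper parser, but even this rough filter keeps the expensive in-memory sorts out of queries that users can submit.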


BTW, problematic == non-scalable, vertically or horizontally.
Ian










