Re: About JENA-2339 - security related

Vilnis Termanis Fri, 26 Aug 2022 00:40:03 -0700

(Apologies for the delay - I've been busy at work with other stuff)

On Mon, 15 Aug 2022 at 14:27, Andy Seaborne <a...@apache.org> wrote:
>
> There is one Jena user - it's Vilnis (for Iotics).
>
> Your use cases - whatever they are - are for the current product and
> will evolve. Whether the way you propose will support the evolution of
> the use cases in the future, say the next 5 years, is unclear (and I
> think quite unlikely, both on security features, because product features
> evolve, and on wanting to work with spatial or text datasets).  Jena
> tries to give stability.
>
> The essence of the PR is ~30 lines in SecurityContextDynamic.
> The rest is rearranging the plumbing to have a magic user.
> This does not need DatasetGraphAccessControl.


You're right. I've come to the same conclusion. (I extended ACL to try
the approach because it was the quickest way to do so, at least so it
seemed when I started.)

>
> This could be in a custom query processor extending SPARQL_QueryDataset,
> overriding decideDataset, delivered as a Fuseki Module. You can override
> the standard query processor (1 line of code) if you want all query
> services to have this, or all for a particular service (2 lines of
> code), or be a new endpoint that offers only SPARQL query over a view of
> the dataset. The latter is better for you because you can put API
> security on the endpoint. It's an opt-in, drop-in extension to a
> standard distribution Fuseki/Main.
>
> The amount of code reuse from SecurityContextView is maybe 20 lines, via
> SecurityContextView.filterTDB, and the functionality could be made into a
> function.
>
> Now your usage is not a security issue for the Fuseki server as the HTTP
> request interface is not changed. No interaction with GSP.
>
> So Iotics add their own query processor to a standard Fuseki server and
> can evolve the extension. Configuring the network for the extension is
> the responsibility of the Iotics deployment.
>

Noted. I'll have a go at that approach. In fact, I think I'll also try
(as an option) what both you & Martynas suggested: rather than supplying
a completely external ACL, allow specifying a query that determines
the set of visible graphs (e.g. using WACL/Solid).

> The extension might even be interesting to other Jena users, not as a
> security feature but for the dynamic view capability.
>
>  >>> using a "SELECT {} 1" query, and
>  >>> adding a certain set of graphs makes the queries on my laptop take:
>  >>> ~600 graphs ~115ms
>  >>> ~1500 graphs ~162ms
>  >>> ~3k graphs ~240ms
>  >>> ~6k graphs ~400ms
>  >>
>  >> That's an illustration of the current system but we don't know what
>  >> is the cause of the cost.
>  >>
>  >> What piece of the code is taking the time?
>  >> Maybe the right thing to do is make it faster.
>  >
>  > I haven't looked into this in great detail, but from my understanding
>  > the time taken is a combination of a) parsing the input of allowed
>  > graphs and b) generating a new SecurityContext (holding a hashmap of
>  > said graphs). If providing a set of allowed graphs in the proposed way
>  > is not a no-go, I'm happy to dig into where the cost is exactly.
>
> We haven't seen the queries you were making. It is difficult to believe
> that Java takes >100ms to build a 6K entry hash map.
>

Yes, that definitely doesn't sound quite right for a "SELECT {} 1"
query with a set of ~6k graph URIs each of ~48 ASCII chars. (It could
well be that the regex for parsing it is the issue and that wouldn't
be required anyway with a URL or form param. I'll look into it.)
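
One way to check this outside Fuseki is to time the parsing and set
construction in isolation. The sketch below is not Jena code: the URI
pattern and the comma-separated input format are assumptions made up for
illustration, sized to match the ~6k URIs of ~48 ASCII chars above. It
only compares a plain split against a regex scan over the same input.

```python
import re
import time

# Hypothetical graph URIs, padded to exactly 48 ASCII characters each.
PREFIX = "http://example.org/graph/"
WIDTH = 48 - len(PREFIX)

def make_uris(n):
    """Generate n distinct 48-character URIs (illustrative naming only)."""
    return [f"{PREFIX}{i:0{WIDTH}d}" for i in range(n)]

def parse_with_split(raw):
    """Build the allowed-graph set with a plain string split."""
    return set(raw.split(","))

URI_PATTERN = re.compile(r"[^,]+")

def parse_with_regex(raw):
    """Build the same set using a regex scan over the input."""
    return set(URI_PATTERN.findall(raw))

def time_ms(fn, arg):
    """Return (result, elapsed milliseconds) for a single call."""
    start = time.perf_counter()
    result = fn(arg)
    return result, (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    raw = ",".join(make_uris(6000))
    for fn in (parse_with_split, parse_with_regex):
        graphs, elapsed = time_ms(fn, raw)
        print(f"{fn.__name__}: {len(graphs)} graphs in {elapsed:.2f} ms")
```

If both variants finish far below the timings quoted earlier, the cost is
more likely elsewhere in the request handling than in the hash map itself.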

> You mentioned that the request line gets too long. True for GET but a
> SPARQL query request could be sent as an HTML form
> (application/x-www-form-urlencoded) so listing the graphs using
> ?default-graph-uri=/?named-graph-uri= can be much larger than the
> practical GET limits.

That's a good point - I had forgotten about that! (We've been using
direct-POST all this time.)
I suppose it would allow for both GET with smallish lists and
form-encoded POST for larger ones.

Are there any (dis)advantages for POST by-form versus POST directly?
(I guess in the end it doesn't really matter whether some arguments are
in the URL query string or encoded in the body, apart from the SPARQL
query having to be URL-decoded with the form, unlike with direct-POST.)

Anyway, thank you for all of your (and Martynas') input - much appreciated.

>
>      Andy
>
>


-- 
Vilnis Termanis
Technical Specialist

e | vilnis.terma...@iotics.com
www.iotics.com
