Re: About JENA-2339 - security related

Vilnis Termanis Mon, 08 Aug 2022 09:06:12 -0700

On Mon, 1 Aug 2022 at 12:29, Andy Seaborne <a...@apache.org> wrote:
>
>
>
> On 28/07/2022 20:50, Vilnis Termanis wrote:
> > Hi Andy & Jena development community,
> >
> > (Answers inline - apologies if I repeat myself)
> >
> > FYI - Our aim is to enable end-users to make SPARQL queries whilst
> > respecting visibility restrictions.
> > I.e. users (indirectly) add sets of related triples to a dataset and
> > they can choose who has visibility (beyond themselves) over these,
> > either: Nobody, Everyone or a chosen set (which can be updated). Note
> > that this restriction is not by a specific subject or predicate.
> > (Although the sets of triples do have relationships - not all of them
> > are known in advance.)
>
> Let's clarify terminology here.
>
> A "Jena user" is a person or organisation that is downloading Jena,
> either as the formal release (source code) or convenience binaries (e.g.
> jars from Maven Central). The "convenience binaries" are the more usual case.
>
> Not Iotics users. Systems built with Jena have their own users.
> (The Apache License applies - including clause 7.)
>
> Responsibility for the product or service being "fit for purpose" lies
> between the downstream system builder and their users.


Sorry about that - I should have been clearer with the terms.

In the submission, there is only one entity - the "Fuseki user" (e.g.
via BasicAuth) - to which the dynamic mode applies. However, since this
is intended to be used as part of an integration (by Jena users, to
gate access to their own domain-specific end-users), I think the
authentication part is irrelevant. (E.g. a separate service endpoint
could have the proposed functionality enabled, and this is what the
integration calls.)

>
> > using a "SELECT {} 1" query, and
> > adding a certain set of graphs makes the queries on my laptop take:
> > ~600 graphs ~115ms
> > ~1500 graphs ~162ms
> > ~3k graphs ~240ms
> > ~6k graphs ~400ms
>
> That's an illustration of the current system but we don't know what is
> the cause of the cost.
>
> What piece of the code is taking the time?
> Maybe the right thing to do is make it faster.
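As a rough check on the numbers quoted above, a least-squares line through the four (graph count, latency) points gives about 53 microseconds per extra graph on top of roughly 83 ms fixed. The sketch below does that fit. It uses only the approximate figures from the thread and says nothing about which code the per-graph time is spent in; the class name is hypothetical.

```java
import java.util.Locale;

/** Least-squares line through the quoted (graph count, latency) measurements. */
final class GraphCostFit {
    // Approximate figures quoted in the thread ("~600 graphs ~115ms", etc.).
    static final double[] GRAPHS = {600, 1500, 3000, 6000};
    static final double[] MILLIS = {115, 162, 240, 400};

    /** Returns {slope (ms per graph), intercept (ms)} of the ordinary least-squares fit. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    public static void main(String[] args) {
        double[] f = fit(GRAPHS, MILLIS);
        // Convert ms/graph to microseconds/graph for readability.
        System.out.printf(Locale.ROOT, "~%.1f us per graph, ~%.0f ms fixed%n", f[0] * 1000, f[1]);
    }
}
```

A near-linear fit is consistent with a per-graph cost (parsing, set building), but only profiling would confirm where it comes from.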

I haven't looked into this in great detail, but from my understanding
the time taken is a combination of a) parsing the input of allowed
graphs and b) generating a new SecurityContext (holding a hashmap of
said graphs). If providing a set of allowed graphs in the proposed way
is not a no-go, I'm happy to dig into where the cost is exactly.
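The two phases named above could be timed separately with something like the following. This is a minimal sketch, not the PR's code: the input format (one IRI per line, optionally in angle brackets) and the class and method names are hypothetical, and a plain HashSet stands in for the SecurityContext. A single System.nanoTime() measurement is only indicative; a harness such as JMH would give more trustworthy numbers.

```java
import java.util.HashSet;
import java.util.Set;

/** Times parsing an allowed-graph list, then building a lookup set, as two separate phases. */
final class AllowedGraphsCost {
    /** Hypothetical input format: one IRI per line, optionally wrapped in angle brackets. */
    static String[] parse(String input) {
        return input.lines()
                .map(String::strip)
                .filter(s -> !s.isEmpty())
                .map(s -> s.startsWith("<") && s.endsWith(">") ? s.substring(1, s.length() - 1) : s)
                .toArray(String[]::new);
    }

    /** Stand-in for building the security context: a hash set of allowed graph IRIs. */
    static Set<String> buildLookup(String[] iris) {
        Set<String> allowed = new HashSet<>(iris.length * 2);
        for (String iri : iris) {
            allowed.add(iri);
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Synthetic input of 6000 graph IRIs, matching the largest case quoted above.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6000; i++) {
            sb.append("<http://example.org/graph/").append(i).append(">\n");
        }
        String input = sb.toString();

        long t0 = System.nanoTime();
        String[] iris = parse(input);
        long t1 = System.nanoTime();
        Set<String> allowed = buildLookup(iris);
        long t2 = System.nanoTime();

        System.out.printf("graphs=%d parse=%.2fms build=%.2fms%n",
                allowed.size(), (t1 - t0) / 1e6, (t2 - t1) / 1e6);
    }
}
```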

>
> And in the general area - what are you using for authentication?
>

For us right now, we're only using fuseki:auth "basic" to
differentiate access levels in the Fuseki Data Access Control
configuration (by mapping those levels to Fuseki users), e.g.:
Fuseki user1 => allowed to see graphs A & B
Fuseki user2 => allowed to see graphs B & C
Fuseki user3 => has the proposed dynamic-access feature enabled (i.e.
no access unless the pragma preamble exists in the query with 1+
graphs defined)

Said Fuseki users (=roles) are then chosen based on what the system
needs to do (domain-specific).
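The mapping above (two static users plus one dynamic one) can be sketched as a lookup. User names and graph IRIs are illustrative, `requestGraphs` stands in for the graph list carried by the pragma preamble (whose syntax isn't shown here), and ignoring a preamble sent by a static user is an assumption of this sketch rather than something stated above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Per-Fuseki-user graph visibility as described above (names and IRIs are illustrative). */
final class GraphAccess {
    // Static entries, analogous to Fuseki's Data Access Control configuration.
    static final Map<String, Set<String>> STATIC_ACL = Map.of(
            "user1", Set.of("http://example.org/A", "http://example.org/B"),
            "user2", Set.of("http://example.org/B", "http://example.org/C"));

    // Users with the proposed dynamic mode: visibility comes only from the request.
    static final Set<String> DYNAMIC_USERS = Set.of("user3");

    /**
     * Returns the graphs a request may see. requestGraphs is the (hypothetical) list
     * carried in the query preamble; it is empty when no preamble is present.
     */
    static Set<String> visibleGraphs(String user, List<String> requestGraphs) {
        if (DYNAMIC_USERS.contains(user)) {
            // No preamble, or a preamble with zero graphs, yields an empty set: no access.
            return Set.copyOf(requestGraphs);
        }
        // Static users ignore any preamble (assumption); unknown users see nothing.
        return STATIC_ACL.getOrDefault(user, Set.of());
    }
}
```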

> There is some bearer auth support in the next release ... it does not
> provide complete bearer auth because it can't cover all cases (e.g. JWT
> validation). It is more of a framework template with which to build a
> local solution.

I'm showing my lack of JWT/Bearer auth knowledge - but is this a
building block for what Martynas suggested, namely that the token implies
the user to which a dynamic ACL applies, and access can then be
restricted e.g. via WACL/Solid? (Correct me if I'm wrong, but isn't this
still a solution that involves ACL rules being stored in Jena, or at
least being accessible via SPARQL for a SERVICE call?)

>
> ----
>
> "FMod_ABAC" is not related to jena-permissions.
>
> "FMod_" means Fuseki Module.
> https://jena.apache.org/documentation/fuseki2/fuseki-modules
>     No forks.
> ABAC = Attribute Based Access Control.
>
> Using attributes separates ACLs from directly naming users for access to
> things. In FMod_ABAC, the things are triples. Triples have "labels".
> Labels are attribute expressions, including AND and OR operators.
>
>      "employee | contractor" -- must have the "employee" attribute
>                                 or the "contractor" attribute.
>
>      "employee & dept=engineering" -- must have both "employee" and
>                                      "dept=engineering" attributes.
>
> There is a division of responsibilities. The data is labelled - so the
> data owner is responsible for the data attribute requirements. The
> assignment of attributes to users is separate.
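A toy evaluator for label expressions of the kind shown above. This is not FMod_ABAC's parser or grammar: it assumes no parentheses, that `&` binds tighter than `|`, and treats `dept=engineering` as an opaque attribute string. The class name is hypothetical.

```java
import java.util.Set;

/**
 * Evaluates simple label expressions such as "employee | contractor" or
 * "employee & dept=engineering" against a user's attribute set.
 * Assumes no parentheses and that '&' binds tighter than '|'.
 */
final class LabelEval {
    static boolean allows(String label, Set<String> userAttributes) {
        // OR of terms: satisfying any one term is enough.
        for (String term : label.split("\\|")) {
            boolean allPresent = true;
            // AND of attributes within a term: the user must hold every one.
            for (String attr : term.split("&")) {
                if (!userAttributes.contains(attr.strip())) {
                    allPresent = false;
                    break;
                }
            }
            if (allPresent) {
                return true;
            }
        }
        return false;
    }
}
```

Note how the label lives with the data while `userAttributes` comes from elsewhere, which is the division of responsibilities described above.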
>
> > FYI - In our case this means that we have a "make SPARQL query" API
> > call. When received, the applicable user (our domain) is known and, in
> > the proposed PR, we can prepend the set of allowed graphs to the query
> > (which have been looked up prior to query execution, externally). The
> > end user has NO direct access to Fuseki itself.
>
> You have a solution presuming a protected network, or possibly a
> container with in-container networking.
>
> That's my Concern 1. Security conditions outside Jena must be met.
> Having that, even if not in use, is an issue.
>

Maybe I misunderstand, but is this not in the same boat as:
a) Configuring a service which allows write access (but not gating who
can reach said service)
b) Configuring Fuseki access control in config and allowing 1+ graphs
(which shouldn't be included)
c) Configuring a service which allows read access to all graphs (i.e.
without Fuseki Graph ACL - again unintended)

... in that it's up to the Jena user to set up their deployment in a
way that matches any security requirements.
(The proposed feature, as a separate extension or part of Fuseki Graph
ACL would have to be explicitly configured/enabled.)

> >> Concern 1:
> >>
> >> This bypasses Fuseki-provided security and puts the control function
> >> outside the Fuseki server in a separate server that is not part of Jena.
> >> It will only be secure if deployed in a constrained network environment.
> >>
> >> This is not secure except when run in a certain way and, personally, I
> >> don't want to have to deal with a CVE because of that. CVE handling is
> >> time consuming.
> >>
> >> I don't see why it is using jena-access (the named graph security
> >> feature) except for the filtering on TDB. It is creating a dynamic
> >> dataset for the query.
> >
> > You're right - it's only as secure as the middleware/proxy/whatever in
> > front of it which supplies the ACL. (This was never intended to be
> > used/exposed to end-users directly.)
>
> >> Concern 2: How does update fit into the picture? (GSP is not supported).
> >
> > I thought that, since GSP operations target a single graph, there is
> > no need to extend support to it since it's already possible to
> > restrict visibility (with the graph query parameter). Am I missing
> > something?
>
> Having different ways to protect data across different operations is
> confusing.  And quite easy to have unexpected problems which for
> security is bad.
>
> Accessing the default graph when it is the union of the named graphs.

Good point - I'd forgotten about the union. In that case I suppose
that completely invalidates the proposal, since GSP GET/HEAD requests of
course don't have a body. (As explained in the PR-added readme,
putting the allowed graphs in a header only works with a relatively
small number of graphs, or if their IRIs are short.)
...unless GSP GET in union mode were disallowed when this feature is enabled.
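To get a feel for the header-size limit mentioned above, the bytes needed for a space-separated list of IRIs can be estimated. The header name `X-Allowed-Graphs` is hypothetical, and the 8 KiB budget is an assumption (a common server default, e.g. Jetty's request-header size); servers usually count all request headers against that limit, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

/** Estimates whether a list of graph IRIs fits in a single HTTP request header. */
final class HeaderBudget {
    // Assumption: 8 KiB is a common default request-header limit; check the actual server.
    static final int ASSUMED_HEADER_LIMIT_BYTES = 8 * 1024;

    /** Bytes needed for "Name: iri1 iri2 ..." (space-separated), encoded as UTF-8. */
    static int headerBytes(String headerName, List<String> iris) {
        String value = String.join(" ", iris);
        return (headerName + ": " + value).getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean fits(String headerName, List<String> iris) {
        return headerBytes(headerName, iris) <= ASSUMED_HEADER_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        // 100 IRIs of 50 characters each: about 5 KB, under the assumed limit.
        List<String> iris = Collections.nCopies(100, "x".repeat(50));
        System.out.println(headerBytes("X-Allowed-Graphs", iris) + " bytes, fits=" + fits("X-Allowed-Graphs", iris));
    }
}
```

At around 50 characters per IRI, the assumed budget runs out somewhere between 150 and 160 graphs, far below the thousands of graphs in the timings quoted earlier.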

>
> >>
> >> Concern 3: It looks like a specific solution for a specific scenario.
> >> Will it get uptake by the wide Jena user community?
> >
> > It's definitely specific. My thinking was that, if a subset of this
> > were deemed useful, then it'd be better to exist as part of the core
> > offering as opposed to us just bolting it on ourselves (at my job).
> > But, if that's not the case - fair enough.
>
> What subsets do you have in mind?

(In isolation from Fuseki Graph ACL) Allow Jena users to supply (from an
external-to-Fuseki/Jena system) a set of graphs to restrict SPARQL
queries to (without having to rewrite the query), with similar
performance to Fuseki Graph ACL (i.e. faster than the alternatives
listed in the PR-attached readme).
Hmm, having just written that, I suppose that's not really a smaller subset.
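A stripped-down picture of that subset: a view over the named graphs restricted to an allowed set, where the union default graph is also computed only from allowed graphs (the case Andy raised above). Triples are plain strings and the class is hypothetical; a real implementation would work through Jena's dataset API rather than maps.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy "dynamic dataset": a view over named graphs restricted to an allowed set. */
final class RestrictedView {
    private final Map<String, Set<String>> namedGraphs; // graph IRI -> triples (as strings)
    private final Set<String> allowed;

    RestrictedView(Map<String, Set<String>> namedGraphs, Set<String> allowed) {
        this.namedGraphs = namedGraphs;
        this.allowed = allowed;
    }

    /** Named-graph lookup: graphs outside the allowed set appear empty. */
    Set<String> graph(String iri) {
        return allowed.contains(iri) ? namedGraphs.getOrDefault(iri, Set.of()) : Set.of();
    }

    /** Union default graph built only from allowed graphs, so hidden triples cannot leak through it. */
    Set<String> unionDefaultGraph() {
        Set<String> union = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : namedGraphs.entrySet()) {
            if (allowed.contains(e.getKey())) {
                union.addAll(e.getValue());
            }
        }
        return union;
    }
}
```

Whatever carries the allowed set (preamble, header, or Fuseki Graph ACL), both the named-graph path and the union path have to consult the same set.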

>
>      Andy

-- 
Vilnis Termanis
Technical Specialist

e | vilnis.terma...@iotics.com
www.iotics.com