Re: About JENA-2339 - security related

Martynas Jusevičius Mon, 08 Aug 2022 10:22:45 -0700

On Mon, 8 Aug 2022 at 18.06, Vilnis Termanis
<vilnis.terma...@iotics.com.invalid> wrote:


> On Mon, 1 Aug 2022 at 12:29, Andy Seaborne <a...@apache.org> wrote:
> >
> >
> >
> > On 28/07/2022 20:50, Vilnis Termanis wrote:
> > > Hi Andy & Jena development community,
> > >
> > > (Answers inline - apologies if I repeat myself)
> > >
> > > FYI - Our aim is to enable end-users to make SPARQL queries whilst
> > > respecting visibility restrictions.
> > > I.e. users (indirectly) add sets of related triples to a dataset and
> > > they can choose who has visibility (beyond themselves) over these,
> > > either: Nobody, Everyone or a chosen set (which can be updated). Note
> > > that this restriction is not by a specific subject or predicate.
> > > (Although the sets of triples do have relationships - not all of them
> > > are known in advance.)
> >
> > Let's clarify terminology here.
> >
> > A "Jena user" is a person or organisation that is downloading Jena,
> > either as the formal release (source code) or convenience binaries (e.g.
> > jars from Maven Central). The "convenience binaries" is the more usual
> > case.
> >
> > Not Iotics users. Systems built with Jena have their own users.
> > (The Apache License applies - including clause 7.)
> >
> > The responsibility is between the downstream system builder and their
> > users of product or service being "fit for purpose".
>
> Sorry about that - I should have been clearer with the terms.
>
> In the submission - there is only one entity - the "Fuseki user" (e.g.
> via BasicAuth) to which the dynamic mode applies. However, since this
> is intended to be used as part of an integration (by Jena users - to
> gate access to their own domain-specific end-users), the
> authentication bit I think is irrelevant. (E.g. a separate service
> endpoint could have the proposed functionality enabled and this is
> what the integration calls.)
>
> >
> > > using a "SELECT {} 1" query, and
> > > adding a certain set of graphs makes the queries on my laptop take:
> > > ~600 graphs ~115ms
> > > ~1500 graphs ~162ms
> > > ~3k graphs ~240ms
> > > ~6k graphs ~400ms
> >
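As a rough illustration only (not something posted in the thread), the four timings quoted above can be fitted with a straight line to separate a fixed per-query cost from a per-graph cost. The function name linear_fit is made up for this sketch, and four laptop measurements are far too few to draw firm conclusions from.

```python
# Timings quoted above: (number of allowed graphs, query time in ms).
timings = [(600, 115), (1500, 162), (3000, 240), (6000, 400)]


def linear_fit(points):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


fixed_ms, ms_per_graph = linear_fit(timings)
print(f"fixed cost: {fixed_ms:.1f} ms, per graph: {ms_per_graph:.4f} ms")
```

On these numbers the fit suggests something in the region of 80 ms of fixed cost plus roughly 0.05 ms per listed graph. That is consistent with a cost that grows linearly with the size of the graph list, but it does not say which step (parsing or building the security context) is responsible.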
> > That's an illustration of the current system but we don't know what is
> > the cause of the cost.
> >
> > What piece of the code is taking the time?
> > Maybe the right thing to do is make it faster.
>
> I haven't looked into this in great detail, but from my understanding
> the time taken is a combination of a) parsing the input of allowed
> graphs and b) generating a new SecurityContext (holding a hashmap of
> said graphs). If providing a set of allowed graphs in the proposed way
> is not a no-go, I'm happy to dig into where the cost is exactly.
>
> >
> > And in the general area - what are you using for authentication?
> >
>
> For us right now, we're only using fuseki:auth "basic" for the
> purposes of differentiating different access levels against Fuseki
> Data Access Control configuration (by mapping those to Fuseki users),
> e.g.:
> Fuseki user1 => allowed to see graphs A & B
> Fuseki user2 => allowed to see graphs B & C
> Fuseki user3 => has the proposed dynamic-access feature
> enabled (i.e. no access unless the pragma preamble exists in query
> with 1+ graphs defined)
>
> Said Fuseki users (=roles) are then chosen based on what the system
> needs to do (domain-specific).
>
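The role mapping described above can be modelled in a few lines. This is a toy sketch, not Fuseki configuration or Fuseki code: STATIC_ACL, DYNAMIC_USERS and visible_graphs are hypothetical names, and the "requested" argument stands in for whatever list of graphs the proposed query preamble would carry.

```python
# Hypothetical stand-in for static access-control configuration:
# each Fuseki user (role) maps to a fixed set of visible graphs.
STATIC_ACL = {
    "user1": {"A", "B"},
    "user2": {"B", "C"},
}

# Users for which the proposed dynamic mode is enabled (assumption).
DYNAMIC_USERS = {"user3"}


def visible_graphs(user, requested=None):
    """Return the set of graphs a query run as `user` may see.

    Static users get their configured graphs. Dynamic users see nothing
    unless the caller supplies one or more graphs with the request.
    Unknown users see nothing.
    """
    if user in DYNAMIC_USERS:
        return set(requested or ())
    return set(STATIC_ACL.get(user, ()))


print(visible_graphs("user1"))
print(visible_graphs("user3"))
print(visible_graphs("user3", ["B"]))
```

The point of the sketch is the default: a dynamic user with no supplied graphs gets an empty set, so forgetting the preamble fails closed rather than open.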
> > There is some bearer auth support in the next release ... it does not
> > provide complete bearer auth because it can't cover all cases (e.g. JWT
> > validation). It is more of a framework template with which to build a
> > local solution.
>
> I'm showing my lack of JWT/Bearer auth knowledge - but is this
> a building block for what Martynas suggested, namely the token implies
> the user to which dynamic ACL applies and then access can be
> restricted e.g. via WACL/Solid? (Correct me if I'm wrong but is this
> still not a solution that involves ACL rules being stored in Jena or
> at least be accessible via SPARQL for a SERVICE call?)
>

LinkedDataHub identifies agents with URIs, which can be called WebIDs [1].
Currently it supports WebID-TLS and OIDC with JWT tokens as authentication
protocols. Authorization is checked using WAC as mentioned earlier.

We use 2 Fuseki endpoints for each webapp instance: “end-user” and “admin”.
The auth queries federate between them using SERVICE. Sandboxing them might
be a little tricky, but in general it has worked well and did not require
any new security features in Fuseki.

[1] https://www.w3.org/2005/Incubator/webid/spec/
[2] https://github.com/AtomGraph/LinkedDataHub/issues/107


> >
> > ----
> >
> > "FMod_ABAC" is not related to jena-permissions.
> >
> > "FMod_" means Fuseki Module.
> > https://jena.apache.org/documentation/fuseki2/fuseki-modules
> >     No forks.
> > ABAC = Attribute Based Access Control.
> >
> > Using attributes separates ACLs from directly naming users for access to
> > things. FMod_ABAC things are triples. Triples have "labels". Labels are
> > attribute expressions, including AND and OR operators.
> >
> >      "employee | contractor" -- must have the "employee" attribute
> >                                 or the "contractor" attribute.
> >
> >      "employee & dept=engineering" -- must have both "employee" and
> >                                      "dept=engineering" attributes.
> >
> > There is a division of responsibilities. The data is labelled - so the
> > data owner is responsible for the data attribute requirements. The
> > assignment of attributes to users is separate.
> >
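To make the label idea concrete, here is a toy evaluator for the two example labels above. It is only a sketch under stated assumptions, not the FMod_ABAC implementation or its real syntax: it assumes "&" binds tighter than "|", supports no parentheses, and treats "dept=engineering" as an opaque attribute string. The name label_allows is hypothetical.

```python
def label_allows(label, attributes):
    """Return True if a user holding `attributes` satisfies `label`.

    The label is read as an OR ("|") of AND ("&") groups; each atom must
    appear verbatim in the user's attribute set.
    """
    return any(
        all(atom.strip() in attributes for atom in group.split("&"))
        for group in label.split("|")
    )


# Labels belong to the data; attribute sets belong to users.
print(label_allows("employee | contractor", {"contractor"}))
print(label_allows("employee & dept=engineering", {"employee"}))
print(label_allows("employee & dept=engineering",
                   {"employee", "dept=engineering"}))
```

Note how the two inputs come from different places: the label is written by whoever owns the data, while the attribute set is assigned to the user elsewhere. That is the division of responsibilities described above.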
> > > FYI - In our case this means that we have a "make SPARQL query" API
> > > call. When received, the applicable user (our domain) is known and, in
> > > the proposed PR, we can prepend the set of allowed graphs to the query
> > > (which have been looked up prior to query execution, externally). The
> > > end user has NO direct access to Fuseki itself.
> >
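For comparison, the SPARQL 1.1 Protocol already defines request parameters (default-graph-uri and named-graph-uri) for describing a dataset per request. The sketch below shows how a middleware could encode an externally looked-up graph list that way. It is not the mechanism in the proposed PR, the graph IRIs are placeholders, build_query_params is a hypothetical helper, and whether a given endpoint honours or restricts these parameters has to be checked for the deployment in question.

```python
from urllib.parse import urlencode


def build_query_params(query, allowed_graphs):
    """URL-encode a SPARQL query plus one named-graph-uri per allowed graph.

    The result can be used as a GET query string or as the body of a POST
    with Content-Type application/x-www-form-urlencoded.
    """
    params = [("query", query)]
    params += [("named-graph-uri", graph) for graph in allowed_graphs]
    return urlencode(params)


# Placeholder graph IRIs, looked up externally before the query is sent.
allowed = ["http://example.org/g1", "http://example.org/g2"]
encoded = build_query_params("ASK {}", allowed)
print(encoded)
```

With thousands of graphs the encoded string gets long, so sending it as a POST body rather than in the URL avoids URL and header length limits. As with the proposal itself, this is only as trustworthy as the component that builds the request.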
> > You have a solution presuming a protected network, or possibly a
> > container with in-container networking.
> >
> > That's my Concern 1. Security conditions outside Jena must be met.
> > Having that, even if not in use, is an issue.
> >
>
> Maybe I misunderstand, but is this not in the same boat as:
> a) Configuring a service which allows write access (but not gating who
> can reach said service)
> b) Configuring Fuseki access control in config and allowing 1+ graphs
> (which shouldn't be included)
> c) Configuring a service which allows read access to all graphs (i.e.
> without Fuseki Graph ACL - again unintended)
>
> .. in that it's up to the Jena User to set up their deployment in a
> way that matches any security requirements.
> (The proposed feature, as a separate extension or part of Fuseki Graph
> ACL would have to be explicitly configured/enabled.)
>
> > >> Concern 1:
> > >>
> > >> This bypasses Fuseki-provided security and puts the control function
> > >> outside the Fuseki server in a separate server that is not part of Jena.
> > >> It will only be secure if deployed in a constrained network environment.
> > >>
> > >> This is not secure except when run in a certain way and, personally, I
> > >> don't want to have to deal with a CVE because of that. CVE handling is
> > >> time consuming.
> > >>
> > >> I don't see why it is using jena-access (the named graph security
> > >> feature) except for the filtering on TDB. It is creating a dynamic
> > >> dataset for the query.
> > >
> > > You're right - it's only as secure as the middleware/proxy/whatever in
> > > front of it which supplies the ACL. (This was never intended to be
> > > used/exposed to end-users directly.)
> >
> > >> Concern 2: How does update fit into the picture? (GSP is not
> > >> supported).
> > >
> > > I thought that, since GSP operations target a single graph, there is
> > > no need to extend support to it since it's already possible to
> > > restrict visibility (with the graph query parameter). Am I missing
> > > something?
> >
> > Having different ways to protect data across different operations is
> > confusing.  And quite easy to have unexpected problems which for
> > security is bad.
> >
> > Accessing the default graph when it is the union of the named graphs.
>
> Good point - I'd forgotten about the union. In that case I suppose
> that completely invalidates the proposal, since GSP GET/HEAD requests of
> course don't have a body. (As explained in the PR-added readme,
> putting the allowed graphs in a header only works with a relatively
> small number of graphs, or if their IRIs are short.)
> .. unless GSP GET in union-mode was disallowed, when this feature is
> enabled.
>
> >
> > >>
> > >> Concern 3: It looks like a specific solution for a specific scenario.
> > >> Will it get uptake by the wide Jena user community?
> > >
> > > It's definitely specific. My thinking was that, if a subset of this
> > > were deemed useful, then it'd be better to exist as part of the core
> > > offering as opposed to us just bolting it on ourselves (at my job).
> > > But, if that's not the case - fair enough.
> >
> > What subsets do you have in mind?
>
> (In isolation of Fuseki Graph ACL) Allow Jena Users to supply (from an
> external-to-Fuseki/Jena system) a set of graphs to restrict SPARQL
> queries to (without having to rewrite the query) with similar
> performance to Fuseki Graph ACL (i.e. faster than the alternatives
> listed in the PR-attached readme).
> Hmm, having just written that, I suppose that's not really a smaller
> subset.
>
> >
> >      Andy
>
> --
> Vilnis Termanis
> Technical Specialist
>
> e | vilnis.terma...@iotics.com
> www.iotics.com
>
> The information contained in this email is strictly confidential and
> intended only for the parties noted. If this email was not intended
> for your use, please contact Iotics. For more on our Privacy Policy
> please visit https://www.iotics.com/legal/
>
