On Mon, 8 Aug 2022 at 18.06, Vilnis Termanis <vilnis.terma...@iotics.com.invalid> wrote:
> On Mon, 1 Aug 2022 at 12:29, Andy Seaborne <a...@apache.org> wrote:
> >
> > On 28/07/2022 20:50, Vilnis Termanis wrote:
> > > Hi Andy & Jena development community,
> > >
> > > (Answers inline - apologies if I repeat myself)
> > >
> > > FYI - Our aim is to enable end-users to make SPARQL queries whilst
> > > respecting visibility restrictions.
> > > I.e. users (indirectly) add sets of related triples to a dataset and
> > > they can choose who has visibility (beyond themselves) over these,
> > > either: Nobody, Everyone or a chosen set (which can be updated). Note
> > > that this restriction is not by a specific subject or predicate.
> > > (Although the sets of triples do have relationships - not all of them
> > > are known in advance.)
> >
> > Let's clarify terminology here.
> >
> > A "Jena user" is a person or organisation that is downloading Jena,
> > either as the formal release (source code) or convenience binaries (e.g.
> > jars from Maven Central). The "convenience binaries" is the more usual
> > case.
> >
> > Not Iotics users. Systems built with Jena have their own users.
> > (The Apache License applies - including clause 7.)
> >
> > The responsibility is between the downstream system builder and their
> > users of product or service being "fit for purpose".
>
> Sorry about that - I should have been clearer with the terms.
>
> In the submission - there is only one entity - the "Fuseki user" (e.g.
> via BasicAuth) to which the dynamic mode applies. However, since this
> is intended to be used as part of an integration (by Jena users - to
> gate access to their own domain-specific end-users), the
> authentication bit I think is irrelevant. (E.g. a separate service
> endpoint could have the proposed functionality enabled and this is
> what the integration calls.)
>
> >
> > > using a "SELECT {} 1" query, and
> > > adding a certain set of graphs makes the queries on my laptop take:
> > > ~600 graphs ~115ms
> > > ~1500 graphs ~162ms
> > > ~3k graphs ~240ms
> > > ~6k graphs ~400ms
> >
> > That's an illustration of the current system but we don't know what is
> > the cause of the cost.
> >
> > What piece of the code is taking the time?
> > Maybe the right thing to do is make it faster.
>
> I haven't looked into this in great detail, but from my understanding
> the time taken is a combination of a) parsing the input of allowed
> graphs and b) generating a new SecurityContext (holding a hashmap of
> said graphs). If providing a set of allowed graphs in the proposed way
> is not a no-go, I'm happy to dig into where the cost is exactly.
>
> >
> > And in the general area - what are you using for authentication?
> >
>
> For us right now, we're only using fuseki:auth "basic" for the
> purposes of differentiating different access levels against Fuseki
> Data Access Control configuration (by mapping those to Fuseki users),
> e.g.:
> Fuseki user1 => allowed to see graphs A & B
> Fuseki user2 => allowed to see graphs B & C
> Fuseki user3 => has the proposed dynamic-access feature
> enabled (i.e. no access unless the pragma preamble exists in query
> with 1+ graphs defined)
>
> Said Fuseki users (=roles) are then chosen based on what the system
> needs to do (domain-specific).
>
> > There is some bearer auth support in the next release ... it does not
> > provide complete bearer auth because it can't cover all cases (e.g. JWT
> > validation). It is more of a framework template with which to build a
> > local solution.
>
> I'm showing my lack of JWT/Bearer auth knowledge - but is this
> building block for what Martynas suggested, namely the token implies
> the user to which dynamic ACL applies and then access can be
> restricted e.g. via WACL/Solid?
> (Correct me if I'm wrong but is this
> still not a solution that involves ACL rules being stored in Jena or
> at least be accessible via SPARQL for a SERVICE call?)
>

LinkedDataHub identifies agents with URIs, which can be called WebIDs [1]. Currently it supports WebID-TLS and OIDC with JWT tokens as authentication protocols. Authorization is checked using WAC as mentioned earlier. We use 2 Fuseki endpoints for each webapp instance: “end-user” and “admin”. The auth queries federate between them using SERVICE. Sandboxing them might be a little tricky, but in general it has worked well and did not require any new security features in Fuseki.

[1] https://www.w3.org/2005/Incubator/webid/spec/
[2] https://github.com/AtomGraph/LinkedDataHub/issues/107

> >
> > ----
> >
> > "FMod_ABAC" is not related to jena-permissions.
> >
> > "FMod_" means Fuseki Module.
> > https://jena.apache.org/documentation/fuseki2/fuseki-modules
> > No forks.
> > ABAC = Attribute Based Access Control.
> >
> > Using attributes separates ACLs from direct naming users for access to
> > things. FMod_ABAC things are triples. Triples have "labels". Labels are
> > attribute expressions, including AND and OR operators.
> >
> > "employee | contractor" -- must have the "employee" attribute
> > or the "contractor" attribute.
> >
> > "employee & dept=engineering" -- must have both "employee" and
> > "dept=engineering" attributes.
> >
> > There is a division of responsibilities. The data is labelled - so the
> > data owner is responsible for the data attribute requirements. The
> > assignment of attributes to users is separate.
> >
> > > FYI - In our case this means that we have a "make SPARQL query" API
> > > call. When received, the applicable user (our domain) is known and, in
> > > the proposed PR, we can prepend the set of allowed graphs to the query
> > > (which have been looked up prior to query execution, externally). The
> > > end user has NO direct access to Fuseki itself.
> >
> > You have a solution presuming a protected network, or possibly a
> > container with in-container networking.
> >
> > That's my Concern 1. Security conditions outside Jena must be met.
> > Having that, even if not in use, is an issue.
> >
>
> Maybe I misunderstand, but is this not in the same boat as:
> a) Configuring a service which allows write access (but not gating who
> can reach said service)
> b) Configuring Fuseki access control in config and allowing 1+ graphs
> (which shouldn't be included)
> c) Configuring a service which allows read access to all graphs (i.e.
> without Fuseki Graph ACL - again unintended)
>
> .. in that it's up to the Jena User to set up their deployment in a
> way that matches any security requirements.
> (The proposed feature, as a separate extension or part of Fuseki Graph
> ACL would have to be explicitly configured/enabled.)
>
> > >> Concern 1:
> > >>
> > >> This bypasses Fuseki-provided security and puts the control function
> > >> outside the Fuseki server in a separate server that is not part of Jena.
> > >> It will only be secure if deployed in a constrained network environment.
> > >>
> > >> This is not secure except when run in a certain way and, personally, I
> > >> don't want to have to deal with a CVE because of that. CVE handling is
> > >> time consuming.
> > >>
> > >> I don't see why it is using jena-access (the named graph security
> > >> feature) except for the filtering on TDB. It is creating a dynamic
> > >> dataset for the query.
> > >
> > > You're right - it's only as secure as the middleware/proxy/whatever in
> > > front of it which supplies the ACL. (This was never intended to be
> > > used/exposed to end-users directly.)
> >
> > >> Concern 2: How does update fit into the picture? (GSP is not supported).
> > >
> > > I thought that, since GSP operations target a single graph, there is
> > > no need to extend support to it since it's already possible to
> > > restrict visibility (with the graph query parameter). Am I missing
> > > something?
> >
> > Having different ways to protect data across different operations is
> > confusing. And quite easy to have unexpected problems which for
> > security is bad.
> >
> > Accessing the default graph when it is the union of the named graphs.
>
> Good point - I'd forgotten about the union. In that case I suppose
> that completely invalidates the proposal, since GSP GET/HEAD of
> course don't have a body. (As explained in the PR-added readme,
> putting the allowed graphs in a header only works with a relatively
> small number of graphs, or if their IRIs are short.)
> .. unless GSP GET in union-mode was disallowed, when this feature is
> enabled.
>
> >
> > >>
> > >> Concern 3: It looks like a specific solution for a specific scenario.
> > >> Will it get uptake by the wide Jena user community?
> > >
> > > It's definitely specific. My thinking was that, if a subset of this
> > > were deemed useful, then it'd be better to exist as part of the core
> > > offering as opposed to us just bolting it on ourselves (at my job).
> > > But, if that's not the case - fair enough.
> >
> > What subsets do you have in mind?
>
> (In isolation of Fuseki Graph ACL) Allow Jena Users to supply (from an
> external-to-Fuseki/Jena system) a set of graphs to restrict SPARQL
> queries to (without having to rewrite the query) with similar
> performance to Fuseki Graph ACL (i.e. faster than the alternatives
> listed in the PR-attached readme).
> Hmm, having just written that, I suppose that's not really a smaller
> subset.
>
> >
> > Andy
>
> --
> Vilnis Termanis
> Technical Specialist
>
> e | vilnis.terma...@iotics.com
> www.iotics.com