(inline)

On Fri, 29 Jul 2022 at 07:56, Martynas Jusevičius
<marty...@atomgraph.com> wrote:
>
> “Sets of triples” — aren’t these datasets?
>
> Couldn’t this use case be addressed by maintaining per-user datasets? Not
> sure if Fuseki can create datasets on the fly, but this seems like a much
> simpler feature to implement compared to a whole new ACL mechanism.

The idea is that if you had these "sets of triples" A-Z, one user
might be allowed to see A-M and another C-Q. With per-user datasets
you'd have to duplicate data to achieve that. And, when the ACL
changes, you'd have to copy/move triples from one dataset to another.
(Or am I missing a nuance to your proposal? Do you mean dynamically
creating a new dataset which references graphs from another dataset?)
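
To make the A-M / C-Q example concrete, here is a minimal Python sketch.
The user names and the one-letter set labels are invented for
illustration; each label stands for one named graph that is stored once.

```python
import string

# Hypothetical labels for the "sets of triples" A-Z (one named graph each).
ALL_SETS = list(string.ascii_uppercase)


def span(first, last):
    """Return the set labels from `first` to `last`, inclusive."""
    return set(ALL_SETS[ALL_SETS.index(first):ALL_SETS.index(last) + 1])


# The ACL only maps users to labels; the triples themselves are stored once.
acl = {"alice": span("A", "M"), "bob": span("C", "Q")}

# Sets visible to both users. With per-user datasets, each of these would
# have to exist as a second physical copy.
shared = acl["alice"] & acl["bob"]

# An ACL change is a metadata update; no triples are copied or moved.
acl["bob"] |= span("R", "S")
```

Here `shared` covers C-M, so per-user datasets would duplicate 11 of the
26 sets for just these two users.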

>
> On Thu, 28 Jul 2022 at 22.51, Vilnis Termanis
> <vilnis.terma...@iotics.com.invalid> wrote:
>
> > Hi Andy & Jena development community,
> >
> > (Answers inline - apologies if I repeat myself)
> >
> > FYI - Our aim is to enable end-users to make SPARQL queries whilst
> > respecting visibility restrictions.
> > I.e. users (indirectly) add sets of related triples to a dataset and
> > they can choose who has visibility (beyond themselves) over these,
> > either: Nobody, Everyone or a chosen set (which can be updated). Note
> > that this restriction is not by a specific subject or predicate.
> > (Although the sets of triples do have relationships - not all of them
> > are known in advance.)
> >
> > On Thu, 28 Jul 2022 at 10:43, Andy Seaborne <a...@apache.org> wrote:
> > >
> > > JENA-2339
> > > PR#1441
> > >
> > > https://github.com/vtermanis/jena/blob/dynamic-graph-restriction-extension/MOVE_ME_DynamicACL_notes.md
> > >
> > > tl;dr:
> > >
> > > It is a different role for Fuseki.
> > >
> > > Fuseki executes the security but the setup and control is from a trusted
> > > external server on the request execution path.
> > >
> > > It assumes certain deployment environments to be safe.
> >
> > FYI - In our case this means that we have a "make SPARQL query" API
> > call. When received, the applicable user (our domain) is known and, in
> > the proposed PR, we can prepend the set of allowed graphs to the query
> > (which have been looked up prior to query execution, externally). The
> > end user has NO direct access to Fuseki itself.
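
A rough sketch of the "prepend the allowed graphs" step on the middleware
side, in Python. The `#graph:` comment format and the function name are
made up for illustration and are not the format used in the PR.

```python
from typing import Iterable


def prepend_allowed_graphs(query: str, graph_uris: Iterable[str]) -> str:
    """Return `query` with one comment line per allowed graph in front of it.

    The '#graph: <uri>' line format is hypothetical.
    """
    uris = sorted(graph_uris)
    for uri in uris:
        # A line break or angle bracket inside a URI could smuggle in
        # extra header lines, so reject such values outright.
        if any(ch in uri for ch in "\r\n<> "):
            raise ValueError(f"unsafe graph URI: {uri!r}")
    header = "".join(f"#graph: <{uri}>\n" for uri in uris)
    return header + query


restricted = prepend_allowed_graphs(
    "SELECT * { ?s ?p ?o }",
    {"http://example.org/g2", "http://example.org/g1"},
)
```

A real format would also need an unambiguous way to mark where the
trusted header ends, so that comment lines in the end-user's own query
text cannot be mistaken for part of it.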
> >
> > >
> > > My feeling is that we should make Fuseki configurable enough so that a
> > > downstream 3rd party can add their security solution that is suitable
> > > for their environment. But we should not incorporate a particular
> > > security solution that relies on the deployment environment.
> > >
> > > ----
> > >
> > > I've asked for more information about the claim on a performance
> > > motivator and some other background information.
> > >
> > > The usage patterns are not yet clear. The data is described as "a one
> > > graph per handful of subjects and their properties" and "100s of
> > > graphs". What the queries are is unstated.
> >
> > Right now, each graph has in the range of 300-500 triples (though the
> > amount depends on how much additional/domain-specific metadata
> > end-users choose to add) and the scale of deployed Fuseki datasets
> > ranges from a few to ~6k graphs.
> > Since we'd like to allow end-users to run **any** queries they wish
> > (we enforce query timeouts), it's difficult to give concrete examples.
> > I can however say that TDB unionDefaultGraph mode is enabled (i.e.
> > most end-users won't choose to explicitly target a specific graph) and
> > that one of our representative "search" queries (which combines
> > GeoSPARQL + multiple explicit property matching across multiple
> > different subjects in a UNION + subsequent collection of mandatory &
> > optional fields) is between 20-40% faster than the current custom
> > solution.
> > (Note that we have also tried query re-writing to insert FROM/FROM
> > NAMED clauses - and that is very slow in comparison, presumably due to
> > the higher-level filtering involved, unlike the quad filter herein.)
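
For reference, the FROM / FROM NAMED rewriting mentioned above amounts to
something like the following Python sketch. It builds the query from a
known projection and pattern rather than parsing arbitrary SPARQL, and
the example.org graph URIs are placeholders.

```python
def dataset_clauses(graph_uris):
    """Return one FROM and one FROM NAMED clause per allowed graph."""
    lines = []
    for uri in sorted(graph_uris):
        lines.append(f"FROM <{uri}>")        # contributes to the default graph
        lines.append(f"FROM NAMED <{uri}>")  # reachable via GRAPH <uri> { ... }
    return "\n".join(lines)


def restricted_select(projection, pattern, graph_uris):
    """Build a SELECT query whose dataset is limited to `graph_uris`."""
    return (
        f"SELECT {projection}\n"
        f"{dataset_clauses(graph_uris)}\n"
        f"WHERE {{ {pattern} }}"
    )


# With ~6k visible graphs the dataset description alone is 12,000 clauses.
graphs = [f"http://example.org/graph/{i}" for i in range(6000)]
query = restricted_select("?s", "?s ?p ?o", graphs)
clause_count = query.count("FROM ")
```

Each clause has to be parsed and turned into a dataset description on
every request, which may be part of why this route was slow for us.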
> >
> > >
> > > There is no characterisation of the queries being made. If we are
> > > talking about overheads, the cases of a few big queries and many small
> > > queries are different.
> >
> > (pasted from JENA-2339 ticket) - using a "SELECT {} 1" query, and
> > adding a certain set of graphs makes the queries on my laptop take:
> > ~600 graphs ~115ms
> > ~1500 graphs ~162ms
> > ~3k graphs ~240ms
> > ~6k graphs ~400ms
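
As a quick sanity check on those numbers (all of them approximate laptop
measurements), a straight line through the first and last data points
gives the per-graph overhead:

```python
# Approximate timings from above: number of graphs -> milliseconds.
timings_ms = {600: 115, 1500: 162, 3000: 240, 6000: 400}

sizes = sorted(timings_ms)
smallest, largest = sizes[0], sizes[-1]

# Slope and intercept of the line through the two end points.
ms_per_graph = (timings_ms[largest] - timings_ms[smallest]) / (largest - smallest)
base_ms = timings_ms[smallest] - ms_per_graph * smallest

print(f"~{ms_per_graph * 1000:.0f} ms per 1000 graphs, ~{base_ms:.0f} ms base")
# prints "~53 ms per 1000 graphs, ~83 ms base"
```

The intermediate points fall close to the same line, so the cost looks
roughly linear in the number of graphs supplied.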
> >
> > >
> > > The scale looks small (less than a million triples -
> > > approximating as 100 graphs * 1000 triples). That makes the point about
> > > access to TDB hooks a bit redundant.
> >
> > The dataset I've tested this with has ~1.8M triples. That's not to say
> > this is the scale we're hoping to satisfy - that's just what I
> > tested with first. By redundant, do you mean an alternative approach
> > should be used for this scale?
> >
> > >
> > >
> > > There are distinguished users. A request from one of these users
> > > causes the set of visible graphs to be read from a comment at the start
> > > of the query text in the request.
> > >
> > > The use of large numbers of small named graphs to manage security
> > > settings looks to me like triple-level security.  I have already
> > > mentioned work "FMod_ABAC": (£job related) a while back (2/Jan/2022). It
> > > is triple level attribute-based security.
> >
> > It could well be that I'm seeing the wrong solution for the feature
> > we're trying to support (that's the other reason for reaching out to
> > the community). The reason (rightly or wrongly) to model this as a set
> > of graphs is: Each set of triples to be restricted are related, but
> > span multiple subjects and could also relate to other subjects in
> > other sets (as well as externally).
> > Hence I couldn't see how e.g. Jena Permissions could be applied here:
> > When you're provided with a single triple to check - you would have to
> > understand what type of subject it is and how it relates to the "top
> > level" subject to which the ACL applies. Bundling everything into a
> > graph seemed like a viable option.
> >
> > >
> > > Concern 1:
> > >
> > > This bypasses Fuseki-provided security and puts the control function
> > > outside the Fuseki server in a separate server that is not part of Jena.
> > > It will only be secure if deployed in a constrained network environment.
> > >
> > > This is not secure except when run in a certain way and, personally, I
> > > don't want to have to deal with a CVE because of that. CVE handling is
> > > time consuming.
> > >
> > > I don't see why it is using jena-access (the named graph security
> > > feature) except for the filtering on TDB. It is creating a dynamic
> > > dataset for the query.
> >
> > You're right - it's only as secure as the middleware/proxy/whatever in
> > front of it which supplies the ACL. (This was never intended to be
> > used/exposed to end-users directly.)
> > The purpose of extending jena-access (instead of immediately writing
> > it as a separate module) was to illustrate with minimal code changes
> > (+ extension of existing tests) what it could look like, for
> > discussion. (The quad filtering / performance aspect would be the
> > same, regardless of location, I presume.)
> >
> > >
> > > Concern 2: How does update fit into the picture? (GSP is not supported).
> >
> > I thought that, since GSP operations target a single graph, there is
> > no need to extend support to it since it's already possible to
> > restrict visibility (with the graph query parameter). Am I missing
> > something?
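
To illustrate the GSP point: because a Graph Store Protocol request names
exactly one graph via the `graph` query parameter, the middleware can
check that single URI before forwarding the request. A minimal Python
sketch follows; the endpoint URL and the function name are assumptions,
not anything taken from the PR.

```python
from urllib.parse import urlencode

# Assumed GSP endpoint of a local Fuseki dataset; adjust to the deployment.
GSP_ENDPOINT = "http://localhost:3030/ds/data"


def gsp_url(graph_uri, allowed_graphs):
    """Return the GSP URL for `graph_uri`, or refuse if it is not visible."""
    if graph_uri not in allowed_graphs:
        raise PermissionError(f"graph not visible to this user: {graph_uri}")
    # urlencode percent-encodes the graph URI for use as a query parameter.
    return f"{GSP_ENDPOINT}?{urlencode({'graph': graph_uri})}"


allowed = {"http://example.org/g1"}
url = gsp_url("http://example.org/g1", allowed)
```

This only covers read visibility of a single graph; who may write to
which graph would still need its own check.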
> >
> > >
> > > Concern 3: It looks like a specific solution for a specific scenario.
> > > Will it get uptake by the wider Jena user community?
> >
> > It's definitely specific. My thinking was that, if a subset of this
> > were deemed useful, then it'd be better to exist as part of the core
> > offering as opposed to us just bolting it on ourselves (at my job).
> > But, if that's not the case - fair enough.
> >
> > >
> > > Concern 4: Is there long-term support and maintenance for the feature?
> > > (e.g. 5y+)
> > > How do we respond to users@ message about it? Is it experimental code or
> > > has it been used for real? Is the feature set stable?
> >
> > My understanding is that jena-access is classed as stable (we're using
> > it for something else already in production) and thus, since this
> > merely produces a SecurityContext with a larger set of graphs, would
> > theoretically be no less stable.
> >
> > >
> > >
> > > Opinion: it is not unreasonable to provide support for this kind of
> > > customization of Fuseki.
> > >
> > > An extension can then provide whatever security is needed for the
> > > situation and it is the Fuseki user/operator making the decisions about
> > > what is acceptable security and what isn't.
> > >
> > > Fuseki has ways to add custom processors and this seems the way to
> > > provide an alternative way to make queries.
> > >
> > > Putting it in the distribution codebase is a big step for the project.
> > > At the very least, it needs to be mature and likely to be used.
> >
> > We wouldn't be reaching out if we weren't likely to want to use such a
> > feature. All these concerns/questions/suggestions are exactly what we
> > were hoping for. If I can provide any more context/tests/samples, let
> > me know.
> > (I completely get the concerns about diluting a known security feature
> > and have no issue with something like this being a separate
> > component.)
> >
> > >
> > > Background: Currently jena-access is in Fuseki main. It is not optional
> > > because it predates Fuseki modules.
> > >
> > >      Andy
> >
> >
> >
> > --
> > Vilnis Termanis
> > Technical Specialist
> >
> > e | vilnis.terma...@iotics.com
> > www.iotics.com
> >



-- 
Vilnis Termanis
Technical Specialist

e | vilnis.terma...@iotics.com
www.iotics.com

