You'd want to create an API for accessing it that doesn't expose the
Gremlin traversal object, so that users can't mutate the graph. If
you give them the traversal, there are no ACLs to prevent a malicious
user from doing things like subtly dropping references to data.
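
To make the idea concrete, here is a minimal sketch of a read-only view
over an in-memory flow graph. It is not TinkerPop code: the class name
ReadOnlyFlowGraph and its methods are hypothetical, and the graph is a
plain dict standing in for whatever the real store would be. The point
is only that callers get query methods and immutable results, never the
underlying mutable structure.

```python
from types import MappingProxyType


class ReadOnlyFlowGraph:
    """Hypothetical read-only facade; callers never see the mutable store."""

    def __init__(self, nodes, edges):
        # Copy the input so later changes by the caller cannot leak in,
        # and wrap each property dict in a read-only mapping view.
        self._nodes = {
            node_id: MappingProxyType(dict(props))
            for node_id, props in nodes.items()
        }
        # Outgoing edges are stored as tuples, which cannot be mutated.
        self._out = {node_id: () for node_id in nodes}
        for source, target in edges:
            self._out[source] = self._out[source] + (target,)

    def node(self, node_id):
        """Return the read-only properties of one component."""
        return self._nodes[node_id]

    def downstream(self, node_id):
        """Return every component reachable from node_id (iterative DFS)."""
        seen = []
        stack = list(self._out.get(node_id, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self._out.get(current, ()))
        return tuple(seen)
```

Python's leading-underscore attributes are only a convention, so in a
real service the boundary would have to be enforced at the API layer
(for example a REST endpoint), not by the class alone.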

On Tue, Feb 25, 2020 at 2:36 PM Matt Burgess <[email protected]> wrote:

> What about an in-memory (perhaps with disk persistence) representation
> of the flow as a graph using Apache Tinkerpop? That would alleviate
> the need for a separate graph DB while still allowing you to do graph
> searches, at the cost of more memory usage. Not sure what the
> footprint would look like for large in-mem flow graphs but I could do
> some investigating, the AtlasReportingTask already does something
> similar. For this case I wouldn't think a "normal" search would
> necessarily use the graph or replace Lucene, but perhaps later we
> could add a power-user graph search capability using Gremlin. I've
> been looking at making the flow graph available externally (likely a
> REST call), but that could be leveraged internally as well.
>
> To Simon's Lucene filters:
>
> 1) Could we use "scope:here" and "group:myGroup" together to remove
> the recursive nature of the search? Basically saying if "scope" is not
> provided but "group" is, the scope defaults to "all".
> 2) If the term used in "group" filters is a process group name (rather
> than ID), do you think there's a use case for a "parent" filter in the
> case where names are provided? This would allow you, in the case of
> two identically-named child PGs with different parents, to narrow the
> search to the PG with the given parent.
>
> Regards,
> Matt
>
>
> On Mon, Feb 24, 2020 at 12:28 PM Mike Thomsen <[email protected]>
> wrote:
> >
> > FWIW, having spent a lot of time in the last year working with graph
> > database ingestion, I really don't see this story ending well for
> replacing
> > Lucene.
> >
> > On Mon, Feb 24, 2020 at 5:01 PM Otto Fowler <[email protected]>
> wrote:
> >
> > > +1 for the “bring your own golden hammer” approach
> > >
> > >
> > >
> > >
> > > On February 24, 2020 at 11:46:14, Mike Thomsen ([email protected]
> )
> > > wrote:
> > >
> > > Another thing I forgot to throw out there was that you have an issue of
> > > latency if you use Janus or Neo4j. Lucene will almost certainly have
> > > substantially lower latency for updating and querying the provenance
> data
> > > if you were to do a bake off between the two to power a provenance
> > > repository.
> > >
> > > That said, if you care more about being able to query with Cypher or
> > > Gremlin than having raw performance, you could write a custom
> provenance
> > > repository. They are pluggable.
> > >
> > > On Sat, Feb 22, 2020 at 7:00 AM Martin Ebert <[email protected]>
> wrote:
> > >
> > > > Hi Mike,
> > > > that is a fair point. You would actually raise the minimum
> requirements
> > > of
> > > > Nifi accordingly if you wanted to use a graph. As an additional
> > > > application, as we are currently planning, Neo4j is nevertheless a
> good
> > > > choice and there is nothing to be said against making it open
> source. The
> > > > open source version of Neo4j should be sufficient for this.
> > > >
> > > >
> > > > Mike Thomsen <[email protected]> wrote on Sat, Feb 22, 2020,
> > > > 02:36:
> > > >
> > > > > Martin,
> > > > >
> > > > > In theory, a graph database would be superior here. Absolutely. In
> > > > > practice, none of the tech out there is better than the current
> > > > > Lucene-based approach in terms of ease of development and
> integration
> > > and
> > > > > low memory footprint. Adding Neo4J or JanusGraph would cause a huge
> > > jump
> > > > in
> > > > > the minimum requirements to run NiFi. Possibly to the point where
> Xms
> > > and
> > > > > Xmx would have to start at 2GB for people getting started.
> > > > >
> > > > > It's been a long time since I've played with Atlas and the Atlas
> > > > > integration, but if that doesn't work you can build in support for
> > > Cypher
> > > > > and Gremlin by adding -Pinclude-graph to a 1.10 or 1.11 build. In
> 1.10,
> > > > one
> > > > > of the NARs was overlooked in that profile, so you'd need to add it
> > > back
> > > > to
> > > > > the profile. That was fixed in 1.11. The ExecuteGraphQuery
> processor
> > > will
> > > > > allow you to execute Cypher or Gremlin commands/scripts depending
> on
> > > > which
> > > > > controller service/driver you configure.
> > > > >
> > > > > On Fri, Feb 21, 2020 at 6:42 PM Martin Ebert <[email protected]
> >
> > > > wrote:
> > > > >
> > > > > > > We are still thinking about building a graph based search (Neo4j) on
> top of
> > > > > NiFi.
> > > > > > Would be also fantastic to have it within NiFi.
> > > > > >
> > > > > > There are plenty of examples
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> https://blog.grandstack.io/using-neo4js-full-text-search-with-graphql-e3fa484de2ea
> > > > > > From the idea it could go in this direction - of course much more
> > > > > > rudimentary. Then one would have the possibility to have only the
> > > > results
> > > > > > displayed as text or to find out exploratory connections (graph
> > > > layout).
> > > > > > The built-in data lineage function of NiFi would also benefit
> from
> > > the
> > > > > > power of Neo4j.
> > > > > >
> > > > > > > Simon Bence <[email protected]> wrote on Fri, Feb 21,
> > > 2020,
> > > > > > 19:00:
> > > > > >
> > > > > > > Dear Community,
> > > > > > >
> > > > > > > > In my project, I use a relatively high number of processors
> and
> > > > > process
> > > > > > > groups. The current search function on the NiFi UI has no
> > > > > > capabilities
> > > > > > to
> > > > > > > narrow the results based on the group, which would make the
> results
> > > > > more
> > > > > > > relevant, so I would like to propose a possible solution.
> Please if
> > > > you
> > > > > > > have any comment on this, do not hesitate to share it.
> > > > > > >
> > > > > > > The general approach would be to keep the current text box and
> > > extend
> > > > > the
> > > > > > > server side capabilities to process search query in the similar
> > > > manner
> > > > > > for
> > > > > > > > example the Google search behaves. These extensions I would call
> > > > > "filters".
> > > > > > > For now I am interested in the ones I will mention below, but I
> > > > think,
> > > > > it
> > > > > > > > is only a small amount of work to further extend the solution
> with
> > > > > > further
> > > > > > > ones.
> > > > > > >
> > > > > > > In order to distinguish the filters from the rest of the search
> > > > query,
> > > > > I
> > > > > > > propose to put them at the beginning of the query and use the
> > > > > > > > [a-zA-Z0-9\.]+\:[a-zA-Z0-9\.]+ format. For example, a
> > > filter
> > > > > > might
> > > > > > > look the following: lorem:ipsum
> > > > > > >
> > > > > > > Adding this, the search query should look like the following:
> > > > > > >
> > > > > > > filter1:value filter2:value rest of the query
> > > > > > >
> > > > > > > As for processing the filters, I suggest the following
> behaviour:
> > > > > > >
> > > > > > > - Without filters the current behaviour should be kept
> > > > > > > - Everything after the filters should be handled as the search
> term
> > > > > > > - After the first "non filter word", anything should be
> considered
> > > as
> > > > > > part
> > > > > > > of the search term (meaning: to keep the text parsing simple, I
> > > would
> > > > > not
> > > > > > > go in the direction to support filters at the end of the query,
> > > etc.)
> > > > > > > - The ordering of the filters should have no effect on the
> result
> > > > > > > - Filter duplications should be eliminated
> > > > > > > - In case a filter appears multiple times in the query, the
> first
> > > > > > occasion
> > > > > > > will be used
> > > > > > > - Unknown filters should be ignored
> > > > > > > - Only adding filters will not end up with result, at least one
> > > > > character
> > > > > > > must appear as search term
> > > > > > >
> > > > > > > Suggested filters:
> > > > > > >
> > > > > > > scope
> > > > > > > Narrows the search based on the user's currently active process
> > > > group.
> > > > > > The
> > > > > > > allowed values are: "all" and "here". All produces the current
> > > > > behaviour,
> > > > > > > thus no filtering happens, but "here" should use the current
> > > process
> > > > > > group
> > > > > > > as "root" of the search, ignoring everything else (including
> parent
> > > > > > group).
> > > > > > > Note: This needs a minimal frontend change, because as I did
> see,
> > > > > > currently
> > > > > > > the current group is not sent with the search query.
> > > > > > >
> > > > > > > group
> > > > > > > Narrows the search for a given processing group, if it exists.
> The
> > > > > > > behaviour is recursive, thus the result will include the
> contained
> > > > > groups
> > > > > > > as well. If it is a non-existing group, the result list should
> be
> > > > > empty.
> > > > > > >
> > > > > > > properties
> > > > > > > Controls if properties values are included or not. If not
> provided,
> > > > the
> > > > > > > property values will be included. This is because in a lot of
> cases
> > > > > there
> > > > > > > > is a huge number of results coming from property names.
> > > > > > >
> > > > > > > - Valid values for inclusion: yes, true, include, 1
> > > > > > > - Valid values for exclusion: no, none, false, exclude, 0
> > > > > > >
> > > > > > > It is possible that the range of possible values should be
> limited
> > > > (and
> > > > > > not
> > > > > > > being ambiguous), but I see a merit of "permissiveness" here
> as it
> > > is
> > > > > > > simpler to remember.
> > > > > > >
> > > > > > > Also some example:
> > > > > > >
> > > > > > > 1.
> > > > > > > scope:here properties:exclude lorem ipsum
> > > > > > > > This should search only in the current group (and its
> children),
> > > > > > excluding
> > > > > > > properties and return with components containing the "lorem
> ipsum"
> > > > > > > expression.
> > > > > > >
> > > > > > > 2.
> > > > > > > group:myGroup someQuery
> > > > > > > > This should result in finding components with the someQuery
> > > > expression,
> > > > > > but
> > > > > > > only within the myGroup group, even if it is not the active
> one.
> > > > > > >
> > > > > > > 3.
> > > > > > > scope:all properties:include lorem
> > > > > > > This should behave the same as "lorem" without filters.
> > > > > > >
> > > > > > > Thanks for reading, I am interested to hear your opinion!
> > > > > > >
> > > > > > > Kind regards,
> > > > > > > Bence
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>
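
For reference, the filter-processing rules from Simon's original
proposal (leading filters only, first occurrence wins, unknown filters
ignored, at least one search-term character required) could be sketched
roughly as below. This is an illustration, not NiFi code: parse_query,
SearchQuery and include_properties are hypothetical names, filter names
are assumed to be case-sensitive, and an unrecognized "properties" value
is assumed to fall back to the default of including property values.

```python
import re
from typing import NamedTuple, Optional

# A filter is name:value, both built from letters, digits and dots.
FILTER_RE = re.compile(r"[a-zA-Z0-9.]+:[a-zA-Z0-9.]+")
KNOWN_FILTERS = {"scope", "group", "properties"}
EXCLUDE_VALUES = {"no", "none", "false", "exclude", "0"}


class SearchQuery(NamedTuple):
    term: str       # everything after the leading filters
    filters: dict   # known filter name -> first value seen


def parse_query(raw: str) -> Optional[SearchQuery]:
    """Split a raw query into leading filters and the search term."""
    words = raw.split()
    filters = {}
    index = 0
    # Consume filter-shaped words only at the start of the query.
    while index < len(words) and FILTER_RE.fullmatch(words[index]):
        name, value = words[index].split(":", 1)
        # Unknown filters are ignored; duplicates keep the first value.
        if name in KNOWN_FILTERS and name not in filters:
            filters[name] = value
        index += 1
    # Note: joining on a single space collapses repeated whitespace.
    term = " ".join(words[index:])
    if not term:
        return None  # filters alone produce no result
    return SearchQuery(term, filters)


def include_properties(filters: dict) -> bool:
    """Property values are searched unless explicitly excluded."""
    return filters.get("properties", "include").lower() not in EXCLUDE_VALUES
```

With this sketch, "scope:here properties:exclude lorem ipsum" yields the
term "lorem ipsum" with two filters, while "lorem scope:here" is treated
entirely as a search term because the filter-shaped word comes after the
first non-filter word.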
