Re: [DISCUSS] Advanced search capabilities

Mike Thomsen Mon, 24 Feb 2020 09:29:02 -0800

FWIW, having spent a lot of time in the last year working with graph
database ingestion, I really don't see this story ending well for replacing
Lucene.


On Mon, Feb 24, 2020 at 5:01 PM Otto Fowler <[email protected]> wrote:

> +1 for the “bring your own golden hammer” approach
>
> On February 24, 2020 at 11:46:14, Mike Thomsen ([email protected]) wrote:
>
> Another thing I forgot to throw out there was that you have an issue of
> latency if you use Janus or Neo4j. Lucene will almost certainly have
> substantially lower latency for updating and querying the provenance data
> if you were to do a bake off between the two to power a provenance
> repository.
>
> That said, if you care more about being able to query with Cypher or
> Gremlin than having raw performance, you could write a custom provenance
> repository. They are pluggable.
>
> On Sat, Feb 22, 2020 at 7:00 AM Martin Ebert <[email protected]> wrote:
>
> > Hi Mike,
> > > that is a fair point. You would indeed raise the minimum requirements
> > > of NiFi accordingly if you wanted to use a graph. As an additional
> > > application, as we are currently planning, Neo4j is nevertheless a good
> > > choice, and there is nothing to be said against making it open source.
> > > The open source version of Neo4j should be sufficient for this.
> >
> >
> > > Mike Thomsen <[email protected]> wrote on Sat, Feb 22, 2020, 02:36:
> >
> > > Martin,
> > >
> > > > In theory, a graph database would be superior here. Absolutely. In
> > > > practice, none of the tech out there is better than the current
> > > > Lucene-based approach in terms of ease of development and integration
> > > > and low memory footprint. Adding Neo4j or JanusGraph would cause a huge
> > > > jump in the minimum requirements to run NiFi. Possibly to the point
> > > > where Xms and Xmx would have to start at 2GB for people getting
> > > > started.
> > >
> > > > It's been a long time since I've played with Atlas and the Atlas
> > > > integration, but if that doesn't work you can build in support for
> > > > Cypher and Gremlin by adding -Pinclude-graph to a 1.10 or 1.11 build.
> > > > In 1.10, one of the NARs was overlooked in that profile, so you'd need
> > > > to add it back to the profile. That was fixed in 1.11. The
> > > > ExecuteGraphQuery processor will allow you to execute Cypher or Gremlin
> > > > commands/scripts depending on which controller service/driver you
> > > > configure.
> > >
> > > > On Fri, Feb 21, 2020 at 6:42 PM Martin Ebert <[email protected]> wrote:
> > >
> > > > > We are still thinking about building a graph-based search (Neo4j) on
> > > > > top of NiFi. It would also be fantastic to have it within NiFi.
> > > > >
> > > > > There are plenty of examples:
> > > > >
> > > > > https://blog.grandstack.io/using-neo4js-full-text-search-with-graphql-e3fa484de2ea
> > > > >
> > > > > The idea could go in this direction, though of course in a much more
> > > > > rudimentary form. One would then be able either to display the
> > > > > results as plain text or to explore connections visually (graph
> > > > > layout). The built-in data lineage function of NiFi would also
> > > > > benefit from the power of Neo4j.
> > > >
> > > > > Simon Bence <[email protected]> wrote on Fri, Feb 21, 2020, 19:00:
> > > >
> > > > > Dear Community,
> > > > >
> > > > > > In my project, I use a relatively high number of processors and
> > > > > > process groups. The current search function on the NiFi UI has no
> > > > > > capability to narrow the results based on the group, which would
> > > > > > make the results more relevant, so I would like to propose a
> > > > > > possible solution. If you have any comments on this, please do not
> > > > > > hesitate to share them.
> > > > >
> > > > > > The general approach would be to keep the current text box and
> > > > > > extend the server-side capabilities to process the search query in
> > > > > > a manner similar to how, for example, Google search behaves. I
> > > > > > would call these extensions "filters". For now I am interested in
> > > > > > the ones I mention below, but I think it would take only a small
> > > > > > amount of work to extend the solution with further ones.
> > > > >
> > > > > > In order to distinguish the filters from the rest of the search
> > > > > > query, I propose to put them at the beginning of the query and use
> > > > > > the [a-zA-Z0-9\.]{1..n}\:[a-zA-Z0-9\.]{1..n} format. For example, a
> > > > > > filter might look like the following: lorem:ipsum
> > > > >
> > > > > Adding this, the search query should look like the following:
> > > > >
> > > > > filter1:value filter2:value rest of the query
> > > > >
> > > > > As for processing the filters, I suggest the following behaviour:
> > > > >
> > > > > - Without filters the current behaviour should be kept
> > > > > - Everything after the filters should be handled as the search term
> > > > > > - After the first "non-filter word", anything should be considered
> > > > > > part of the search term (meaning: to keep the text parsing simple,
> > > > > > I would not go in the direction of supporting filters at the end
> > > > > > of the query, etc.)
> > > > > > - The ordering of the filters should have no effect on the result
> > > > > > - Filter duplications should be eliminated
> > > > > > - In case a filter appears multiple times in the query, the first
> > > > > > occurrence will be used
> > > > > > - Unknown filters should be ignored
> > > > > > - Filters alone will not produce a result; at least one character
> > > > > > must appear as the search term
> > > > >
> > > > > Suggested filters:
> > > > >
> > > > > scope
> > > > > > Narrows the search based on the user's currently active process
> > > > > > group. The allowed values are: "all" and "here". "all" produces the
> > > > > > current behaviour, thus no filtering happens, but "here" should use
> > > > > > the current process group as the "root" of the search, ignoring
> > > > > > everything else (including the parent group).
> > > > > > Note: This needs a minimal frontend change because, as far as I can
> > > > > > see, the current group is not currently sent with the search query.
> > > > >
> > > > > group
> > > > > > Narrows the search to a given process group, if it exists. The
> > > > > > behaviour is recursive, thus the result will include the contained
> > > > > > groups as well. If the group does not exist, the result list should
> > > > > > be empty.
> > > > >
> > > > > properties
> > > > > > Controls whether property values are included or not. If not
> > > > > > provided, the property values will be included. This filter is
> > > > > > useful because in a lot of cases a huge number of results come
> > > > > > from property names.
> > > > >
> > > > > - Valid values for inclusion: yes, true, include, 1
> > > > > - Valid values for exclusion: no, none, false, exclude, 0
> > > > >
> > > > > > It is possible that the range of possible values should be limited
> > > > > > (so that it is not ambiguous), but I see merit in "permissiveness"
> > > > > > here as it is simpler to remember.
> > > > >
> > > > > > Some examples:
> > > > >
> > > > > 1.
> > > > > scope:here properties:exclude lorem ipsum
> > > > > > This should search only in the current group (and its children),
> > > > > > excluding properties, and return the components containing the
> > > > > > "lorem ipsum" expression.
> > > > >
> > > > > 2.
> > > > > group:myGroup someQuery
> > > > > > This should find the components matching the someQuery expression,
> > > > > > but only within the myGroup group, even if it is not the active
> > > > > > one.
> > > > >
> > > > > 3.
> > > > > scope:all properties:include lorem
> > > > > This should behave the same as "lorem" without filters.
> > > > >
> > > > > Thanks for reading, I am interested to hear your opinion!
> > > > >
> > > > > Kind regards,
> > > > > Bence
> > > > >
> > > >
> > >
> >
>
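
For illustration, here is a minimal Python sketch of the filter parsing rules
from the original proposal quoted above. It is not NiFi code: the names
parse_query, include_properties, KNOWN_FILTERS and EXCLUDE_VALUES are
hypothetical, the {1..n} quantifier is read as "one or more", filter names are
assumed to be case-sensitive, and "ignored" is assumed to mean that unknown
filters are dropped rather than treated as part of the search term.

```python
import re

# Token shape from the proposal: name:value, both drawn from [a-zA-Z0-9.]
# ({1..n} in the proposal is interpreted here as "one or more").
FILTER_TOKEN = re.compile(r"([a-zA-Z0-9.]+):([a-zA-Z0-9.]+)")

# Filter names suggested in the proposal.
KNOWN_FILTERS = {"scope", "group", "properties"}

# Values the proposal lists as excluding property values.
EXCLUDE_VALUES = {"no", "none", "false", "exclude", "0"}


def parse_query(query):
    """Split a query into (filters, term). Hypothetical helper, not a NiFi API."""
    words = query.split()
    filters = {}
    consumed = 0
    for word in words:
        match = FILTER_TOKEN.fullmatch(word)
        if match is None:
            break  # first non-filter word: everything from here is the term
        name, value = match.groups()
        # Assumption: unknown filters are dropped. For duplicates, the first
        # occurrence wins, as the proposal states.
        if name in KNOWN_FILTERS and name not in filters:
            filters[name] = value
        consumed += 1
    # Note: runs of whitespace inside the term are collapsed to single spaces.
    term = " ".join(words[consumed:])
    return filters, term


def include_properties(filters):
    """Property values are searched unless the filter explicitly excludes them.

    Assumption: an unrecognized value falls back to the default (include).
    """
    value = filters.get("properties", "include").lower()
    return value not in EXCLUDE_VALUES
```

With this sketch, parse_query("scope:here properties:exclude lorem ipsum")
yields the filters scope=here and properties=exclude with the search term
"lorem ipsum". A query consisting only of filters yields an empty term, which
the caller would treat as "no results" under the proposal.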
