FWIW, having spent a lot of time in the last year working with graph database ingestion, I really don't see this story ending well for replacing Lucene.

On Mon, Feb 24, 2020 at 5:01 PM Otto Fowler wrote:

> +1 for the “bring your own golden hammer” approach
>
> On February 24, 2020 at 11:46:14, Mike Thomsen wrote:
>
> Another thing I forgot to throw out there was that you have an issue of
> latency if you use Janus or Neo4j. Lucene will almost certainly have
> substantially lower latency for updating and querying the provenance
> data if you were to do a bake off between the two to power a provenance
> repository.
>
> That said, if you care more about being able to query with Cypher or
> Gremlin than having raw performance, you could write a custom provenance
> repository. They are pluggable.
>
> On Sat, Feb 22, 2020 at 7:00 AM Martin Ebert wrote:
>
> > Hi Mike,
> > that is a fair point. You would actually raise the minimum
> > requirements of NiFi accordingly if you wanted to use a graph. As an
> > additional application, as we are currently planning, Neo4j is
> > nevertheless a good choice and there is nothing to be said against
> > making it open source. The open source version of Neo4j should be
> > sufficient for this.
> >
> > Mike Thomsen wrote on Sat, Feb 22, 2020, 02:36:
> >
> > > Martin,
> > >
> > > In theory, a graph database would be superior here. Absolutely. In
> > > practice, none of the tech out there is better than the current
> > > Lucene-based approach in terms of ease of development and
> > > integration and low memory footprint. Adding Neo4J or JanusGraph
> > > would cause a huge jump in the minimum requirements to run NiFi.
> > > Possibly to the point where Xms and Xmx would have to start at 2GB
> > > for people getting started.
> > >
> > > It's been a long time since I've played with Atlas and the Atlas
> > > integration, but if that doesn't work you can build in support for
> > > Cypher and Gremlin by adding -Pinclude-graph to a 1.10 or 1.11
> > > build. In 1.10, one of the NARs was overlooked in that profile, so
> > > you'd need to add it back to the profile. That was fixed in 1.11.
> > > The ExecuteGraphQuery processor will allow you to execute Cypher or
> > > Gremlin commands/scripts depending on which controller
> > > service/driver you configure.
> > >
> > > On Fri, Feb 21, 2020 at 6:42 PM Martin Ebert wrote:
> > >
> > > > We still think about building a graph based search (Neo4j) on top
> > > > of NiFi. Would be also fantastic to have it within NiFi.
> > > >
> > > > There are plenty of examples
> > > >
> > > > https://blog.grandstack.io/using-neo4js-full-text-search-with-graphql-e3fa484de2ea
> > > >
> > > > From the idea it could go in this direction - of course much more
> > > > rudimentary. Then one would have the possibility to have only the
> > > > results displayed as text or to find out exploratory connections
> > > > (graph layout). The built-in data lineage function of NiFi would
> > > > also benefit from the power of Neo4j.
> > > >
> > > > Simon Bence wrote on Fri, Feb 21, 2020, 19:00:
> > > >
> > > > > Dear Community,
> > > > >
> > > > > In my project, I do use relatively high number of processors
> > > > > and process groups. The current search function on the NiFi UI
> > > > > has no capabilities to narrow the results based on the group,
> > > > > which would make the results more relevant, so I would like to
> > > > > propose a possible solution. Please if you have any comment on
> > > > > this, do not hesitate to share it.
> > > > >
> > > > > The general approach would be to keep the current text box and
> > > > > extend the server side capabilities to process search query in
> > > > > the similar manner for example the Google search behaves. These
> > > > > extensions I would call "filters". For now I am interested in
> > > > > the ones I will mention below, but I think, it is only a matter
> > > > > of small work for further extend the solution with further
> > > > > ones.
> > > > >
> > > > > In order to distinguish the filters from the rest of the search
> > > > > query, I propose to put them at the beginning of the query and
> > > > > use the [a-zA-Z0-9\.]{1..n}\:[a-zA-Z0-9\.]{1..n} format. For
> > > > > example a filter might look the following: lorem:ipsum
> > > > >
> > > > > Adding this, the search query should look like the following:
> > > > >
> > > > > filter1:value filter2:value rest of the query
> > > > >
> > > > > As for processing the filters, I suggest the following
> > > > > behaviour:
> > > > >
> > > > > - Without filters the current behaviour should be kept
> > > > > - Everything after the filters should be handled as the search
> > > > >   term
> > > > > - After the first "non filter word", anything should be
> > > > >   considered as part of the search term (meaning: to keep the
> > > > >   text parsing simple, I would not go in the direction to
> > > > >   support filters at the end of the query, etc.)
> > > > > - The ordering of the filters should have no effect on the
> > > > >   result
> > > > > - Filter duplications should be eliminated
> > > > > - In case a filter appears multiple times in the query, the
> > > > >   first occasion will be used
> > > > > - Unknown filters should be ignored
> > > > > - Only adding filters will not end up with result, at least one
> > > > >   character must appear as search term
> > > > >
> > > > > Suggested filters:
> > > > >
> > > > > scope
> > > > > Narrows the search based on the user's currently active process
> > > > > group. The allowed values are: "all" and "here". All produces
> > > > > the current behaviour, thus no filtering happens, but "here"
> > > > > should use the current process group as "root" of the search,
> > > > > ignoring everything else (including parent group).
> > > > > Note: This needs a minimal frontend change, because as I did
> > > > > see, currently the current group is not sent with the search
> > > > > query.
> > > > >
> > > > > group
> > > > > Narrows the search for a given processing group, if it exists.
> > > > > The behaviour is recursive, thus the result will include the
> > > > > contained groups as well. If it is a non-existing group, the
> > > > > result list should be empty.
> > > > >
> > > > > properties
> > > > > Controls if properties values are included or not. If not
> > > > > provided, the property values will be included. This is because
> > > > > in a lot of cases there is a huge number of results come from
> > > > > property names.
> > > > >
> > > > > - Valid values for inclusion: yes, true, include, 1
> > > > > - Valid values for exclusion: no, none, false, exclude, 0
> > > > >
> > > > > It is possible that the range of possible values should be
> > > > > limited (and not being ambiguous), but I see a merit of
> > > > > "permissiveness" here as it is simpler to remember.
> > > > >
> > > > > Also some examples:
> > > > >
> > > > > 1.
> > > > > scope:here properties:exclude lorem ipsum
> > > > > This should search only in the current group (and its
> > > > > children), excluding properties and return with components
> > > > > containing the "lorem ipsum" expression.
> > > > >
> > > > > 2.
> > > > > group:myGroup someQuery
> > > > > This should result the finding of components with someQuery
> > > > > expression, but only within the myGroup group, even if it is
> > > > > not the active one.
> > > > >
> > > > > 3.
> > > > > scope:all properties:include lorem
> > > > > This should behave the same as "lorem" without filters.
> > > > >
> > > > > Thanks for reading, I am interested to hear your opinion!
> > > > >
> > > > > Kind regards,
> > > > > Bence
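
As a side note on the filter syntax in Bence's original proposal at the bottom of the thread: the parsing rules are simple enough to sketch in a few lines. What follows is only a rough illustration in Python, not NiFi code. The names `parse_search_query`, `FILTER_PATTERN` and `KNOWN_FILTERS` are made up for this example, and I am reading `{1..n}` in the proposed format as "one or more characters".

```python
import re

# Assumption: "{1..n}" in the proposed format means "one or more".
FILTER_PATTERN = re.compile(r"[a-zA-Z0-9.]+:[a-zA-Z0-9.]+")

# The three filter names suggested in the proposal.
KNOWN_FILTERS = {"scope", "group", "properties"}


def parse_search_query(query):
    """Split a raw query into (filters, search_term).

    Returns None when nothing is left as a search term, because the
    proposal says filters alone should not produce results.
    """
    words = query.split()
    filters = {}
    position = 0
    # Filters are only recognised at the start of the query; the first
    # word that is not in filter form ends the filter section.
    while position < len(words) and FILTER_PATTERN.fullmatch(words[position]):
        name, value = words[position].split(":", 1)
        # Unknown filters are ignored; for duplicates the first one wins.
        if name in KNOWN_FILTERS and name not in filters:
            filters[name] = value
        position += 1
    # Note: joining with single spaces collapses repeated whitespace.
    term = " ".join(words[position:])
    if not term:
        return None
    return filters, term
```

With this sketch, `parse_search_query("scope:here properties:exclude lorem ipsum")` gives the two filters plus the term "lorem ipsum", while a query that puts a filter after the first plain word, such as "lorem scope:here", is treated entirely as a search term. Interpreting the values (for example mapping yes/true/include/1 to inclusion) would be a separate step.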
