I looked into it again, and there's something in the implementation that returns the same Lucene score for every document found, so ordering is not possible.
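If every hit really comes back with the same Lucene score, the server-side order says nothing about relevance. One workaround, the one gg4u describes in the quoted message, is to re-rank the fetched names on the client. Here is a minimal Python sketch. It assumes the node names have already been retrieved into a list of strings. `levenshtein` and `rerank` are hypothetical helpers, not part of Neo4j or any client library.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def rerank(query: str, names: list[str], limit: int = 10) -> list[str]:
    """Hypothetical client-side re-ranking of index hits by edit distance.

    Assumes `names` holds every hit already fetched from the index.
    Ties are broken by name length, then alphabetically, so the order is stable.
    """
    q = query.casefold()
    ranked = sorted(names, key=lambda n: (levenshtein(q, n.casefold()), len(n), n))
    return ranked[:limit]


if __name__ == "__main__":
    hits = ["List of events in the United States", "United States Navy", "United States"]
    print(rerank("united states", hits))
```

Because every hit has to be fetched before it can be sorted, this does not solve the pagination problem raised below. It only makes the first page meaningful once all hits are on the client.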
I tried to figure it out, but so far no luck.

Sent from my iPhone

> On 14.08.2015 at 13:54, gg4u <[email protected]> wrote:
>
> That's so unfortunate, I saw this message only now... awww... I tried to contact Neo for a meeting at their headquarters, but it didn't work out; I would definitely have come. I had spoken with the biz dev department, and they would likely have been keen to meet up if there had been a business development opportunity with an already promising startup, rather than helping a single dude turn a prototype into a beta :)
>
> So I resolved to dedicate my little time in SF (you know, ESTA tourist visa) not to coding and developing, but to getting feedback on a first product I crafted (not yet supporting a graph db), meeting people and gathering info for moving back there... lovely and intense city.
>
> ... Now back in Italy, I am working on this aspect of full-text indexing again, and yes, I did follow your link, but I can't get meaningful results with full-text.
>
> I am finding it a very difficult aspect: the documentation has examples, but it's not clear which ones also work in Cypher and which don't (e.g. http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher). Other tutorials shed some light in Java (http://blog.armbruster-it.de/2014/10/deep-dive-on-fulltext-indexing-with-neo4j/), but I still haven't managed. I see there is some support for Lucene syntax in neo4j-rest-client for Python, but my concern is the quality of the hits: I need a better understanding of how results are matched in a Cypher START query, or of how to do a full-text query in Python.
>
> I also posted on Stack Overflow:
> http://stackoverflow.com/questions/31862761/how-to-handle-thousands-of-rows-with-similar-string-patterns-in-neo4j-full-text
>
> Issue: the full-text index in a Cypher START looks to be of poor quality. Examples, using topic names from Wikipedia as test data:
> 'DNA' does not return the record whose name is the single word 'DNA' among the first results.
> 'united states' (note: two words) finds the actual record 'united states' buried deep in a list of 11K rows; the first match is a long name that also contains the words 'united states' but is not meaningful (e.g. 'List of something here and there that happened in united states some while ago' :)
>
> That is a major problem, because you cannot paginate the results in any logical way: to return meaningful results to the user, you need to fetch all nodes and then apply some sorting (Levenshtein, for example).
>
> It is not clear how the results are sorted in Lucene/Neo4j, although, reading the Neo4j documentation and here:
> http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher
> it looks like Lucene should handle the sorting itself. Using which logic? The results above seem to be hit more or less at random.
>
> To clarify:
> I read about Levenshtein metrics in the Lucene documentation, and the results fetched by a simple
> start n=node:topic('name:(DNA)') return n skip 0 limit 10;
> certainly do not use Levenshtein in my case.
>
> As a test, I tried fuzzy searches:
> start n=node:topic('name:DNA~0.4') return n skip 0 limit 10;
> and the results do not change.
>
> As a second test, I can see that
> http://localhost:7474/db/data/index/node/topic?query=name:%22dna%22
> returns the same results as the Cypher query above,
> but the syntax
> http://localhost:7474/db/data/index/node/topic/dna
> returns
> No index hits
>
> I am trying to understand whether I am writing the syntax wrong or there is an error, maybe still related to a batch import (you once told me there was an error in indexing)? But that would not make much sense, since START actually finds results that are pertinent (the keyword values are inside the 'name' key) but not meaningful (the values are 'distant' from the input keywords).
>
> My goal is to get the best matches as the first results. I can return decent results by applying Levenshtein myself after the results come back from a START query, but, as I said, that is not feasible for matches with thousands of rows.
>
> Could you please provide a brief guide or a mockup db to test whether my indexes are corrupted, or a benchmark/guide to test how Lucene START queries match and sort results, possibly in Python rather than Java?
>
> On Tuesday, 12 May 2015 11:06:02 UTC+2, Michael Hunger wrote:
>>
>> Hi Luigi,
>>
>> did you try the full-text Lucene index approach that we discussed back then?
>> Can you share your latest approach?
>> I presume you do a regexp search, which does not use an index?
>>
>> I wrote this blog post a while ago, and it is still valid for 2.2:
>> http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/
>>
>> For Neo4j 2.3 there should be an automatic solution for this issue coming up, using LIKE.
>>
>> If you have time next week, you could drop by the office to say hi.
>>
>> Michael
>>
>>
>>> On 12.05.2015 at 01:00, gg4u <[email protected]> wrote:
>>>
>>> hi folks,
>>> I am temporarily in San Francisco.
>>>
>>> I have my db with simple node titles as names.
>>>
>>> I would like to launch a crowd-funding campaign.
>>>
>>> I need a final step.
>>> Full-text search is quite lame... It's OK if you search for the exact node name.
>>> But it takes several seconds to match a substring in the name.
>>>
>>> I thought maybe there's a plugin to import into Elasticsearch or the like.
>>> The river plugin was there, but it has been deprecated.
>>>
>>> Is there anybody who might be willing to help with an importer from Neo4j to Elasticsearch, or to solve full-text indexing?
>>> I could offer a dinner in exchange or... well... team up and share what I've done if you like my project!
>>>
>>> A mockup is here:
>>> www.xdiscovery.com/en/graph/22
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>
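This concerns gg4u's second test, where `?query=name:%22dna%22` returns hits but `/topic/dna` returns none. In the Neo4j 2.x legacy-index REST API, as I understand it, an exact lookup takes the form `/db/data/index/node/{index}/{key}/{value}`, while a Lucene query goes in the `?query=` parameter. If that is right, `/topic/dna` is missing the key segment, which may explain the empty result. The small Python sketch below builds both URL shapes so they can be compared. The helper names are hypothetical, the base URL is the local default from the message, and the path layout is an assumption that should be checked against the REST docs for the version in use.

```python
from urllib.parse import quote, urlencode

# Assumed base path of the Neo4j 2.x legacy node-index REST API on a local server.
BASE = "http://localhost:7474/db/data/index/node"


def query_url(index: str, lucene_query: str) -> str:
    """Hypothetical helper: URL for a Lucene query against a legacy node index."""
    # urlencode percent-encodes ':' and '"' inside the query string.
    return f"{BASE}/{quote(index, safe='')}?{urlencode({'query': lucene_query})}"


def exact_url(index: str, key: str, value: str) -> str:
    """Hypothetical helper: URL for an exact key/value lookup (both segments required)."""
    parts = (quote(p, safe="") for p in (index, key, value))
    return BASE + "/" + "/".join(parts)


if __name__ == "__main__":
    print(query_url("topic", 'name:"dna"'))
    print(exact_url("topic", "name", "DNA"))
```

The percent-encoded colon in the query URL is equivalent to the raw `name:` in the original request. The point of the sketch is only the path shape of the exact lookup.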
