Re: [Neo4j] help in san francisco ? full-text index

Michael Hunger Sun, 16 Aug 2015 03:16:19 -0700

I looked into it again and there's something in the implementation which 
returns the same Lucene score for each document found, so ordering is not 
possible.


I tried to figure it out but so far no luck
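
As a stopgap, the hits can be re-ranked on the client, which is the approach 
described further down in the thread (apply Levenshtein after START returns). 
Below is a minimal Python sketch of that idea, assuming the node names have 
already been fetched as plain strings. `levenshtein` and `rerank` are 
hypothetical helper names, not part of Neo4j, Lucene or neo4j-rest-client. It 
still has to fetch every hit before sorting, so it does not solve pagination 
over thousands of rows.

```python
def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def rerank(query, names, limit=10):
    """Sort already-fetched names: exact match first, then by edit distance."""
    q = query.strip().lower()

    def sort_key(name):
        n = name.strip().lower()
        # False sorts before True, so an exact match always comes first;
        # ties on distance are broken by preferring the shorter name.
        return (n != q, levenshtein(q, n), len(n))

    return sorted(names, key=sort_key)[:limit]


if __name__ == "__main__":
    # Hypothetical hit list, standing in for names returned by a START query.
    hits = [
        "List of things that happened in United States",
        "United States Navy",
        "United States",
    ]
    for name in rerank("united states", hits):
        print(name)
```

With the sample list above, the exact record 'United States' sorts ahead of 
the longer names that merely contain the phrase.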


Sent from my iPhone

> On 14.08.2015 at 13:54, gg4u <[email protected]> wrote:
> 
> That's so unfortunate, I only saw this message now... awww. I tried to 
> contact Neo for a meeting at their headquarters but it didn't work out; I 
> would definitely have come. I had spoken with the biz dev department, and 
> they would likely have been keen to meet if there had been a business 
> development opportunity with an already promising startup, rather than 
> helping a single dude turn a prototype into a beta :)
> 
> So I resolved to dedicate my little time in SF (you know, ESTA tourist 
> visa) not to coding and developing, but to getting feedback on a first 
> product I crafted (not yet supporting a graph db), meeting people, and 
> gathering info for moving back there... a lovely and intense city.
> 
> Now, back in Italy, I am working on this aspect of full-text indexing 
> again, and yes, I did follow your link, but I can't get meaningful results 
> with full-text.
> 
> I am finding this a very difficult area: the documentation has examples, 
> but it's not clear which of them also work in Cypher and which do not (e.g. 
> http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher).
> Another tutorial sheds some light on the Java side:
> http://blog.armbruster-it.de/2014/10/deep-dive-on-fulltext-indexing-with-neo4j/
> but I still did not manage. I see that neo4j-rest-client for Python has 
> some support for Lucene syntax, but my concern is the quality of the hits: 
> I need a better understanding of how results are matched in a Cypher START 
> query, or of how to do a full-text query in Python.
> 
> I also posted on Stack Overflow:
> http://stackoverflow.com/questions/31862761/how-to-handle-thousands-of-rows-with-similar-string-patterns-in-neo4j-full-text
> 
> Issue: full-text index results from a Cypher START query look poor. 
> Examples, using topic names from Wikipedia as test data:
> 'DNA' does not return the record with the single word 'DNA' among the 
> first results.
> 'united states' (note: two words) finds the actual record 'united states' 
> buried somewhere deep down a list of 11K rows; the first match is a long 
> name that also contains the words 'united states' but is not meaningful 
> (e.g. 'List of something here and there that happened in united states 
> some while ago' :)
> 
> That is a major problem, because you cannot paginate the results in any 
> logical way: in order to return meaningful results to the user, you need to 
> fetch all nodes and then apply some sorting (Levenshtein, for example).
> 
> It is not clear how the results are sorted in Lucene/Neo4j, although, from 
> reading the Neo4j documentation and this question:
> http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher
> it looks like Lucene should handle the sorting itself. By which logic? The 
> results above seem to be hit more or less at random.
> 
> To clarify:
> I read about Levenshtein metrics in the Lucene documentation, and the 
> results fetched by a simple
> start n=node:topic('name:(DNA)') return n skip 0 limit 10;
> certainly do not use Levenshtein in my case.
> 
> As a test, I tried a fuzzy search:
> start n=node:topic('name:DNA~0.4') return n skip 0 limit 10;
> and the results do not change.
> 
> As a second test, I can see that
> http://localhost:7474/db/data/index/node/topic?query=name:%22dna%22
> returns the same results as the Cypher query above,
> but the syntax
> http://localhost:7474/db/data/index/node/topic/dna
> returns
> No index hits
> 
> I am trying to understand whether I am getting the syntax wrong or whether 
> there is an error, maybe still related to a batch import (you once told me 
> there was an error in indexing)? But that would not make much sense, since 
> START actually finds results which are pertinent (the keyword values are 
> inside the 'name' key) but not meaningful (the values are 'distant' from 
> the input keywords).
> 
> 
> My goal is to get the best matches as the first results. I am able to 
> return decent results by applying Levenshtein myself after the results are 
> returned by a START query, but that, as said, is not feasible for matches 
> with thousands of rows.
>   
> 
> Could you please provide a brief guide or a mockup db to test whether my 
> indexes are maybe corrupted, or a benchmark / guide to test how a Lucene 
> START query matches and sorts results, possibly in Python, not Java?
> 
> 
> 
> On Tuesday, 12 May 2015 11:06:02 UTC+2, Michael Hunger wrote:
>> 
>> Hi Luigi,
>> 
>> Did you try the fulltext Lucene index approach that we discussed back then? 
>> Can you share your latest approach?
>> I presume you are doing a regexp search, which does not use an index?
>> 
>> I wrote this blog post a while ago; it is still valid for 2.2:
>> http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/
>> 
>> For Neo4j 2.3 there should be an automatic solution for this issue coming 
>> up, using LIKE.
>> 
>> If you have time next week you can probably drop by the office to say hi.
>> 
>> Michael
>> 
>> 
>>> On 12.05.2015 at 01:00, gg4u <[email protected]> wrote:
>>> 
>>> Hi folks,
>>> I am temporarily in San Francisco.
>>> 
>>> I have my db with simple node titles as names.
>>> 
>>> I would like to open a crowd-funding campaign.
>>> 
>>> I need a final step.
>>> Full-text search is quite lame. It's OK if you search for the exact node 
>>> name.
>>> But it takes several seconds to match a substring in the name.
>>> 
>>> I thought maybe there's a plugin to import into Elasticsearch or the like.
>>> River was there, but it has been discontinued.
>>> 
>>> Is there anybody who may be willing to help with an importer from Neo4j 
>>> to Elasticsearch, or with solving full-text indexing?
>>> I could offer a dinner in exchange or... well... team up and share what 
>>> I've done, if you like my project!
>>> 
>>> A mockup is here:
>>> www.xdiscovery.com/en/graph/22
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>> 
> 

