Re: [Neo4j] help in san francisco ? full-text index

gg4u Fri, 14 Aug 2015 04:56:09 -0700

That's so unfortunate, I only saw this message now... I tried to contact
Neo for a meeting at their headquarters but it didn't work out; I would
definitely have come. I had spoken with the biz dev department, and they
would probably have been keener to meet if there had been a business
development opportunity with an already promising startup, rather than
helping a single dude turn a prototype into a beta :)


So I resolved to dedicate my little time in SF (you know, ESTA tourist
visa) not to coding and developing, but rather to getting feedback on a
first product I crafted (not yet supporting a graph db) and to meeting
people and gathering info for moving back there... lovely and intense city.

Now, back in Italy, I am working again on full-text indexing, and yes, I
did follow your link, but I can't get meaningful results with full-text.

I am finding this aspect very difficult: the documentation has examples,
but it's not clear which ones also work in Cypher and which do not (e.g.
http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher).
Another tutorial sheds some light on the Java side
http://blog.armbruster-it.de/2014/10/deep-dive-on-fulltext-indexing-with-neo4j/
but I still did not manage. I see there is some support for Lucene syntax
in neo4j-rest-client for Python, but my concern is the quality of the hits:
I need a better understanding of how results are matched in a Cypher START
query, or of how to do a full-text query from Python.

I also posted on Stack Overflow:
http://stackoverflow.com/questions/31862761/how-to-handle-thousands-of-rows-with-similar-string-patterns-in-neo4j-full-text

*Issue:* the full-text index queried through Cypher START gives
poor-quality results. Examples, using topic names from Wikipedia as a test:

   - 'DNA' does not return the record consisting of the single word 'DNA'
   among the first results
   - 'united states' (note: two words) finds the actual record 'united
   states' buried deep down a list of 11K rows; the first match is a long
   name that also contains the words 'united states' but is not meaningful
   (e.g. 'List of something here and there that happened in united states
   some while ago') :)
   

That is a *major problem*, because you *cannot paginate the results in a
meaningful way*: in order to return meaningful results to the user, you
need to fetch all nodes and then apply some sorting (Levenshtein, for
example).
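
The fetch-then-sort workaround described above can be sketched in Python.
This is only a minimal illustration, not how Lucene or Neo4j rank hits:
`levenshtein` and `rerank` are hypothetical helper names, and the candidate
names are assumed to have already been fetched by the index query.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rerank(query, names, limit=10):
    """Sort already-fetched names by edit distance to the query
    (case-insensitive) and keep the closest `limit` entries."""
    q = query.lower()
    return sorted(names, key=lambda name: levenshtein(q, name.lower()))[:limit]


# Hypothetical candidates, standing in for the 'name' values of index hits.
candidates = [
    "List of things that happened in United States",
    "United States Navy",
    "United States",
]
print(rerank("united states", candidates, limit=2))
```

Every candidate has to be fetched and scored before the first page can be
shown, so the cost grows with the number of hits; that is why this does not
scale to matches with thousands of rows.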

It is not clear how the results are sorted in Lucene/Neo4j, although,
reading the Neo4j documentation and this thread:
http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher
it looks like Lucene should handle the sorting itself. With which logic?
The results above seem to be returned more or less at random.

To clarify:
I read about Levenshtein distance in the Lucene documentation, and the
results fetched by a simple:
*start n=node:topic('name:(DNA)') return n skip 0 limit 10;*
are certainly not ordered by Levenshtein distance in my case.

As a test, I tried a *fuzzy search*:
*start n=node:topic('name:DNA~0.4') return n skip 0 limit 10;*
and the results do not change.
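
One variation that may be worth testing next to the fuzzy search is a query
that boosts the exact phrase over the loose word match, using the `^` boost
operator of the Lucene query parser syntax. The sketch below only builds the
query string; `escape_lucene` and `boosted_query` are hypothetical helpers,
the boost value of 10 is arbitrary, and whether the boost changes the order
in which START returns hits is an assumption to verify, since that depends
on how the index orders its results.

```python
import re

# Characters and operators with special meaning in the Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_lucene(text):
    """Backslash-escape Lucene query syntax characters in user input."""
    return _LUCENE_SPECIAL.sub(r'\\\1', text)


def boosted_query(field, text, boost=10):
    """Build 'field:"phrase"^boost OR field:(words)'.

    The quoted part asks for the exact phrase with a higher weight;
    the parenthesised part keeps the loose word match as a fallback.
    """
    safe = escape_lucene(text.strip())
    return '{f}:"{t}"^{b} OR {f}:({t})'.format(f=field, t=safe, b=boost)


print(boosted_query("name", "united states"))
```

The resulting string would take the place of `name:(DNA)` in the START
clause above.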

As a second test, I can see that:
*http://localhost:7474/db/data/index/node/topic?query=name:%22dna%22*
returns the same results as the Cypher query above,
*but*
the syntax: *http://localhost:7474/db/data/index/node/topic/dna*
returns
*No index hits*
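
For comparison, the two REST lookups can be built as below. As far as I
understand the legacy index REST API, a Lucene query goes through `?query=`,
while an exact lookup takes both a key and a value in the path
(`.../topic/name/dna`), which might explain why a path with only `dna` gives
no hits. This is a sketch under that assumption; `query_url` and `exact_url`
are hypothetical helper names, and the host is the localhost address used
above.

```python
from urllib.parse import quote

BASE = "http://localhost:7474/db/data/index/node"


def query_url(index, lucene_query):
    """URL for a Lucene query against a legacy node index."""
    return "{}/{}?query={}".format(
        BASE, quote(index, safe=""), quote(lucene_query, safe=":"))


def exact_url(index, key, value):
    """URL for an exact key/value lookup (assumed form: index/key/value)."""
    return "{}/{}/{}/{}".format(
        BASE, quote(index, safe=""), quote(key, safe=""), quote(value, safe=""))


print(query_url("topic", 'name:"dna"'))
print(exact_url("topic", "name", "dna"))
```

Either URL can then be fetched with any HTTP client from Python, so the
raw index responses can be compared without going through Cypher.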

I am trying to understand whether I am writing the syntax wrong or whether
there is an error somewhere. Maybe it is still related to a batch import
(you once told me there was an error in indexing)? But that would not make
much sense, since START actually finds results which are pertinent (the
keywords appear in the value of the 'name' key), but not meaningful (the
values are 'distant' from the input keywords).


My goal is to get the best matches as the first results. I am able to
return decent results by applying Levenshtein myself *after* the results
are returned by a START query, but that, as said, is not feasible for
matches with thousands of rows.
  

*Could you please provide a brief guide or a mockup db to test whether my
indexes are corrupted, or a benchmark / guide to test how a Lucene START
query matches and sorts results, possibly in Python rather than Java?*



On Tuesday, 12 May 2015 11:06:02 UTC+2, Michael Hunger wrote:
>
> Hi Luigi,
>
> did you try the fulltext lucene index approach that we discussed back 
> then? Can you share your latest approach?
> I presume you do regexp search which is not using an index?
>
> I wrote that blog post a while ago, which is still valid for 2.2 
> http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/ 
>
> For Neo4j 2.3 there should be an automatic solution for this issue coming 
> up using LIKE.
>
> If you have time next week you can probably drop by the office to say hi.
>
> Michael
>
>
> On 12.05.2015 at 01:00, gg4u <[email protected]> wrote:
>
> Hi folks,
> I am temporarily in San Francisco.
>
> I have my db with simple node titles as names.
>
> I would like to open a crowd-funding campaign.
>
> I need a final step.
> Full-text search is quite lame. It's OK if you search for the exact node 
> name.
> But it takes several seconds to match a substring in the name.
>
> I thought maybe there's a plugin to import into Elasticsearch or the like.
> The River plugin was there, but it has been discontinued.
>
> Is there anybody who may be willing to help with an importer from 
> Neo4j to Elasticsearch, or to solve full-text indexing?
> I could offer a dinner in exchange or.. well.. team up and share what I've 
> done if you like my project!
>
> A mockup is here:
> www.xdiscovery.com/en/graph/22
>
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
>
