Re: [Neo4j] help in san francisco ? full-text index

gg4u Thu, 20 Aug 2015 01:41:38 -0700

...Well, the positive thing is that you confirm it :)

I have posted another thread: I will drop the indexes and need to recreate
them. Could you please brief a non-Java user (or point out where to start):


https://groups.google.com/d/msg/neo4j/30YIeZ50P_w/bjSExwWTBAAJ

It would be great if the drivers for Python and other languages exposed
the full native API, too. Or a more beginner-friendly tutorial pointing out
which files to modify and how, at least for these Lucene indexes, which are
a much-wanted feature, I guess, and still kind of confusing (legacy vs.
schema, fully supported but then not completely ..)

thank you!


On Sunday, 16 August 2015 12:15:44 UTC+2, Michael Hunger wrote:
>
> I looked into it again and there's something in the implementation which
> returns the same Lucene score for each document found, so ordering is not
> possible
>
> I tried to figure it out but so far no luck
>
>
> Sent from my iPhone
>
> On 14.08.2015 at 13:54, gg4u <[email protected]> wrote:
>
> That's so unfortunate, I saw this message only now... awww.... I tried to
> contact Neo for a meeting at their headquarters but it didn't work out; I
> would definitely have come. I had spoken with the biz dev department, and
> they would likely have been keen to meet up if there had been a business
> development opportunity with an already promising startup, rather than
> helping a single dude turn a prototype into a beta :)
>
> So I resolved to dedicate my little time in SF (you know, ESTA tourist
> visa) not to coding and developing, but rather to getting feedback on a
> first product I crafted (not yet supporting a graph db) and to meeting
> people and gathering info for moving back there .. lovely and intense city.
>
> .. now in Italy, I am back working on full-text indexing, and yes, I did
> follow your link, but I can't get meaningful results with full-text
> search.
>
> I am finding this a very difficult aspect: the documentation has examples,
> but it is not clear which ones also work in Cypher and which do not (e.g.
> http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher
> ).
> Another tutorial sheds some light on the Java side:
>
> http://blog.armbruster-it.de/2014/10/deep-dive-on-fulltext-indexing-with-neo4j/
> but I still did not manage. I see that neo4j-rest-client for Python has
> some support for Lucene syntax, but my concern is the quality of the hits,
> and I need a better understanding of how results are matched in a START
> query in Cypher, or of how to do a full-text query in Python.
>
> I also posted on Stack Overflow:
>
> http://stackoverflow.com/questions/31862761/how-to-handle-thousands-of-rows-with-similar-string-patterns-in-neo4j-full-text
>
> *Issue:* full-text index results from a Cypher START query look poor.
> Examples, using topic names from Wikipedia as test data:
>
>    - 'DNA' does not return the record with the single word 'DNA' among the
>    first results
>    - 'united states' (note, two words) finds the actual record
>    'united states' buried deep down a list of 11K rows; the first match is
>    a long name that also contains the words 'united states' but is not
>    meaningful (e.g. 'List of something here and there that happened in
>    united states some while ago' :)
>    
>
> That is a *major problem*, because you *cannot paginate the results in a
> meaningful way*: in order to return meaningful results to the user, you
> need to fetch all nodes and then apply some sorting (Levenshtein, for
> example).
>
> It is not clear how the results are sorted in Lucene/Neo4j, although,
> reading the Neo4j documentation and here:
>
> http://stackoverflow.com/questions/10140885/sorting-neo4js-lucene-index-queries-in-cypher
> it looks like Lucene should handle the sorting itself: by which logic?
> The results above seem to be hit more or less at random.
>
> To clarify:
> I read about Levenshtein metrics in the Lucene documentation, and the
> results fetched by a simple:
> *start n=node:topic('name:(DNA)') return n skip 0 limit 10;*
> are certainly not ordered by Levenshtein distance in my case.
>
> As a test, I tried *fuzzy searches*:
> *start n=node:topic('name:DNA~0.4') return n skip 0 limit 10;*
> and the results do not change.
>
> As a second test, I can see that:
> *http://localhost:7474/db/data/index/node/topic?query=name:%22dna%22*
> returns the same results as the Cypher query before,
> *but*
> the syntax: *http://localhost:7474/db/data/index/node/topic/dna*
> returns
> *No index hits*
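
A minimal Python sketch of how the two REST forms above could be built
without hand-escaping the query. The host, port and index name are taken from
the URLs in this message; the helper names query_url and exact_url are
hypothetical. The exact-lookup form with a separate key segment
(/{index}/{key}/{value}) is an assumption based on the legacy-index section of
the Neo4j 2.x REST documentation. If that assumption holds, a URL ending in
/topic/dna carries no key segment, which might be related to the
"No index hits" response.

```python
from urllib.parse import quote

# Base path copied from the URLs in this thread.
BASE = "http://localhost:7474/db/data/index/node"


def query_url(index, lucene_query):
    """URL for a Lucene query-string lookup on a legacy node index."""
    # Keep ':' readable; everything else unsafe (quotes, spaces) is escaped.
    return "{}/{}?query={}".format(
        BASE, quote(index, safe=""), quote(lucene_query, safe=":"))


def exact_url(index, key, value):
    """URL for an exact key/value lookup (assumed form, see lead-in)."""
    return "{}/{}/{}/{}".format(
        BASE, quote(index, safe=""), quote(key, safe=""), quote(value, safe=""))


print(query_url("topic", 'name:"dna"'))
print(exact_url("topic", "name", "dna"))
```

Either URL could then be fetched with any HTTP client; the sketch only
covers building the URL, not how the server scores or orders the hits.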
>
> I am trying to understand whether I am writing the syntax wrong or there
> is an error. Maybe it is still related to a batch import (you once told me
> there was an error in indexing)? But that would not make much sense, since
> START actually finds results which are pertinent (the keywords are inside
> the 'name' values) but not meaningful (the values are 'distant' from the
> input keywords).
>
>
> My goal is to get the best matches as the first results. I am able to
> return decent results by applying Levenshtein myself *after* the results
> are returned by a START query, but that, as said, is not feasible for
> matches with thousands of rows.
>   
>
> *Could you please provide a brief guide or a mock-up db to test whether my
> indexes are corrupted, or a benchmark / guide to test how a Lucene START
> query matches and sorts results, possibly in Python (not Java)?*
>
>
>
> On Tuesday, 12 May 2015 11:06:02 UTC+2, Michael Hunger wrote:
>>
>> Hi Luigi,
>>
>> did you try the fulltext lucene index approach that we discussed back 
>> then? Can you share your latest approach?
>> I presume you do regexp search which is not using an index?
>>
>> I wrote that blog post a while ago, which is still valid for 2.2 
>> http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/ 
>>
>> For Neo4j 2.3 there should be an automatic solution for this issue coming 
>> up using LIKE.
>>
>> If you have time next week you can probably drop by the office to say hi.
>>
>> Michael
>>
>>
>> On 12.05.2015 at 01:00, gg4u <[email protected]> wrote:
>>
>> hi folks,
>> i am temporarily in san francisco.
>>
>> I have my db with simple node-titles as names.
>>
>> I would like to launch a crowdfunding campaign.
>>
>> I need a final step.
>> Full-text search is quite lame.. It's ok if you search for the exact node 
>> name.
>> But it takes several seconds to match a substring in the name.
>>
>> I thought maybe there's a plugin to import into Elasticsearch or the like.
>> There was a River plugin, but it has been discontinued.
>>
>> Is there anybody who may be willing to help with an importer from Neo4j
>> to Elasticsearch, or with solving full-text indexing?
>> I could offer a dinner in exchange or.. well.. team up and share what 
>> I've done if you like my project!
>>
>> a mockup is here:
>> www.xdiscovery.com/en/graph/22
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>>

