Sorry, the first problem is not mine.

On Fri, Jun 1, 2012 at 4:58 PM, Tanguy Moal <tanguy.m...@gmail.com> wrote:
> Hello,
>
> I'm just sharing my thoughts; they might be off-topic...
>
> Take the first example quoted from GitHub: the user wants to find all nodes
> having their facebookId in a given, quite long list (a friends list; be
> aware that some Facebook users have 1500+ friends!).
>
> The application first had the facebookId for a user (say id=someId),
> requested the Facebook graph with that id, and got a quite long list of
> facebookIds back, right?
> At that point, I think the application should not try to enumerate its Neo4j
> graph using an OR-ed facebookIds list.
> It should make sure that each Neo4j node in the friends list has a
> "friendOf" attribute, and ensure that this multivalued attribute contains the
> facebookId someId for each involved node. Then trigger an update request for
> those updated nodes.
> You could make your application wait for that update to complete if it
> really needs to be synchronous with Facebook.
> That moves the problem to handling update requests smartly, which might be
> easier sometimes.
> Here you will eventually want to store a hash of the user's friends list
> somewhere in the user's node, so you know in advance whether that user's
> friends list has changed and whether you need to trigger the update process
> again (just thinking).
> When your user uses the application for the first time, or every time after
> she has updated her friends list, an update job will be fired for that user.
> You may want to wait for the update request to complete only the first time
> (if you don't need your app to be 100% synchronized with Facebook), and have
> the subsequent jobs queued to something that handles these updates
> efficiently.
> That could stress the storage system with intensive writes
> from time to time, especially at the beginning, but it will converge to a
> mainly read-based application once most active users have used the
> application once. New friendships aren't that frequent (IMHO).
> Maybe NRT developments could be used in this scenario... I don't know much
> more. I don't know anything about how Neo4j works; I used it once, that's
> all.
> Anyway, if you hit write issues, congratulations: your application is being
> used widely, go buy SSDs :)
>
> Finally, you will then enumerate your nodes with a very quick and efficient
> query: friendOf:"someId".
>
> What I meant is that if your application really needs to perform
> queries made of many, many, many... really many OR-ed terms, then
> there might exist (though it's not always true) a different design of your
> data model that would still let you fit the use case of a search engine.
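Tanguy's friendOf design can be sketched as a tiny in-memory model. This is a minimal sketch under stated assumptions, not Neo4j or Lucene code: FriendOfIndex, syncFriends and friendsOf are hypothetical names, two hash maps stand in for the graph and its index, the friends-list hash is assumed to be SHA-256 over the sorted ids, and removing someId from nodes that are no longer friends is an extra step the original mail does not describe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

// Hypothetical in-memory model of the "friendOf" design; NOT the Neo4j or Lucene API.
class FriendOfIndex {
    // Stored multivalued "friendOf" attribute per node: nodeId -> {facebookIds}.
    private final Map<String, Set<String>> friendOfField = new HashMap<>();
    // Inverted index over that attribute: facebookId (term) -> {nodeIds}.
    private final Map<String, Set<String>> postings = new HashMap<>();
    // Hash of each user's last synced friends list, kept "in the user's node".
    private final Map<String, String> friendsListHash = new HashMap<>();

    /** Runs the update job; returns false (no writes) when the list is unchanged. */
    boolean syncFriends(String userId, Collection<String> friendIds) {
        String hash = hashOf(friendIds);
        if (hash.equals(friendsListHash.get(userId))) {
            return false;
        }
        Set<String> current = new HashSet<>(friendIds);
        // Extra step (assumption): drop userId from nodes that are no longer friends.
        // Iterate over a copy because the loop modifies the postings set.
        for (String old : new ArrayList<>(friendsOf(userId))) {
            if (!current.contains(old)) {
                friendOfField.get(old).remove(userId);
                postings.get(userId).remove(old);
            }
        }
        // Ensure each friend's multivalued friendOf attribute contains userId.
        for (String friend : current) {
            friendOfField.computeIfAbsent(friend, k -> new HashSet<>()).add(userId);
            postings.computeIfAbsent(userId, k -> new HashSet<>()).add(friend);
        }
        friendsListHash.put(userId, hash);
        return true;
    }

    /** The single-term query friendOf:"userId": one lookup instead of an OR-ed id list. */
    Set<String> friendsOf(String userId) {
        return Collections.unmodifiableSet(postings.getOrDefault(userId, Collections.emptySet()));
    }

    // Order-insensitive fingerprint: SHA-256 over the sorted, de-duplicated ids.
    private static String hashOf(Collection<String> ids) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            String joined = String.join("\n", new TreeSet<>(ids));
            return Base64.getEncoder().encodeToString(md.digest(joined.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}

class FriendOfDemo {
    public static void main(String[] args) {
        FriendOfIndex index = new FriendOfIndex();
        index.syncFriends("someId", List.of("a", "b", "c"));
        System.out.println(new TreeSet<>(index.friendsOf("someId")));              // [a, b, c]
        System.out.println(index.syncFriends("someId", List.of("c", "b", "a")));  // false
    }
}
```

The write-heavy part runs only when the stored hash changes, while every read is a single lookup of the term someId, which is the read-mostly behaviour the mail expects once most active users have been synced.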

I agree. Lucene/Solr may need to support many other types of queries used in
traditional databases. For now, we usually store structured data in an RDBMS
and full text in Lucene/Solr, but synchronizing the data is a nightmare. We
would like to use just one full-featured solution instead of integrating many.

>
> This applies to 1 and maybe to 2 too. (:p 2-2-2, never mind)
>
> I don't really understand 3, which seems to be a MinShouldMatch issue.
>
> As I said in the beginning, I'm simply sharing my thoughts! I hope this
> helps...
>
> --
> Tanguy
>
> 2012/6/1 Li Li <fancye...@gmail.com>
>>
>> hi all,
>> I am looking for a 'BooleanMatcher' in Lucene. For many
>> applications, we don't need to order matched documents by relevance score;
>> we just want a boolean query. But BooleanScorer/BooleanScorer2
>> is a little heavy, since it is built for relevance scoring.
>> One use case is: we have some fields with a very small number
>> of tokens (usually only one word), such as id, tag or something else.
>> But we need queries like this: id in (1,3,5.....). If we use a
>> BooleanQuery (id:1 id:3 id:5 ...), BooleanScorer can only apply to 31
>> terms, and BooleanScorer2 uses a priority queue to know how many terms are
>> matched (coord).
>> Filters may help, but the query can be very complicated (or else the
>> filter itself still uses a BooleanQuery, so there is a recursive problem).
>>
>> We could divide the current BooleanScorer into a BooleanMatcher and a
>> Ranker. If we need to score the hit docs, we ask the BooleanScorer not
>> only for the hit ids but also for tf/idf, coord, or anything else we need
>> for ranking. But sometimes we only need docIds; then the BooleanMatcher
>> can optimize its implementation. For the case of many disjunction
>> terms, we can do it like a Filter or BooleanScorer instead of
>> BooleanScorer2.
>>
>> Is it possible?
>>
>> Following are some user demands I found on the mailing list. The
>> first one is my own requirement.
>>
>> 1.
>> https://github.com/neo4j/community/issues/494
>>
>> 2. Mail to lucene
>>
>> qibaoy...@126.com, May 6, to lucene:
>> Hi,
>> I met a problem about how to search many keywords in about
>> 5,000,000 documents. For example, the query may be like "(a1 or a2 or a3
>> ....a200) and (b1 or b2 or b3 or b4 ..... b400)". I found it takes a
>> very long time (40 seconds) to get the answer in only one field (the Title
>> field), and the JVM throws an OutOfMemoryError with more fields (title field
>> plus content field). Any suggestions or good ideas to solve this
>> problem? Thanks in advance.
>>
>>
>> 3. Mail to lucene
>>
>> Chris Book <chrisb...@gmail.com>, Apr 11, to solr-user:
>> Hello, I have a Solr index running that is working very well as a search.
>> But I want to add the ability (if possible) to use it to do matching.
>> The problem is that by default it is only looking for all the input terms
>> to be present, and it doesn't give me any indication as to how many terms
>> in the target field were not specified by the input.
>>
>> For example, if I'm trying to match the song title "dust in the wind",
>> I'm correctly getting a match if the input query is "dust in wind". But I
>> don't want to get a match if the input is just "dust". Although as a
>> search "dust" should return this result, I'm looking for some way to
>> filter this out based on some indication that the input isn't close enough
>> to the output. Perhaps if I could get information that the number of input
>> terms is much less than the number of terms in the field. Or something
>> else along those lines?
>>
>> I realize that this isn't the typical use case for a search, but I'm just
>> looking for some suggestions as to how I could improve the above example a
>> bit.
>>
>> Thanks,
>> Chris
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
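Li Li's proposal of a match-only BooleanMatcher, separate from a Ranker, can be illustrated with a small sketch. It assumes a toy in-memory inverted index (term to doc ids) instead of Lucene's postings API, and BooleanMatcher, anyOf, allOfGroups, atLeast and coveringAtLeast are hypothetical names, not Lucene classes or methods. Matching uses java.util.BitSet unions and intersections, in the spirit of the Filter approach mentioned in the mail, and covers the long id list, the "(a1 OR ... a200) AND (b1 OR ... b400)" query, and a minimum-should-match / field-coverage check for the song-title case.

```java
import java.util.*;

// Hypothetical match-only matcher over a toy in-memory inverted index.
// This is NOT a Lucene class; it only illustrates collecting doc ids without scoring.
class BooleanMatcher {
    private static final int[] NO_DOCS = new int[0];
    private final Map<String, int[]> postings = new HashMap<>(); // term -> doc ids
    private final int maxDoc;

    BooleanMatcher(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void addPostings(String term, int... docIds) {
        postings.put(term, docIds.clone());
    }

    /** id in (t1, t2, ...): OR the postings into one bit set; no scores, no clause limit. */
    BitSet anyOf(Collection<String> terms) {
        BitSet hits = new BitSet(maxDoc);
        for (String term : terms) {
            for (int doc : postings.getOrDefault(term, NO_DOCS)) {
                hits.set(doc);
            }
        }
        return hits;
    }

    /** (a1 OR ... a200) AND (b1 OR ... b400): intersect one bit set per OR group. */
    BitSet allOfGroups(List<? extends Collection<String>> groups) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start from "all docs"; an empty group list matches everything
        for (Collection<String> group : groups) {
            result.and(anyOf(group));
        }
        return result;
    }

    /** Number of distinct query terms each doc contains (a coord-style count, no scores). */
    private int[] countMatches(Collection<String> terms) {
        int[] counts = new int[maxDoc];
        for (String term : new HashSet<>(terms)) {
            for (int doc : postings.getOrDefault(term, NO_DOCS)) {
                counts[doc]++;
            }
        }
        return counts;
    }

    /** Docs containing at least minShouldMatch (>= 1) of the distinct query terms. */
    BitSet atLeast(Collection<String> terms, int minShouldMatch) {
        if (minShouldMatch < 1) {
            throw new IllegalArgumentException("minShouldMatch must be >= 1");
        }
        int[] counts = countMatches(terms);
        BitSet hits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (counts[doc] >= minShouldMatch) hits.set(doc);
        }
        return hits;
    }

    /** Docs where the query covers at least minRatio of the field's distinct terms. */
    BitSet coveringAtLeast(Collection<String> terms, int[] fieldTermCount, double minRatio) {
        int[] counts = countMatches(terms);
        BitSet hits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (counts[doc] > 0 && counts[doc] >= minRatio * fieldTermCount[doc]) hits.set(doc);
        }
        return hits;
    }
}

class BooleanMatcherDemo {
    public static void main(String[] args) {
        // doc 0 = "dust in the wind", doc 1 = "gold dust woman", doc 2 = "wind of change"
        BooleanMatcher m = new BooleanMatcher(3);
        m.addPostings("dust", 0, 1);
        m.addPostings("in", 0);
        m.addPostings("the", 0);
        m.addPostings("wind", 0, 2);
        m.addPostings("gold", 1);
        m.addPostings("woman", 1);
        m.addPostings("of", 2);
        m.addPostings("change", 2);
        int[] fieldTermCount = {4, 3, 3}; // distinct terms per title

        System.out.println(m.anyOf(List.of("dust", "wind")));                        // {0, 1, 2}
        System.out.println(m.allOfGroups(List.of(List.of("dust"), List.of("wind")))); // {0}
        System.out.println(m.atLeast(List.of("dust", "in", "wind"), 2));              // {0}
        System.out.println(m.coveringAtLeast(List.of("dust", "in", "wind"), fieldTermCount, 0.5)); // {0}
        System.out.println(m.coveringAtLeast(List.of("dust"), fieldTermCount, 0.5));  // {}
    }
}
```

In this sketch the cost is proportional to the total length of the postings lists touched plus one bit per document per OR group, with no per-document priority queue and no clause limit; the trade-off is that no relevance score is produced, which is exactly the "only docIds" case. The 0.5 coverage ratio is an arbitrary example threshold: "dust in wind" covers 3 of the 4 terms of "dust in the wind" and matches, while "dust" alone covers only 1 and is filtered out.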