[ 
https://issues.apache.org/jira/browse/LUCENE-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575864#comment-13575864
 ] 

Mark Harwood commented on LUCENE-4768:
--------------------------------------

OK - this problem seems to be about an ill-defined user query ("Saturn sky blue 
Sedan" with no explicit fields) being executed against a well-defined schema 
(cars with manufacturers, model names and bodyStyles that also have trims with 
colours).

If that's the case, you have a heap of problems here which aren't necessarily 
related to the "block join" implementation. One example: IDF ranking being 
what it is, if a manufacturer like Ford creates a model called the "Blue", or 
bad data entry puts the value "blue" in the wrong field, then Lucene will 
naturally rank model:blue higher than color:blue, because the token "blue" is 
so much scarcer in the model field than in the color field. That's almost the 
inverse of what you want.
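
To make the IDF effect concrete, here is a minimal sketch. It assumes the 
classic TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), which as far as 
I recall is what Lucene's DefaultSimilarity used at the time; other 
similarities differ. The class name and all document counts are invented for 
illustration.

```java
// Sketch only: the counts below are hypothetical, not taken from any real index.
class IdfSketch {
    // Assumed classic TF-IDF inverse document frequency:
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 100_000;       // hypothetical index size
        long colorBlueDocs = 12_000;  // hypothetical: "blue" is a common colour
        long modelBlueDocs = 3;       // hypothetical: a few stray "blue" model names

        // The rarer a term is within a field, the larger its idf in that field.
        System.out.printf("idf(color:blue) = %.3f%n", idf(colorBlueDocs, numDocs));
        System.out.printf("idf(model:blue) = %.3f%n", idf(modelBlueDocs, numDocs));
    }
}
```

Under these assumed counts the rare model:blue match gets the larger IDF 
weight, which is the inversion described above.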

A couple of suggestions for "field-less" queries like your example of "Saturn 
sky blue sedan":
1) Target the query on an unstructured "onebox" field that holds indexed 
content from all fields, to achieve a more balanced IDF score.
2) Tokenize the query string and find a "most likely" field for each search 
term by comparing doc frequencies, e.g. color:blue vs modelName:blue etc. 
Augment the "onebox" query in 1) with the most-likely-field interpretation of 
each term, provided that term has sufficient doc frequency in that field.
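
The two suggestions above could be sketched roughly as follows. This is an 
illustration under stated assumptions, not a Lucene implementation: the class 
and method names, the in-memory statistics map, and the threshold are all 
hypothetical. In a real index the per-field counts would presumably come from 
something like IndexReader.docFreq(Term), and the output would be a 
BooleanQuery rather than a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Sketch only: every name here is hypothetical.
class FieldGuessSketch {
    // Returns the field in which the term has the highest doc frequency,
    // or empty if even the best field falls below minDocFreq.
    static Optional<String> mostLikelyField(
            String term, Map<String, Map<String, Integer>> docFreqByField, int minDocFreq) {
        String best = null;
        int bestDf = 0;
        for (Map.Entry<String, Map<String, Integer>> e : docFreqByField.entrySet()) {
            int df = e.getValue().getOrDefault(term, 0);
            if (df > bestDf) {
                best = e.getKey();
                bestDf = df;
            }
        }
        return bestDf >= minDocFreq ? Optional.ofNullable(best) : Optional.empty();
    }

    // Builds one "onebox" clause per term, plus a fielded clause when a
    // sufficiently frequent most-likely field exists for that term.
    static String buildQuery(
            String userQuery, Map<String, Map<String, Integer>> docFreqByField, int minDocFreq) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields no clauses
            }
            clauses.add("onebox:" + term);
            mostLikelyField(term, docFreqByField, minDocFreq)
                    .ifPresent(field -> clauses.add(field + ":" + term));
        }
        return String.join(" ", clauses);
    }
}
```

With a threshold in place, a stray modelName:blue with only a handful of 
documents never gets a fielded clause of its own, while the frequent 
color:blue interpretation does.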





                
> Child Traversable To Parent Block Join Query
> --------------------------------------------
>
>                 Key: LUCENE-4768
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4768
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>         Environment: trunk
> git rev-parse HEAD
> 5cc88eaa41eb66236a0d4203cc81f1eed97c9a41
>            Reporter: Vadim Kirilchuk
>         Attachments: LUCENE-4768-draft.patch
>
>
> Hi everyone!
> Let me describe what I am trying to do:
> I have hierarchical documents ('car model' as parent, 'trim' as child) and 
> use block join queries to retrieve them. However, I am not happy with the 
> current behavior of ToParentBlockJoinQuery, which goes through all of a 
> parent's children during the nextDoc call (accumulating scores and freqs).
> Consider the following example: you have a query with a custom post condition 
> on top of such a BJQ, and during the post condition you traverse the scorer 
> tree (doc-at-a-time) and want to manually advance the BJQ's child scorers one 
> by one until the condition passes or the current parent has no more children.
> I am attaching a patch with a query (and some tests) similar to 
> ToParentBlockJoinQuery but with the ability to traverse children. (I have to 
> do a weird instanceof check and cast inside my code.) This is only a draft, 
> and I will be glad to hear whether anyone needs it or how we can improve it. 
> P.S. I believe that the proposed query is more generic (lower level) than 
> ToParentBJQ, and that ToParentBJQ could extend it and call nextChild() 
> internally during nextDoc().
> Also, I think that the problem of traversing hierarchical documents is more 
> complex, as Lucene has only the nextDoc API. What do you think about making 
> the API more hierarchy-aware? A one-level document is a special case of a 
> multi-level document, but not vice versa. WDYT?
> Thanks in advance.
