[
https://issues.apache.org/jira/browse/LUCENE-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575864#comment-13575864
]
Mark Harwood commented on LUCENE-4768:
--------------------------------------
OK - this problem seems to be about an ill-defined user query ("Saturn sky blue
Sedan" with no explicit fields) being executed against a well-defined schema
(cars with manufacturers, model names and bodyStyles that also have trims with
colours).
If that's the case you have a heap of problems here which aren't necessarily
related to the "block join" implementation. One example: IDF ranking being
what it is, if a manufacturer like Ford creates a model called the "Blue", or
bad data entry stores that value in the wrong field, then Lucene will naturally
rank model:blue higher than color:blue because of the scarcity of the token
"blue" in that field context. That's almost the inverse of what you want.
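To make the IDF effect concrete, here is a minimal sketch assuming the classic
TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)). The class name and all
document counts are made up for illustration and are not taken from the issue.

```java
class IdfSketch {
    // Classic TF-IDF inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 100_000;        // hypothetical index size
        long colorBlueDf = 20_000;     // hypothetical: "blue" is a common colour
        long modelBlueDf = 3;          // hypothetical: a rare or mis-entered model value

        // The rarer the token is in a field, the larger its idf, so the
        // stray model:blue match gets the bigger boost.
        System.out.printf("idf(color:blue) = %.2f%n", idf(colorBlueDf, numDocs));
        System.out.printf("idf(model:blue) = %.2f%n", idf(modelBlueDf, numDocs));
    }
}
```

With numbers like these the model:blue clause gets a much larger IDF weight
than color:blue, even though colour is almost certainly what the user meant.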
A couple of suggestions for "field-less" queries like your example of "Saturn
sky blue sedan":
1) Target the query on an unstructured "onebox" field that holds indexed
content from all fields to achieve a more balanced IDF score.
2) Tokenize each item in the query string and find a "most likely" field for
each search term by examining doc frequencies e.g. color:blue vs modelName:blue
etc. Augment the "onebox" query in 1) with the most-likely-field interpretation
for each word in the query string if it has sufficient doc frequency.
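The two suggestions above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the class, the method names, the field
names other than "onebox", the doc-frequency table and the threshold are all
hypothetical. In a real index the frequencies would come from something like
IndexReader.docFreq(new Term(field, text)) rather than an in-memory map, and
the clauses would be built as query objects rather than a string.

```java
import java.util.*;

class FieldGuessSketch {
    // Returns the field in which the term has the highest doc frequency,
    // or null if no field reaches minDocFreq.
    public static String mostLikelyField(String term,
                                         Map<String, Map<String, Integer>> docFreqByField,
                                         int minDocFreq) {
        String best = null;
        int bestDf = 0;
        for (Map.Entry<String, Map<String, Integer>> e : docFreqByField.entrySet()) {
            Integer df = e.getValue().get(term);
            if (df != null && df >= minDocFreq && df > bestDf) {
                best = e.getKey();
                bestDf = df;
            }
        }
        return best;
    }

    // Builds a query string: every term targets the "onebox" field, plus an
    // extra field-specific clause when a likely field is found.
    public static String buildQuery(String userQuery,
                                    Map<String, Map<String, Integer>> docFreqByField,
                                    int minDocFreq) {
        StringBuilder sb = new StringBuilder();
        for (String term : userQuery.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append("onebox:").append(term);
            String field = mostLikelyField(term, docFreqByField, minDocFreq);
            if (field != null) {
                sb.append(' ').append(field).append(':').append(term);
            }
        }
        return sb.toString();
    }
}
```

Note that this picks the field where the term is most frequent, which is the
opposite of what raw IDF rewards, and that terms below the threshold (such as
a stray "blue" in the model name field) add no field-specific clause at all.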
> Child Traversable To Parent Block Join Query
> --------------------------------------------
>
> Key: LUCENE-4768
> URL: https://issues.apache.org/jira/browse/LUCENE-4768
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/query/scoring
> Environment: trunk
> git rev-parse HEAD
> 5cc88eaa41eb66236a0d4203cc81f1eed97c9a41
> Reporter: Vadim Kirilchuk
> Attachments: LUCENE-4768-draft.patch
>
>
> Hi everyone!
> Let me describe what I am trying to do:
> I have hierarchical documents ('car model' as parent, 'trim' as child) and
> use block join queries to retrieve them. However, I am not happy with the
> current behavior of ToParentBlockJoinQuery, which goes through all of a
> parent's children during the nextDoc call (accumulating scores and freqs).
> Consider the following example: you have a query with a custom post condition
> on top of such a BJQ, and during the post condition you traverse the scorer
> tree (doc-at-a-time) and want to manually advance the child scorers of the
> BJQ one by one until the condition passes or the current parent has no more
> children.
> I am attaching a patch with a query (and some tests) similar to
> ToParentBlockJoin but with the ability to traverse children. (I have to do a
> weird instanceof check and cast inside my code.) This is only a draft, and I
> will be glad to hear if someone needs it or how we can improve it.
> P.S. I believe that the proposed query is more generic (low level) than
> ToParentBJQ, and that ToParentBJQ could be extended from it and call
> nextChild() internally during nextDoc().
> Also, I think that the problem of traversing hierarchical documents is more
> complex, as Lucene has only the nextDoc API. What do you think about making
> the API more hierarchy aware? A one-level document is a special case of a
> multi-level document, but not vice versa. WDYT?
> Thanks in advance.