[ https://issues.apache.org/jira/browse/JENA-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922239#comment-16922239 ]

Andy Seaborne edited comment on JENA-1749 at 9/23/19 9:59 AM:
--------------------------------------------------------------

{quote}I'm in email contact and can discuss.
{quote}
That's great.  Thank you.  If you are on vacation, this can wait.  Right now I 
think there is a technical solution, but I am also conscious I may not have 
understood something important.
{quote}The _functionality_ has been clearly documented since 3.6.0 as _not 
supported_. 
{quote}
I understand and respect that position.  Mine is a bit different though I think 
there is a way of looking at this that would reconcile our positions.  Whilst 
that would be an interesting discussion that appeals to my philosophical 
nature, I should follow your lead and see if we can figure out an actual 
working solution.
{quote}The use of {{text:withFields}} is an approach that came to mind.
{quote}
 

I had a variation on the same idea.
{quote}I appreciate your opinion. Otoh, the model for the integration of Jena 
w/ Lucene is _one triple == one document_ and the use of {{text:withFields}} is 
one way of unambiguously indicating that the query is based on a different 
model and the complexities that arise can be dealt with in a clear manner.
{quote}
You are right - I was expressing a value judgement and yours may differ.  In 
practical terms, if the property name changes for the behaviour we use, that 
will affect up to 5 applications and libraries (it may be just one library, 
but I can't be sure of that without checking) and, more significantly, we have 
a public SPARQL endpoint, so it will affect all our users who use text queries.  
Now that is my problem - I would just like to be clear that I'm not being 
capricious here.
{quote}Do you have another approach to dealing with the inherent ambiguity that 
your use case exploits in:

?s text:query ( "query with fields" LUCENE_LIMIT )
{quote}
I'm not sure what you mean by "inherent ambiguity" so I may be missing 
something obvious. 

The query itself is unambiguous.  If it does not specify a field, then Lucene 
will search the default field.

In terms of the result, there is no ambiguity - there is only one subject for 
each document.

If you are considering the case of the problematic query form:

(?s ?score ?lit) text:query ( "query with or without fields" LUCENE_LIMIT )

then there are two cases:

1) the normal case - the Lucene hit has only one field - so use that to return 
the ?lit value (which I guess is what happens in 3.12.0)

2) the multi-field index case - the Lucene hit has more than one field, so 
don't bind ?lit and possibly throw an exception
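
A rough sketch of those two cases, written in Python for brevity rather than in 
Jena's Java.  The function name, the exception, and the {{strict}} flag are all 
hypothetical; this only illustrates the proposed behaviour and is not how 
jena-text is actually implemented.

```python
from typing import Dict, Optional


class AmbiguousLiteralError(Exception):
    """Hypothetical error for a hit whose literal cannot be chosen."""


def literal_for_hit(fields: Dict[str, str], strict: bool = False) -> Optional[str]:
    """Choose the value to bind to ?lit for a single Lucene hit.

    `fields` maps the hit's field names to their stored text (an assumed
    shape, not Lucene's real Document API).  `strict` stands in for a
    possible config setting that turns an ambiguous hit into an error.
    """
    if len(fields) == 1:
        # Case 1: one field per document, so the literal is unambiguous.
        return next(iter(fields.values()))
    if strict:
        # Case 2, strict variant: refuse to guess which field was meant.
        raise AmbiguousLiteralError(
            "expected exactly one field, found %d" % len(fields)
        )
    # Case 2, lenient variant: leave ?lit unbound.
    return None
```

With a single-field hit the function returns that field's text; with a 
multi-field hit it returns {{None}} (leaving ?lit unbound) unless {{strict}} is 
set, in which case it raises.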

I recognise I am probably missing something - maybe this will help identify it.

Another way in would be to ask - why has it changed since 3.12.0?  Is there 
something about OR that required this change?

Another thought:  if it's the case that the code needs to know that it is 
dealing with a multi-field index, then a config parameter could be used to 
tell it, rather than a property name.

 


was (Author: bwm):
[[

I'm in email contact and can discuss.

]]

That's great.  Thank you.  If you are on vacation, this can wait.  Right now I 
think there is a technical solution, but I am also conscious I may not have 
understood something important.

[[

The _functionality_ has been clearly documented since 3.6.0 as _not supported_. 

]]

I understand and respect that position.  Mine is a bit different though I think 
there is a way of looking at this that would reconcile our positions.  Whilst 
that would be an interesting discussion that appeals to my philosophical 
nature, I should follow your lead and see if we can figure out an actual 
working solution.

[[

The use of {{text:withFields}} is an approach that came to mind.

]]

I had a variation on the same idea.

[[

I appreciate your opinion. Otoh, the model for the integration of Jena w/ 
Lucene is _one triple == one document_ and the use of {{text:withFields}} is 
one way of unambiguously indicating that the query is based on a different 
model and the complexities that arise can be dealt with in a clear manner.

]]

You are right - I was expressing a value judgement and yours may differ.  In 
practical terms, if the property name changes for the behaviour we use, that 
will affect up to 5 applications and libraries (it may be just one library, 
but I can't be sure of that without checking) and, more significantly, we have 
a public SPARQL endpoint, so it will affect all our users who use text queries.  
Now that is my problem - I would just like to be clear that I'm not being 
capricious here.

[[

Do you have another approach to dealing with the inherent ambiguity that your 
use case exploits in:
{code:java}
?s text:query ( "query with fields" LUCENE_LIMIT )
{code}
]]

I'm not sure what you mean by "inherent ambiguity" so I may be missing 
something obvious. 

The query itself is unambiguous.  If it does not specify a field, then Lucene 
will search the default field.

In terms of the result, there is no ambiguity - there is only one subject for 
each document.

If you are considering the case of the problematic query form:

(?s ?score ?lit) text:query ( "query with or without fields" LUCENE_LIMIT )

then there are two cases:

1) the normal case - the Lucene hit has only one field - so use that to return 
the ?lit value (which I guess is what happens in 3.12.0)

2) the multi-field index case - the Lucene hit has more than one field, so 
don't bind ?lit and possibly throw an exception

I recognise I am probably missing something - maybe this will help identify it.

Another way in would be to ask - why has it changed since 3.12.0?  Is there 
something about OR that required this change?

Another thought:  if it's the case that the code needs to know that it is 
dealing with a multi-field index, then a config parameter could be used to 
tell it, rather than a property name.

 

> Support lucene field names in jena text queries
> -----------------------------------------------
>
>                 Key: JENA-1749
>                 URL: https://issues.apache.org/jira/browse/JENA-1749
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.13.0
>            Reporter: Brian McBride
>            Priority: Major
>         Attachments: stacktrace.txt
>
>
> Until recent changes made during implementation of JENA-1723, it was possible 
> to have a Lucene text query that used Lucene field names.  With the 
> implementation of JENA-1723 such queries result in an exception.
> For example:
> {noformat}
> PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
> PREFIX  text: <http://jena.apache.org/text#>
> PREFIX  ppd:  <http://landregistry.data.gov.uk/def/ppi/>
> PREFIX  lrcommon: <http://landregistry.data.gov.uk/def/common/>
> SELECT * 
> {   ?ppd_propertyAddress  
>           text:query            ( "street:  the" 3000000 ) .  
> } LIMIT 1
> Cannot parse 'text:street: the ': Encountered " ":" ": "" at line 1, column 
> 11.
> {noformat}
> This is a simplified query from a running production system that works in 
> 3.12.0 but is failing in 3.13.0-SNAPSHOT.
> Some discussion and analysis of this issue has occurred in email:
> [https://lists.apache.org/thread.html/ccc1d5c5eaebcddafc2dbae85f3b5901396e3ab203df6bb4014e8270@%3Cusers.jena.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)