[ 
https://issues.apache.org/jira/browse/JENA-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922239#comment-16922239
 ] 

Brian McBride edited comment on JENA-1749 at 9/4/19 7:49 AM:
-------------------------------------------------------------

[[

I'm in email contact and can discuss.

]]

That's great.  Thank you.  If you are on vacation, this can wait.  Right now I 
think there is a technical solution, but I am also conscious I may not have 
understood something important.

[[

The _functionality_ has been clearly documented since 3.6.0 as _not supported_. 

]]

I understand and respect that position.  Mine is a bit different, though I 
think there is a way of looking at this that would reconcile our positions.  
Whilst that would be an interesting discussion that appeals to my philosophical 
nature, I should follow your lead and see if we can figure out an actual 
working solution.

[[

The use of {{text:withFields}} is an approach that came to mind.

]]

I had a variation on the same idea.

[[

I appreciate your opinion. Otoh, the model for the integration of Jena w/ 
Lucene is _one triple == one document_ and the use of {{text:withFields}} is 
one way of unambiguously indicating that the query is based on a different 
model and the complexities that arise can be dealt with in a clear manner.

]]

You are right - I was expressing a value judgement and yours may differ.  In 
practical terms, if the property name for the behaviour we use were to change, 
that would affect up to 5 applications and libraries (it may be just one 
library, but I can't be sure of that without checking).  More significantly, we 
have a public sparql endpoint, so it would affect all our users who use text 
queries.  Now that is my problem - I would just like to be clear that I'm not 
being capricious here.

[[

Do you have another approach to dealing with the inherent ambiguity that your 
use case exploits in:
{code:java}
?s text:query ( "query with fields" LUCENE_LIMIT )
{code}
]]

I'm not sure what you mean by "inherent ambiguity" so I may be missing 
something obvious. 

The query itself is unambiguous.  If it does not specify a field then lucene 
will search the default field.

In terms of the result, there is no ambiguity - there is only one subject for 
each document.

If you are considering the case of the problematic query form:

(?s ?score ?lit) text:query ( "query with or without fields" LUCENE_LIMIT )

then there are two cases:

1) the normal case - the lucene hit has only one field - so use that to return 
the ?lit value (which I guess is what happens in 3.12.0)

2) the multi-field index case - the lucene hit has more than one field, so 
don't bind ?lit and possibly throw an exception
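
As a rough illustration of the two cases above, the decision could be sketched 
as follows.  This is only a sketch under assumptions: {{LiteralBinder}} and 
{{literalFor}} are hypothetical names, not part of jena-text, and a hit is 
modelled simply as a map from field name to stored value.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, NOT part of jena-text: decides whether ?lit can be
// bound for a single hit, modelled here as field name -> stored value.
final class LiteralBinder {
    private LiteralBinder() {}

    static Optional<String> literalFor(Map<String, String> storedFields) {
        if (storedFields.size() == 1) {
            // Case 1: exactly one field, so its value is the literal for ?lit.
            return Optional.of(storedFields.values().iterator().next());
        }
        // Case 2: more than one field (or none), so leave ?lit unbound.
        // A stricter variant could throw an exception here instead.
        return Optional.empty();
    }
}
```

In this sketch the subject and score would still be returned in case 2; only 
the ?lit binding is skipped.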

I recognise I am probably missing something - maybe this will help identify it.

Another way in would be to ask - why has it changed since 3.12.0?  Is there 
something about OR that required this change?

Another thought:  if it's the case that the code needs to know that it is 
dealing with a multi-field index, then a config parameter could be used to 
tell it, rather than a property name.
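
To make the config parameter idea concrete, here is a minimal sketch.  The 
class {{TextIndexSettings}} and its {{multiField}} flag are hypothetical and 
only illustrate the shape of the idea; no such setting is known to exist in 
jena-text.

```java
// Hypothetical configuration holder, NOT an existing jena-text class.
// The flag would be set once when the index is configured, so the query
// property name (text:query) would not need to change.
final class TextIndexSettings {
    private final boolean multiField;

    TextIndexSettings(boolean multiField) {
        this.multiField = multiField;
    }

    // True when the engine may try to bind ?lit from a hit, that is, when
    // the index follows the one-field-per-document model.
    boolean canBindLiteral() {
        return !multiField;
    }
}
```

With something like this, existing queries would keep using the same property 
name and only the index configuration would differ.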

 


> Support lucene field names in jena text queries
> -----------------------------------------------
>
>                 Key: JENA-1749
>                 URL: https://issues.apache.org/jira/browse/JENA-1749
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.13.0
>            Reporter: Brian McBride
>            Priority: Major
>         Attachments: stacktrace.txt
>
>
> Until recent changes made during implementation of JENA-1723, it was possible 
> to have a Lucene text query that used Lucene field names.  With the 
> implementation of JENA-1723 such queries result in an exception.
> For example:
> {quote}
> PREFIX  xsd:      <http://www.w3.org/2001/XMLSchema#>
> PREFIX  text:     <http://jena.apache.org/text#>
> PREFIX  ppd:      <http://landregistry.data.gov.uk/def/ppi/>
> PREFIX  lrcommon: <http://landregistry.data.gov.uk/def/common/>
>
> SELECT * {
>   ?ppd_propertyAddress
>       text:query            ( "street:  the" 3000000 ) .
> } LIMIT 1
>
> Cannot parse 'text:street: the ': Encountered " ":" ": "" at line 1, column 
> 11.
> {quote}
> This is a simplified query from a running production system that works in 
> 3.12.0 but is failing in 3.13.0-SNAPSHOT.
> Some discussion and analysis of this issue has occurred in email:
> [https://lists.apache.org/thread.html/ccc1d5c5eaebcddafc2dbae85f3b5901396e3ab203df6bb4014e8270@%3Cusers.jena.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
