Holger

Comments inline:

On 8/2/13 3:50 PM, "Holger Knublauch" <[email protected]> wrote:

>Folks,
>
>all I did was report a bug, and your response is to delete the whole
>feature! This would completely throw out the baby with the bathwater and
>may mean that we at TopQuadrant have to branch off to our own Jena
>version when this happens.

I started the discussion because your bug highlights that initial bindings as they are currently implemented are broken. We appear to disagree on how they are broken, but clearly their behavior is not entirely correct one way or the other. I'm aiming to solicit opinions from all interested parties and to reach a consensus about how this feature should be handled in the future, whether that means merely fixing the current implementation, replacing it with another solution, or something else entirely.

Note that I explicitly said that this is about re-architecting this feature in some future release; no one is going into the code base and deleting this feature immediately. If the eventual decision is to remove this feature in its current form then there would be a deprecation cycle.

Forcing TQ to fork Jena is not my intention. I'm trying to find an appropriate means to address something which I consider to be broken, in a way that benefits the whole community. Note that there are other places where initial bindings are completely unimplemented or broken, depending on your POV, e.g. remote queries.

>
>I would be in favor of this option
>
>5) Keep the API as it is and restore initial bindings for UPDATE as well.

Side note: initial bindings for updates were removed because they were a barrier to streaming updates (http://markmail.org/message/bazwh2exmcc5vmoh). Also, as others noted in that discussion, the meaning of initial bindings is a little murkier for updates: do they apply only to WHERE clauses, to all portions of a request, etc.?

Keeping the API as-is is always an option. If this ends up being the preference of the community then we definitely need to improve the documentation to note that there can be unintended interactions with other parts of the query engine, such as the optimizer, when initial bindings are used.

>
>It was IMHO a mistake to switch to parameterized queries. We make very
>heavy use of initial bindings throughout our software stack, SPIN, SWP,
>SPARQLMotion. They are a great feature, allowing users to treat SPARQL
>like a programming language (think about a function call, where the
>parameters are different each time, yet the developer only needs to
>write the logic once).

What you are talking about sounds much more like cached execution plans in SQL. I understand the analogy of a SPARQL query to a function, but SPARQL variables were not intended to be function arguments. That you choose to treat them as such, and that initial bindings let you do so, is a perhaps unintended consequence of ARQ's API.

>
> From what I understand so far, parameterized SPARQL has performance
>issues - we would need to re-parse the string while with initial
>bindings we can reuse the same (compiled) Query object each time.
>Parameterized queries also don't support the bound(?var) operator, and
>probably others.

Yes, there is a performance hit associated with parameterized queries, and yes, it is possible to turn a syntactically valid query into an invalid one by injecting certain values, because the parameterization is done on a purely textual basis. This is why I suggested Options 2 and 3, which are algebraic manipulations that would allow the initial bindings to be injected by transformation of the unoptimized algebra tree. This lets you take the parsing hit only once and use initial bindings as you do now, without the optimizer making incorrect decisions because initial bindings supply information it doesn't have access to.

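To illustrate the point about purely textual parameterization, here is a minimal sketch in plain Java. It is a toy model and not Jena's ParameterizedSparqlString; the class name, the substitute helper and the example IRI are all hypothetical. It shows how replacing ?o with an IRI everywhere turns bound(?o) into bound(<iri>), which the SPARQL grammar does not accept because BOUND takes a variable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of textual parameter substitution (hypothetical, not Jena code). */
final class TextualSubstitutionSketch {

    /** Replaces every whole-token occurrence of ?name with the given value text. */
    static String substitute(String query, String name, String value) {
        // \b stops ?o from also matching the prefix of a longer name such as ?other
        Pattern variable = Pattern.compile("\\?" + Pattern.quote(name) + "\\b");
        return variable.matcher(query).replaceAll(Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String query = "ASK { ?s ?p ?o FILTER(bound(?o)) }";
        // The result contains bound(<http://example.org/x>), which is no longer valid SPARQL
        System.out.println(substitute(query, "o", "<http://example.org/x>"));
    }
}
```

An algebra-level substitution (Options 2 and 3) would see that the variable sits inside BOUND and could handle it deliberately, which a string replacement cannot do.
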
These options require substantially more work to get all the kinks worked out but are undoubtedly doable. If people agree this is the way forward then it will likely happen eventually, but contributions to make it happen sooner are always welcome. This would give users something much closer to cached execution plans, which from what you described is essentially how TQ is using initial bindings.

Also note that as a user you always have the option of using BIND and/or VALUES clauses in your queries and updates. This achieves the same ends as the initial bindings API and has the benefit of being semantically much clearer from the POV of both the query writer and the query engine.

>
>We have used initial bindings successfully for many years, and although
>there have been occasional bugs (I probably reported one such bug per
>year) it is a very essential feature from our point of view. If the
>overhead of fixing the optimizer is too big, I would be OK with
>switching off this optimizer if initial bindings are present. It worked
>fine without this optimizer for many years.

I understand that TQ considers the feature essential, and I hope we can come to some agreement on this topic that satisfies the whole community.

The optimizer is not broken; what the optimizer is doing is entirely correct wrt the semantics of the SPARQL query it is given and the information it has. As I pointed out in my original reply to you, there are other optimizations that have had the same behavior for years. That none of your unit tests previously covered a case affected in this way, whereas the new optimization does happen to hit such a case, is a lucky coincidence, as otherwise neither you nor we would have been any the wiser.

The example query semantically should always return false. That the initial bindings API allows a user to make it do otherwise is the bug IMO, though I think it is clear we disagree on this point.

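To make the disagreement concrete, here is a toy model in plain Java of how ASK { FILTER(?a = ?b) } behaves. It is a sketch under stated assumptions, not ARQ's evaluator: the class and method names are hypothetical, terms are modelled as plain strings, and string equality stands in for SPARQL's = operator. Evaluated from the empty solution, both variables are unbound, the comparison raises an error, the FILTER rejects the solution and ASK is false. Seeding the starting solution, which is roughly what initial bindings do, changes the answer.

```java
import java.util.Map;
import java.util.Optional;

/** Toy model of FILTER(?a = ?b); hypothetical and much simpler than ARQ. */
final class FilterEqualitySketch {

    /** Compares two variables in a solution; an empty result models an evaluation error. */
    static Optional<Boolean> evalEquals(Map<String, String> solution, String a, String b) {
        String left = solution.get(a);
        String right = solution.get(b);
        if (left == null || right == null) {
            return Optional.empty(); // unbound variable: the expression errors
        }
        return Optional.of(left.equals(right)); // simplification of SPARQL term equality
    }

    /** ASK { FILTER(?a = ?b) } evaluated from one starting solution. */
    static boolean ask(Map<String, String> startingSolution) {
        // A FILTER keeps a solution only if the expression evaluates to true;
        // an error counts as rejection.
        return evalEquals(startingSolution, "a", "b").orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(ask(Map.of()));                         // the query as written
        System.out.println(ask(Map.of("a", "true", "b", "true"))); // with a seeded solution
    }
}
```

An optimizer that only sees the query text is entitled to assume the first case, which is why the result depends on whether the seeded values are visible to it.
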
Individual optimizations can always be turned off. For this specific one:

    ARQ.getContext().set(ARQ.optFilterImplicitJoin, false);

If FILTER(?x = ?y) and FILTER(?x = <uri>) are common patterns that TQ and SPIN users make use of in conjunction with initial bindings then you should perhaps turn both this and ARQ.optFilterEquality off. Though if the problem only arises in strange corner cases like the example you gave then you may be perfectly fine, since the special case causing you problems only occurs when the variables are not used within the inner operator that the FILTER applies over.

Switching the whole optimizer off is overkill and will almost certainly harm performance because that will include disabling critical optimizations such as ARQ's index join strategy.

Rob

>
>HTH
>Holger
>
>
>On 8/3/2013 2:52, Rob Vesse wrote:
>> Hi All
>>
>> Holger's question
>>(http://mail-archives.apache.org/mod_mbox/jena-users/201308.mbox/%3c51FB8
>>[email protected]%3e) about a regression in ARQs treatment of
>>initial bindings raises an interesting disconnect between the
>>interpretation of SPARQL and the Initial Bindings API.
>>
>> Initial bindings in their current form allows for users to essentially
>>change the semantics of a query in a non-intuitive way. Take his
>>example query:
>>
>> ASK { FILTER(?a = ?b) }
>>
>> Intuitively that query MUST always return false yet with initial
>>bindings in the mix the query can be made to return true, at least prior
>>to 2.10.2 which introduces a new optimizer which includes special case
>>recognition for this.
>>
>> The problem is that using initial bindings can fundamentally change the
>>semantics of queries in non-intuitive ways when I believe the intention
>>of the API was merely to allow for improved performance by guiding the
>>engine.
>>
>> To me this suggests that initial bindings as currently implemented is
>>fundamentally flawed and I would suggest that we think about
>>re-architecting this feature in a future release (not the next release).
>> I believe there are probably several ways of doing this:
>>
>> 1 Remove support for initial bindings on queries entirely (as we
>>already did for updates) in favor of using ParameterizedSparqlString
>>
>> 2 Change initial bindings to be a pre-optimization algebra
>>transformation of the query
>>
>> As we've discussed previously in the context of
>>ParameterizedSparqlString there is potential to do the substitution at
>>the algebra tree level rather than at the textual level. This allows
>>for stronger syntax checking and actually changes the query
>>appropriately. The problem with this is that it doesn't work if we want
>>to inject multiple values for a variable, hence Option 3
>>
>> 3 Change initial bindings to be done by injection of VALUES clauses
>>
>> This approach is again by algebra transform and would involve inserting
>>VALUES clauses at each leaf of the algebra tree. So Holger's query with
>>initial bindings applied would be rewritten like so:
>>
>> ASK
>> {
>>   VALUES ( ?a ?b ) { ( true true ) }
>>   FILTER (?a = ?b)
>> }
>>
>> However this approach might get rather complex for larger queries and
>>also runs into issues of scope, what if we insert the VALUES clause
>>inside of a sub-query which doesn't propagate those initial bindings
>>outside of it etc.
>>
>> 4 Skip optimization when initial bindings are involved
>>
>> This is the easiest approach but we can't enforce this on other query
>>engine implementations and it could seriously harm performance for those
>>that use initial bindings extensively.
>>
>> There may also be other approaches I haven't thought so please suggest
>>anything that makes sense. Bottom line is that initial bindings in its
>>current form seems fundamentally broken to me and we should be thinking
>>of how to fix this in the future.
>>
>> Rob
>>
>
