[ https://issues.apache.org/jira/browse/RYA-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Meier updated RYA-408:
----------------------------
    Description: 
A number of optimizations were made to the Rya PCJ Updater to support sharding.
Among these optimizations was sharding the binding set results to distribute
the load among the tablet servers.  These changes prevent the JoinResultUpdater
from creating joins that are the result of direct products.  This is a result
of how new rows are written in the Fluo table.  For example, statement pattern
results used to be written in the form
"SP_123/BS_Val1:BS_Val2", but are now written as
"SP:HASH(BS_Val1):123/BS_Val1:BS_Val2",
where HASH(BS_Val1) is the hash of the first binding set value.  Before
sharding, a targeted range scan for all the results associated with SP_123
could be done, because all of the entries associated with that node had that
prefix.  After sharding, it is impossible to do a targeted range lookup on
values corresponding to SP_123 without the first binding set value (because the
hash precedes the id = 123).  So if the JoinResultUpdater attempts to join a
new StatementPattern result with the results of another StatementPattern and
there are no common variables (and therefore no first binding value to hash),
then the updater will not locate any results.
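
The effect of the new row layout on range scans can be illustrated with a
minimal Python sketch.  It is not the Rya implementation: the table is modeled
as a sorted list of row strings, and shard_hash (a truncated SHA-256) is a
hypothetical stand-in for HASH, since the actual hash function is not given in
this issue.

```python
import hashlib
from bisect import bisect_left


def old_row(node_id, values):
    # Pre-sharding layout from the description: "SP_123/BS_Val1:BS_Val2".
    return f"SP_{node_id}/" + ":".join(values)


def shard_hash(value):
    # Hypothetical stand-in for HASH(...); an assumption made for illustration.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:4]


def sharded_row(node_id, values):
    # Post-sharding layout: "SP:HASH(BS_Val1):123/BS_Val1:BS_Val2".
    return f"SP:{shard_hash(values[0])}:{node_id}/" + ":".join(values)


def prefix_scan(sorted_rows, prefix):
    # Emulates a targeted range scan over a lexicographically sorted table:
    # seek to the prefix, then read until a row no longer starts with it.
    start = bisect_left(sorted_rows, prefix)
    found = []
    for row in sorted_rows[start:]:
        if not row.startswith(prefix):
            break
        found.append(row)
    return found


if __name__ == "__main__":
    results = [("123", ["a", "b"]), ("123", ["c", "d"]), ("456", ["a", "e"])]
    old_table = sorted(old_row(n, v) for n, v in results)
    new_table = sorted(sharded_row(n, v) for n, v in results)

    # Before sharding: one prefix covers every result of node 123.
    print(prefix_scan(old_table, "SP_123/"))

    # After sharding: a targeted prefix needs the first binding set value.
    print(prefix_scan(new_table, f"SP:{shard_hash('a')}:123/"))

    # Without that value, the only usable prefix is "SP:", which spans
    # the results of every statement pattern node.
    print(prefix_scan(new_table, "SP:"))
```

In this model the post-sharding prefix can only be built when the first binding
set value is known, which is exactly what is missing when the two
StatementPatterns share no variables.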

This issue can be resolved by issuing a more general scan on the "SP" prefix
and then filtering the results on the StatementPattern nodeId (123 in the above
example).  This approach performs poorly, but it may be the only way to
resolve the issue.  Given the large amount of data already stored in the Fluo
table, it is an open question whether we should support direct products in
Fluo queries at all.  Another approach is to optimize queries when they are
registered so that they avoid direct products (this should be done anyway),
and to throw an exception if no arrangement avoids direct products.  Under
that approach, queries that have unavoidable direct products would not be
allowed to be registered in Fluo.
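
Both options can be sketched in a few lines of Python.  This is an illustration
under stated assumptions, not the Rya code: scan_and_filter and
order_without_direct_products are hypothetical names, rows are plain strings in
the sharded layout described above, and a query is modeled as a mapping from
statement pattern name to its set of variables.

```python
def scan_and_filter(rows, node_id):
    # Option 1: broad scan over every "SP:" row, keeping only the rows whose
    # id segment (the part after the hash) matches the requested node id.
    # Every statement pattern result is read, hence the poor performance.
    matches = []
    for row in rows:
        if not row.startswith("SP:"):
            continue
        key = row.partition("/")[0]   # "SP:<hash>:<nodeId>"
        parts = key.split(":")
        if len(parts) == 3 and parts[2] == node_id:
            matches.append(row)
    return matches


def order_without_direct_products(patterns):
    # Option 2: at registration time, look for a join order in which every
    # statement pattern shares at least one variable with the patterns
    # joined before it.  Raise if no such order exists, i.e. the query
    # contains an unavoidable direct product.
    remaining = dict(patterns)
    if not remaining:
        return []
    first = next(iter(remaining))
    order = [first]
    bound = set(remaining.pop(first))
    while remaining:
        candidate = next(
            (name for name, variables in remaining.items() if variables & bound),
            None,
        )
        if candidate is None:
            raise ValueError(
                "query has an unavoidable direct product: "
                f"{sorted(remaining)} share no variable with {order}"
            )
        order.append(candidate)
        bound |= remaining.pop(candidate)
    return order


if __name__ == "__main__":
    table = ["SP:00aa:123/a:b", "SP:00aa:456/a:e", "SP:ffee:123/c:d"]
    print(scan_and_filter(table, "123"))

    query = {"sp1": {"x", "y"}, "sp2": {"a", "b"}, "sp3": {"y", "a"}}
    print(order_without_direct_products(query))
```

The ordering check is greedy: it succeeds exactly when the statement patterns
form a single group connected through shared variables, so a failure means no
reordering can avoid the direct product.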

  was:
A number of optimizations were made to the Rya PCJ Updater to support sharding. 
 Among these optimizations was sharding the binding set results to distribute 
the load among the tablet servers.  The changes that were made to shard the 
rows prevent the JoinResultUpdater from creating joins that are the result of 
direct products.  This is a direct result of how new rows are written in the 
Fluo table.  For example, statement patterns used to be written in the form 
"SP_123/BS_Val1:BS_Val2", but are now written as 
"SP:HASH(BS_Val1):123/BS_Val1:BS_Val2",
where HASH(BS_Val1) is the hash of the first binding set value.  After 
sharding, it is impossible to do a targeted range lookup on values 
corresponding to SP_123 without the first binding set value (because the hash 
precedes the id).  So if the JoinResultUpdater attempts to join a new 
StatementPattern result with the results of another StatementPattern and there 
are no common variables (and therefore no first binding value to hash), then 
the updater will not locate any results.  

This issue can be resolved by issuing a more general scan on the "SP" prefix 
and then filtering the results on the StatementPattern nodeId.  This is not a 
very performant approach, but may be the only way to resolve the issue.  Given 
the large amount of data that is currently stored in the Fluo table already, 
there is some question about whether we should support direct products in Fluo 
queries anyway.  Another approach is to simply attempt to optimize queries to 
avoid direct products when they are registered (this should be done anyway), and 
if there is no arrangement that avoids direct products, then throw an exception. 
 Queries that have unavoidable direct products should not be allowed to be 
registered in Fluo.   


> PCJ Updater Does Not Support Queries with Direct Products
> ---------------------------------------------------------
>
>                 Key: RYA-408
>                 URL: https://issues.apache.org/jira/browse/RYA-408
>             Project: Rya
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 3.2.12
>            Reporter: Caleb Meier



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
