[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141740#comment-16141740 ] Harshvardhan Gupta commented on DERBY-6942: --- Bryan, I am summarizing the remaining work required to release the changes made as part of DERBY-6940 and DERBY-6942. Test cases are needed for the addition of the new statistics and for the integration of the integer data type with them. The patch also does not contain the documentation changes needed for each of these new features. Also, what do you think of integrating additional data types such as VARCHAR and releasing all the changes together, versus doing it iteratively over time? > Utilise additional statistics for selectivity estimates. > > > Key: DERBY-6942 > URL: https://issues.apache.org/jira/browse/DERBY-6942 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6942_2.diff, DERBY-6942.diff, z12.txt, z13.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (DERBY-6949) Analysis of Derby Cost Estimation
[ https://issues.apache.org/jira/browse/DERBY-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113150#comment-16113150 ] Harshvardhan Gupta edited comment on DERBY-6949 at 8/3/17 5:36 PM: --- By cost we generally refer to any number proportional to the execution time; in Derby, cardinality estimates are also clubbed into the bigger cost structure. Cost modelling for the join strategies: 1) Nested loop - After finding the cost of a single probe into the inner result set, the nested loop join multiplies that cost by the number of rows in the outer result set. The costs of these result sets come from cost modelling by Optimizables, which also serve as entities in cost estimation. For example, two base tables joined via a loop join serve as Optimizables, where the outer table reports its scanning cost and the inner table reports the cost of looking up a single row matching the outer row. Note that even though the costs are multiplied, the cardinality estimates of the two base tables (or intermediate results) are not simply multiplied. 2) Hash join - As opposed to nested loop joins, there is no fixed cost per outer row in a hash join; the cost of a hash join depends only on the base tables / intermediate results (i.e. the Optimizables). Thus no further cost changes are made at the join level. > Analysis of Derby Cost Estimation > - > > Key: DERBY-6949 > URL: https://issues.apache.org/jira/browse/DERBY-6949 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta
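The two join cost formulas described in this comment can be sketched as a toy calculation. This is an illustration only; the class and method names are hypothetical and do not reflect Derby's actual CostEstimate API:

```java
// Toy illustration of the two join cost formulas: nested loop multiplies
// a per-probe cost by the outer row count, while a hash join's cost comes
// only from its inputs (build the inner hash table, scan the outer side).
public class JoinCostSketch {
    // Nested loop: cost of one probe into the inner result set,
    // multiplied by the number of rows in the outer result set.
    static double nestedLoopCost(double outerRows, double innerProbeCost) {
        return outerRows * innerProbeCost;
    }

    // Hash join: no per-outer-row cost at the join level; the total is the
    // cost of building the hash table on the inner input plus the cost of
    // scanning the outer input to probe it.
    static double hashJoinCost(double innerBuildCost, double outerScanCost) {
        return innerBuildCost + outerScanCost;
    }

    public static void main(String[] args) {
        System.out.println(nestedLoopCost(1000, 2.5)); // 2500.0
        System.out.println(hashJoinCost(800, 1200));   // 2000.0
    }
}
```

Note how the nested loop formula amplifies any error in the outer cardinality estimate, which is why the surrounding comments stress accurate cardinalities.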
[jira] [Commented] (DERBY-6949) Analysis of Derby Cost Estimation
[ https://issues.apache.org/jira/browse/DERBY-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113109#comment-16113109 ] Harshvardhan Gupta commented on DERBY-6949: --- The whole lifecycle of query optimization in Derby is as follows: the Optimizer interface's implementation, OptimizerImpl.java, is responsible for iterating through all permitted semantically equivalent execution plans and for orchestrating calls to the various entities that provide their 'estimated cost' of execution, given the set of constraints associated with them in that particular plan. Before getting into the details of these entities and their estimated costs, it is important to note that the search space of all possible semantically equivalent plans is limited to left-deep trees only. The selection of join order also influences the way we want to constrain the execution. For example, a left-deep HJ (hash join) has to wait for each join to finish completely so that the result set can be hashed before the next step can probe it; in a right-deep HJ, on the other hand, it is not the intermediate results that are hashed but the base tables, so right-deep HJ trees use substantially more memory and cause more disk spillovers. It has been shown in the [VLDB Optimizer Paper|http://www.vldb.org/pvldb/vol9/p204-leis.pdf] (Section 6.2) that left-deep trees perform quite reasonably while restricting the search space and keeping the optimizer simple and fast, as opposed to right-deep trees, where disk spills eat away a lot of performance. The authors further emphasize cardinality estimates over enlarging the search space, which may not offer significant performance benefits. This area definitely needs to be explored further in Derby's context. Coming back to the entities for cost estimation, they are Join Strategy, Optimizables and Ordering. Each of these implements the 'estimateCost' function and reports its estimated cost of execution to the Optimizer.
The Optimizer permutes through the various combinations and produces the best execution plan it is able to find.
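Because the plan space is restricted to left-deep trees, enumerating candidate plans reduces to enumerating permutations of the table order. The following sketch demonstrates that reduction with a toy cost function; the selectivity constant and row counts are invented for illustration and have nothing to do with Derby's internals:

```java
import java.util.ArrayList;
import java.util.List;

// Left-deep-only search space: every candidate plan has the shape
// (((t1 JOIN t2) JOIN t3) JOIN ...), so the optimizer's job reduces to
// picking the cheapest permutation of the table order.
public class LeftDeepEnumerator {
    static final double SELECTIVITY = 0.001; // hypothetical per-join selectivity

    // Toy cost of one left-deep order: each join step pays for the rows it
    // produces, and that product becomes the cardinality of the next step.
    static double planCost(List<Double> order) {
        double cost = 0, rows = order.get(0);
        for (int i = 1; i < order.size(); i++) {
            double produced = rows * order.get(i) * SELECTIVITY;
            cost += produced;
            rows = produced;
        }
        return cost;
    }

    // Recursively build all permutations of the input list.
    static void permute(List<Double> remaining, List<Double> prefix,
                        List<List<Double>> out) {
        if (remaining.isEmpty()) { out.add(new ArrayList<>(prefix)); return; }
        for (int i = 0; i < remaining.size(); i++) {
            List<Double> rest = new ArrayList<>(remaining);
            Double picked = rest.remove(i);
            prefix.add(picked);
            permute(rest, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    static double bestCost(List<Double> tableRows) {
        List<List<Double>> orders = new ArrayList<>();
        permute(tableRows, new ArrayList<>(), orders);
        double best = Double.MAX_VALUE;
        for (List<Double> order : orders) best = Math.min(best, planCost(order));
        return best;
    }

    public static void main(String[] args) {
        // three tables of 100, 1000 and 10 rows: n! = 6 left-deep orders
        System.out.println(bestCost(List.of(100.0, 1000.0, 10.0)));
    }
}
```

Even this toy example shows why join order matters: starting from the two small tables keeps every intermediate result small, while a bad order pays for a large intermediate result up front.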
[jira] [Assigned] (DERBY-6954) Allow plans to execute where HASH tables are estimated to spill to disk.
[ https://issues.apache.org/jira/browse/DERBY-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6954: - Assignee: Harshvardhan Gupta > Allow plans to execute where HASH tables are estimated to spill to disk. > > > Key: DERBY-6954 > URL: https://issues.apache.org/jira/browse/DERBY-6954 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > > The Derby optimizer currently rejects plans where it estimates that > intermediate HASH tables in HASH joins will spill to disk. We need to model > the cost associated with HASH join disk spill in order to make such plans > candidates for the optimal plan. DERBY-6938, DERBY-1259 and DERBY-106 contain some useful > details around this behaviour. > One possible workaround for this behaviour is to explicitly specify > optimizer hints, overriding the default optimizer behaviour. > https://db.apache.org/derby/docs/10.13/tuning/ctunoptimzoverride.html
[jira] [Created] (DERBY-6954) Allow plans to execute where HASH tables are estimated to spill to disk.
Harshvardhan Gupta created DERBY-6954: - Summary: Allow plans to execute where HASH tables are estimated to spill to disk. Key: DERBY-6954 URL: https://issues.apache.org/jira/browse/DERBY-6954 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta The Derby optimizer currently rejects plans where it estimates that intermediate HASH tables in HASH joins will spill to disk. We need to model the cost associated with HASH join disk spill in order to make such plans candidates for the optimal plan. DERBY-6938, DERBY-1259 and DERBY-106 contain some useful details around this behaviour. One possible workaround for this behaviour is to explicitly specify optimizer hints, overriding the default optimizer behaviour. https://db.apache.org/derby/docs/10.13/tuning/ctunoptimzoverride.html
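The kind of cost model this issue asks for can be sketched as follows: instead of rejecting a hash join whose estimated build table exceeds the memory limit, charge an extra I/O cost for the estimated spill. All constants, cost units and method names below are hypothetical, not Derby's:

```java
// Sketch of a spill-aware hash join cost: bytes that fit in memory pay only
// CPU cost; bytes beyond the limit pay additional I/O (written to disk once,
// read back once while probing).
public class SpillAwareHashCost {
    static double cost(double buildBytes, double maxMemory,
                       double cpuCostPerByte, double ioCostPerByte) {
        double spilled = Math.max(0, buildBytes - maxMemory);
        // every byte pays CPU cost; spilled bytes additionally pay 2x I/O
        return buildBytes * cpuCostPerByte + 2 * spilled * ioCostPerByte;
    }

    public static void main(String[] args) {
        // maxMemory matches the 1048576 limit seen in the optimizer trace
        System.out.println(cost(500_000, 1_048_576, 1, 10));   // fits: CPU only
        System.out.println(cost(2_000_000, 1_048_576, 1, 10)); // spills: adds I/O
    }
}
```

With such a model the optimizer could still prefer a spilling hash join over a nested loop join whenever the spill penalty is smaller than the repeated-probe cost, rather than discarding the access path outright.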
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110605#comment-16110605 ] Harshvardhan Gupta commented on DERBY-6938: --- Bryan, I agree with your observations. For the production of efficient query plans, cardinality estimates, cost estimation and plan space are all interdependent. I am filing another issue under DERBY-6949 to track the effort on modelling the HASH join's spill behaviour. I will also update the cost modelling details under DERBY-6949 shortly; I had to take a detour analyzing the HASH join's behaviour. Thanks, Vardhan > Obtain cardinality estimates and true estimates for base tables as well as > for intermediate results for queries involving multiple joins. > --- > > Key: DERBY-6938 > URL: https://issues.apache.org/jira/browse/DERBY-6938 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > Attachments: explain.txt, traceout.txt
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097272#comment-16097272 ] Harshvardhan Gupta commented on DERBY-6938: --- Summarizing my analysis of HASH-based joins so far: Derby has the capability to perform HASH joins that spill to disk, but has long ignored them due to the absence of a cost model for such behaviour. DERBY-1259 also covers the history of HASH joins and the changes made as part of DERBY-106.
[jira] [Updated] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6938: -- Attachment: traceout.txt Following my previous comment, I am also attaching a sample optimizer trace of query 1a of the JOB dataset, where the optimizer ignores the HASH tables which it predicts will spill to disk. Trace output string - "Skipping access path due to excess memory usage, maximum is 1048576".
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097260#comment-16097260 ] Harshvardhan Gupta commented on DERBY-6938: --- Hi Bryan, The Optimizer produces memory and cost estimates for all the possible access paths that it can build. Derby will only let those HASH joins go through which it _predicts_ will fit into memory, and rejects all others (including those which would be very efficient for some queries compared to loop-based joins). However, Derby's memory predictions may not match the memory requirement at execution, and the HASH table may become larger than what Derby predicted. DERBY-106 was an effort to mitigate the OOM exceptions arising out of this behaviour. So although HASH tables can spill to disk, only those access paths go forward to execution where Derby predicts that the HASH tables will not spill. These observations of mine match the comment on DERBY-1259, which was written a decade ago. Coming to the resolution of this problem, we need to investigate how to measure the cost of execution once Derby allows those HASH-based access paths that it predicts will spill to disk, instead of simply ignoring them. Also interesting is the observation that Derby currently allows hash-based joins when users specify their join strategies via query hints.
[jira] [Comment Edited] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082789#comment-16082789 ] Harshvardhan Gupta edited comment on DERBY-6942 at 7/11/17 7:16 PM: In my local tests, I checked the selectivity outputs for relational operators of the following types (>=, <=, >, <) with the integer data type using the latest patch. Regarding the examples Bryan mentioned, I am not quite sure about this, but I believe the 'between' operator is internally transformed into one of these relational ops; in that case the latest patch covers the 'between' op as well. UPDATE: the between transformation seems to work with the patch, and the selectivity is the product of the individual selectivities of the LESS_THAN and GREATER_THAN operator types.
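The observation above - that BETWEEN is rewritten into a pair of relational operators whose selectivities are multiplied - can be shown in miniature. The numbers and method name below are illustrative, not taken from the patch:

```java
// BETWEEN lo AND hi behaves like (col > lo) AND (col < hi), and the
// combined selectivity is the product of the two operators' selectivities.
public class BetweenSelectivity {
    static double betweenSelectivity(double selGreaterThan, double selLessThan) {
        return selGreaterThan * selLessThan;
    }

    public static void main(String[] args) {
        // e.g. (col > lo) matches 60% of rows and (col < hi) matches 50%
        System.out.println(betweenSelectivity(0.6, 0.5));
    }
}
```

Multiplying the two selectivities implicitly assumes the two bounds filter independently; for a narrow range on one column this is only an approximation, which is worth keeping in mind when testing.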
[jira] [Updated] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6942: -- Attachment: DERBY-6942_2.diff Hi Bryan, Attaching the new code based on your suggestions. Currently, estimateProportion is implemented for SQLINTEGER; other data types' selectivities will be calculated using the existing hardwired values unless we choose to override the estimateProportion function for each of them. Vardhan
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079256#comment-16079256 ] Harshvardhan Gupta commented on DERBY-6942: --- I experimented with integrating max and min values similarly to the null values. Unlike a straightforward null count, the primary challenge with this problem is that we need to define a distance metric for each of the SQL types in Derby. For example, we need a procedure associated with the VARCHAR data type which will tell Derby that a string starting with 'L' is approximately equally far from strings starting with 'A' and with 'Z' (and thus yields a selectivity of 50% instead of relying on hardwired estimates). For some data types this is going to be quite straightforward, such as INT and LONG. Nevertheless, extensive testing is going to be required for each data type integrated into the above process.
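For numeric types the distance metric described above amounts to linear interpolation between the min and max statistics. The sketch below shows the idea for integers; the method name echoes the estimateProportion function mentioned elsewhere in this issue, but the uniform-distribution assumption and exact signature are ours, not Derby's implementation:

```java
// With min/max column statistics, the selectivity of (col < v) for an
// integer column can be interpolated linearly, assuming values are
// uniformly distributed between min and max.
public class RangeSelectivity {
    static double estimateProportionBelow(long v, long min, long max) {
        if (v <= min) return 0.0;  // nothing below the recorded minimum
        if (v > max)  return 1.0;  // everything below a value past the maximum
        return (double) (v - min) / (max - min);
    }

    public static void main(String[] args) {
        // the midpoint of the range yields 50%, with no hardwired estimate
        System.out.println(estimateProportionBelow(50, 0, 100)); // 0.5
    }
}
```

The VARCHAR case is harder precisely because no such natural linear mapping exists; any string-to-number encoding chosen for interpolation would need the extensive testing the comment calls for.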
[jira] [Commented] (DERBY-6949) Analysis of Derby Cost Estimation
[ https://issues.apache.org/jira/browse/DERBY-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078310#comment-16078310 ] Harshvardhan Gupta commented on DERBY-6949: --- As part of my work on GSoC 2017, this issue will track the analysis of, and proposed improvements to, cost estimation in the Derby query optimizer.
[jira] [Created] (DERBY-6949) Analysis of Derby Cost Estimation
Harshvardhan Gupta created DERBY-6949: - Summary: Analysis of Derby Cost Estimation Key: DERBY-6949 URL: https://issues.apache.org/jira/browse/DERBY-6949 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta Assignee: Harshvardhan Gupta
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071413#comment-16071413 ] Harshvardhan Gupta commented on DERBY-6942: --- Yes, the cardinality and cost estimates are both going to be more accurate. Typically, in the case of loop joins, cost is estimated as (number of rows in the outer table * cost of a single row match in the inner table).
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071405#comment-16071405 ] Harshvardhan Gupta commented on DERBY-6942: --- Let me emphasize this improvement in the case of multiple joins: although in the above example my changes didn't change the cost (they shouldn't, since during execution that conglomerate needs to do the same work), the improvement in cardinality estimates will help us avoid cascading errors during the cost estimation of joins.
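The cascading-error point can be made concrete with a back-of-the-envelope calculation: because join cardinalities are combined multiplicatively, a per-input estimation error of factor q can compound to roughly q^k across k joined inputs. This is a generic illustration, not a claim about Derby's exact error propagation:

```java
// If each single-input cardinality estimate is off by a factor q, a plan
// combining k inputs multiplicatively can be off by roughly q^k.
public class CascadingError {
    static double worstCaseErrorFactor(double perInputError, int joinedInputs) {
        return Math.pow(perInputError, joinedInputs);
    }

    public static void main(String[] args) {
        // a modest 2x error per input compounds to 16x across 4 inputs
        System.out.println(worstCaseErrorFactor(2.0, 4)); // 16.0
    }
}
```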
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071404#comment-16071404 ] Harshvardhan Gupta commented on DERBY-6942: --- Bryan, You are on point!
[jira] [Comment Edited] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071383#comment-16071383 ] Harshvardhan Gupta edited comment on DERBY-6942 at 7/1/17 8:02 PM: --- I was able to solve my problem after spending some more time on it. Attached are two files. One is the patch, which uses the additional null count statistics to generate accurate selectivity estimates; it builds upon DERBY-6940 and makes changes to FromBaseTable.java and TableDescriptor.java to consume the additional statistics. The other file is the optimizer trace after application of the patch: the extra qualifier selectivity has changed to 0.5, representing the fraction of null values in my dataset at the last statistics update. I also tested with multi-column indexes and was able to obtain accurate selectivity estimates for queries of this class. I encourage you to download the patch and test the behaviour. Some more work is required to integrate the minimum and maximum column values, which were also collected as part of the additional statistics.
[jira] [Updated] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6942: -- Attachment: DERBY-6942.diff z13.txt I was able to solve my problem after spending some more time on it. Attached are two files. One is the patch, which uses the additional null count statistics to generate accurate selectivity estimates; it builds upon DERBY-6940 and makes changes to FromBaseTable.java and TableDescriptor.java to consume the additional statistics. The other file is the optimizer trace after application of the patch: the extra qualifier selectivity has changed to 0.5, representing the fraction of null values in my dataset at the last statistics update. I also tested with multi-column indexes and was able to obtain accurate selectivity estimates for queries of this class. I encourage you to download the patch and test the behaviour. Some more work is required to integrate the minimum and maximum column values, which were also collected as part of the additional statistics.
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071337#comment-16071337 ] Harshvardhan Gupta commented on DERBY-6942: --- In code, I am trying to query the statistics of those columns which are backed by indexes (and thus have statistics). I'm stuck on this problem: how do we take a predicate and find the relevant conglomerate among all the conglomerates associated with that table? From a predicate we can find the ColumnReference, which holds the column name, so I am trying to use this info to locate the relevant conglomerate. I would appreciate help on this blocker.
[jira] [Updated] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6942: -- Attachment: z12.txt Please find attached the optimizer trace of a specimen query. Table DDL - create table abcd(id integer); create index idx on abcd(id); Query issued - select * from ABCD where id is NULL As the optimizer trace shows, there are two possible access paths - one via a table scan and one via the conglomerate backing the index. Each access path can be queried via two lookup strategies (or join strategies in case of joins), in our case a loop or a hash lookup. During the index scan, the predicate (ID is NULL) acts as a valid start and stop predicate, and the store makes an accurate cardinality estimate from the index lookup. In the case of a table scan, the predicate acts as a qualifier (a predicate that restricts or filters the result set obtained via the scan, but, unlike an index lookup, won't help locate the starting or ending position). As evident in the trace output, a hard wired selectivity of 0.1 is used for the cardinality estimate in this table scan case. We have an opportunity to use statistics in such cases to remove our dependency on hardwired estimates. > Utilise additional statistics for selectivity estimates. > > > Key: DERBY-6942 > URL: https://issues.apache.org/jira/browse/DERBY-6942 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: z12.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067124#comment-16067124 ] Harshvardhan Gupta commented on DERBY-6942: --- In particular, when estimating cost for a particular conglomerate, it would be good if we started looking at other conglomerates' statistics for cardinality estimates. For example, during a table scan we can adjust cardinality estimates using the statistics for an index. Note that cost and cardinality are two different concepts: cost is always determined by the number and size of pages/rows that a particular access path has to read to do its work. The table scan's cost will remain unaffected in this case by changing the cardinality estimates. (However, the index scan's cost and cardinality estimates will be correlated in our particular example.) > Utilise additional statistics for selectivity estimates. > > > Key: DERBY-6942 > URL: https://issues.apache.org/jira/browse/DERBY-6942 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067117#comment-16067117 ] Harshvardhan Gupta commented on DERBY-6942: --- The example I discussed above was quite simple; when there are multiple joins, the cardinality estimates become important. Since the result of a particular table node acts as the outer table for the nested join, errors will propagate outwards and small errors will become significant. (The cost of inner scans is multiplied by the total number of rows of the outer table if table locks are being used.) > Utilise additional statistics for selectivity estimates. > > > Key: DERBY-6942 > URL: https://issues.apache.org/jira/browse/DERBY-6942 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
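The compounding described above can be illustrated with a toy sketch (this is not Derby code; the class, method, and numbers are hypothetical, and a simple independence assumption is used for the join cardinality):

```java
// Toy sketch (not Derby code): how a per-table cardinality error compounds
// through nested joins, where each level multiplies by the outer row count.
public class ErrorPropagation {
    // Estimated output cardinality of a join under an independence assumption.
    static double joinCardinality(double outerRows, double innerRows, double selectivity) {
        return outerRows * innerRows * selectivity;
    }

    public static void main(String[] args) {
        double trueRows = 100, estRows = 200;   // a 2x error on one base table
        double inner = 1000, sel = 0.01;
        double level1True = joinCardinality(trueRows, inner, sel);
        double level1Est  = joinCardinality(estRows, inner, sel);
        // The 2x error survives into the next join level and multiplies again.
        System.out.println(joinCardinality(level1Est, inner, sel)
                         / joinCardinality(level1True, inner, sel)); // 2.0
    }
}
```

With more join levels the same per-level error factor keeps multiplying, which is why accurate base-table estimates matter most in deep join trees.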
[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067113#comment-16067113 ] Harshvardhan Gupta commented on DERBY-6942: --- The estimateCost function in FromBaseTable.java is the guts of the cost and cardinality estimation logic for base table predicates. The function is called for each conglomerate (i.e. each access path possible on the table, such as an index or a full table scan). For predicates that can be utilized as start or stop predicates for the given conglomerate (refer to http://db.apache.org/derby/docs/10.13/tuning/ctunoptimz24840.html), row estimates are obtained from the store by giving it the actual constant keys specified in the query; this estimate is quite accurate, as the store finds the fraction of values between the start and stop keys. For all remaining predicates, one of two things happens: 1) If an equality operator is specified and statistics exist for the conglomerate (and it is not a valid start/stop predicate), we query the statistics for selectivity (this takes into account the number of unique values). 2) Hard wired selectivities are used in all other cases. As part of DERBY-6940, we started collecting extra statistics such as the null count and the min and max values for each of the columns of the index. We would like to utilise them to eliminate the usage of hard wired selectivities whenever possible. For example, let us say we have a table and an associated index created using the following statements - create table abcd(id integer); create index idx on abcd(id); Now, suppose a query is issued - select * from abcd where id is NULL; There are two possible access paths for the query, and both are considered by the query optimizer - 1) Index Scan - The store is able to give fairly accurate row estimates, as 'NULL' is a valid start as well as stop key. 2) Table scan - Here, hard wired selectivity estimates are used for row estimates, as discussed above. 
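To make the contrast concrete, here is a small sketch of how a null-count statistic could replace the hardwired table-scan selectivity for an IS NULL qualifier. This is not Derby's actual estimateCost code; the class and method names are hypothetical, and only the 0.1 fallback constant comes from the discussion above:

```java
// Hypothetical sketch: selectivity of "col IS NULL" used as a qualifier.
// With a null-count statistic we can return nullCount / numRows instead of
// the hardwired constant the optimizer falls back to today.
public class IsNullSelectivity {
    static final double HARDWIRED = 0.1;  // fallback when no statistics exist

    static double estimate(long numRows, long nullCount, boolean haveStats) {
        if (!haveStats || numRows <= 0) {
            return HARDWIRED;
        }
        return (double) nullCount / numRows;
    }

    public static void main(String[] args) {
        // With stats: 500 of 1000 rows NULL -> 0.5.
        System.out.println(estimate(1000, 500, true));
        // Without stats: the hardwired 0.1.
        System.out.println(estimate(1000, 500, false));
    }
}
```

The same shape applies per-conglomerate: statistics collected on any index over the column can feed the qualifier estimate, even when the access path being costed is a full table scan.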
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063622#comment-16063622 ] Harshvardhan Gupta commented on DERBY-6940: --- Since I am performing a deep copy of the objects, I don't think (min==max) will work in our case, so I had to use the equality function implemented by each data type. The problem is that I need to catch the StandardException thrown by those functions, and I can't propagate it from the writeExternal method since the interface definition won't let it compile. I too believe that the code in its present form is a little clunky and open to improvement. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, > EOFException_derby.log, EOFException.txt > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. > We eventually want to utilize the new statistics to make better selectivity > estimates / cost estimates that will help find the best query plan. Currently > Derby keeps two type of stats - the total row count and the number of unique > values. > We are initially extending the stats to include null count, the minimum value > and maximum value associated with each of the columns of an index. This would > be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, > >= ] , all of which currently rely on hardwired selectivity estimates. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DERBY-6942) Utilise additional statistics for selectivity estimates.
Harshvardhan Gupta created DERBY-6942: - Summary: Utilise additional statistics for selectivity estimates. Key: DERBY-6942 URL: https://issues.apache.org/jira/browse/DERBY-6942 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6942: - Assignee: Harshvardhan Gupta > Utilise additional statistics for selectivity estimates. > > > Key: DERBY-6942 > URL: https://issues.apache.org/jira/browse/DERBY-6942 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061215#comment-16061215 ] Harshvardhan Gupta commented on DERBY-6940: --- Also, I am creating another issue which will track the progress of integrating the new statistics into selectivity estimates. Meanwhile, I would like to invite your thoughts on other statistics we should consider. Most common values and distribution buckets are at the top of my mind right now; I'd appreciate your thoughts on these. The average size of a row written to disk is important for cost estimation, and a quick examination of the Derby optimizer's code reveals that Derby is already using it in some form; nevertheless, I will revisit it during my analysis of cost estimation over the course of this project. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, > EOFException_derby.log, EOFException.txt > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. > We eventually want to utilize the new statistics to make better selectivity > estimates / cost estimates that will help find the best query plan. Currently > Derby keeps two type of stats - the total row count and the number of unique > values. > We are initially extending the stats to include null count, the minimum value > and maximum value associated with each of the columns of an index. This would > be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, > >= ] , all of which currently rely on hardwired selectivity estimates. 
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061206#comment-16061206 ] Harshvardhan Gupta commented on DERBY-6940: --- Hi Bryan, I thought of a workaround and was successful. In particular, I compare maxVal and minVal; if both are equal, I first write an indicator boolean and then write only one DataValueDescriptor object. In all other cases, I write the indicator followed by maxVal and then minVal. This way the problematic object is always written last.

public void writeExternal(ObjectOutput out) throws IOException {
    FormatableHashtable fh = new FormatableHashtable();
    fh.putLong("numRows", numRows);
    fh.putLong("numUnique", numUnique);
    fh.putLong("nullCount", nullCount);
    out.writeObject(fh);

    boolean singleValue = false;
    try {
        singleValue = maxVal.equals(maxVal, minVal).getBoolean();
    } catch (StandardException e) {
        // comparison failed: fall through and write both values
    }
    if (singleValue) {
        // min == max: write the indicator and a single value
        out.writeBoolean(true);
        out.writeObject(minVal);
    } else {
        out.writeBoolean(false);
        out.writeObject(maxVal);
        out.writeObject(minVal);
    }
}

public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    FormatableHashtable fh = (FormatableHashtable) in.readObject();
    numRows = fh.getLong("numRows");
    numUnique = fh.getLong("numUnique");
    nullCount = fh.getLong("nullCount");
    if (in.readBoolean()) {
        maxVal = (DataValueDescriptor) in.readObject();
        minVal = maxVal.cloneValue(true);
    } else {
        maxVal = (DataValueDescriptor) in.readObject();
        minVal = (DataValueDescriptor) in.readObject();
    }
}

> Enhance derby statistics for more accurate selectivity estimates. 
> - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, > EOFException_derby.log, EOFException.txt > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. > We eventually want to utilize the new statistics to make better selectivity > estimates / cost estimates that will help find the best query plan. Currently > Derby keeps two type of stats - the total row count and the number of unique > values. > We are initially extending the stats to include null count, the minimum value > and maximum value associated with each of the columns of an index. This would > be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, > >= ] , all of which currently rely on hardwired selectivity estimates. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
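The indicator-boolean encoding described in the comment above can be exercised in isolation. The following is a self-contained sketch using plain strings and java.io streams rather than Derby's DataValueDescriptor and FormatableHashtable machinery (all names here are hypothetical):

```java
import java.io.*;

// Self-contained sketch of the indicator-boolean encoding: when min equals
// max, write true plus a single value; otherwise write false, then max,
// then min. Not Derby code — plain strings stand in for DataValueDescriptor.
public class MinMaxEncoding {
    static byte[] write(String min, String max) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            if (min.equals(max)) {
                out.writeBoolean(true);   // indicator: only one value follows
                out.writeUTF(min);
            } else {
                out.writeBoolean(false);
                out.writeUTF(max);
                out.writeUTF(min);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with byte arrays
        }
    }

    // Returns {min, max}, reconstructing the min == max case from the indicator.
    static String[] read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            if (in.readBoolean()) {
                String v = in.readUTF();
                return new String[] { v, v };
            }
            String max = in.readUTF();
            String min = in.readUTF();
            return new String[] { min, max };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = read(write("alpha", "zulu"));
        System.out.println(r[0] + ".." + r[1]);
    }
}
```

The key property, as in the comment above, is that the writer commits to exactly one of the two layouts before emitting any value, so the reader's branch on the indicator always consumes exactly what was written.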
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060091#comment-16060091 ] Harshvardhan Gupta commented on DERBY-6940: --- While debugging the failed test cases I seem to have reproduced a previously known issue. Please take a look at my comment on DERBY-3219. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. > We eventually want to utilize the new statistics to make better selectivity > estimates / cost estimates that will help find the best query plan. Currently > Derby keeps two type of stats - the total row count and the number of unique > values. > We are initially extending the stats to include null count, the minimum value > and maximum value associated with each of the columns of an index. This would > be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, > >= ] , all of which currently rely on hardwired selectivity estimates. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-3219) Group by query with many aggregate columns and case statements fails with: ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted to read more da
[ https://issues.apache.org/jira/browse/DERBY-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060086#comment-16060086 ] Harshvardhan Gupta commented on DERBY-3219: --- While examining the failed test cases as part of the patch for DERBY-6940, I get this error because of exactly the same root cause, i.e. the value I am trying to store is a SQLVarChar whose value is "" (that is, a string of length 0). To reproduce the error with the patch attached to DERBY-6940, run ant -Dderby.junit.testclass=org.apache.derbyTesting.functionTests.tests.lang.OrderByAndSortAvoidance junit-single Since I am trying to collect the min and max values from the database, a string of length 0 turns out to be the min value for the test cases in OrderByAndSortAvoidance. > Group by query with many aggregate columns and case statements fails with: > ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted > to read more data than was originally stored > - > > Key: DERBY-3219 > URL: https://issues.apache.org/jira/browse/DERBY-3219 > Project: Derby > Issue Type: Bug > Components: SQL >Affects Versions: 10.3.1.4 >Reporter: Stan Bradbury >Assignee: Bryan Pendleton > Fix For: 10.5.1.1 > > Attachments: maxminPatch.diff, patchWithTest.diff, pivotView.zip, > repro.java, sanityTest.diff, ten_four_patch.diff, testWithMemControls.diff > > > using the attached database (v10.3) - " select * from pivotview " fails with > the stack trace below. A view (pivotview_ok) created on a subset of the > columns in pivotview executes fine. Adding one column back into pivotview_ok > causes failures most of the time. See attached for view definitions. 
> 2007-11-21 00:58:49.421 GMT Thread[main,5,main] (XID = 2734422), (SESSIONID = > 0), (DATABASE = pivotview), (DRDAID = null), Failed Statement is: select * > from pivotview > ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted > to read more data than was originally stored > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.StreamFileContainer.fetchNext(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.StreamFileContainerHandle.fetchNext(Unknown > Source) > at org.apache.derby.impl.store.access.sort.MergeScan.mergeARow(Unknown > Source) > at org.apache.derby.impl.store.access.sort.MergeScan.init(Unknown > Source) > at > org.apache.derby.impl.store.access.sort.MergeSort.openSortScan(Unknown Source) > at > org.apache.derby.impl.store.access.RAMTransaction.openSortScan(Unknown Source) > at > org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.loadSorter(Unknown > Source) > at > org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.openCore(Unknown > Source) > at > org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.openCore(Unknown > Source) > at > org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(Unknown Source) > at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown > Source) > at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown > Source) > at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) > at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) > at org.apache.derby.impl.tools.ij.ij.executeImmediate(Unknown Source) > at org.apache.derby.impl.tools.ij.utilMain.doCatch(Unknown Source) > at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(Unknown Source) > at org.apache.derby.impl.tools.ij.utilMain.go(Unknown Source) > at org.apache.derby.impl.tools.ij.Main.go(Unknown Source) > at org.apache.derby.impl.tools.ij.Main.mainCore(Unknown Source) > at 
org.apache.derby.impl.tools.ij.Main14.main(Unknown Source) > at org.apache.derby.tools.ij.main(Unknown Source) > Caused by: java.io.EOFException > at java.io.DataInputStream.readBoolean(DataInputStream.java:248) > at > org.apache.derby.impl.sql.execute.MaxMinAggregator.readExternal(Unknown > Source) > at > org.apache.derby.iapi.services.io.FormatIdInputStream.readObject(Unknown > Source) > at org.apache.derby.iapi.types.UserType.readExternal(Unknown Source) > ... 22 more > = begin nested exception, level (1) === > java.io.EOFException > at java.io.DataInputStream.readBoolean(DataInputStream.java:248) > at > org.apache.derby.impl.sql.execute.MaxMinAggregator.readExternal(Unknown > Source) > at >
[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6940: -- Description: Derby should collect extra statistics during index build time, statistics refresh time which will help optimizer make more precise selectivity estimates and chose better execution paths. We eventually want to utilize the new statistics to make better selectivity estimates / cost estimates that will help find the best query plan. Currently Derby keeps two type of stats - the total row count and the number of unique values. We are initially extending the stats to include null count, the minimum value and maximum value associated with each of the columns of an index. This would be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, >= ] , all of which currently rely on hardwired selectivity estimates. was:Derby should collect extra statistics during index build time, statistics refresh time which will help optimizer make more precise selectivity estimates and chose better execution paths. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. > We eventually want to utilize the new statistics to make better selectivity > estimates / cost estimates that will help find the best query plan. Currently > Derby keeps two type of stats - the total row count and the number of unique > values. 
> We are initially extending the stats to include null count, the minimum value > and maximum value associated with each of the columns of an index. This would > be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, > >= ] , all of which currently rely on hardwired selectivity estimates. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058041#comment-16058041 ] Harshvardhan Gupta commented on DERBY-6940: --- Another related link I missed earlier - https://www.postgresql.org/docs/9.1/static/planner-stats-details.html > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6940: -- Attachment: DERBY-6940_3.diff Bryan, thanks for the help. I was able to solve my problem by following your direction; attached is the new patch. Regarding the Derby statistics, information about the Postgres statistics can be found at - https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT#Using_ANALYZE_to_optimize_PostgreSQL_queries https://www.postgresql.org/docs/9.0/static/catalog-pg-statistic.html Postgres uses null counts and distribution buckets; we are currently doing a primitive version of the distribution buckets by assuming a uniform distribution between our min and max values, but we can certainly incorporate that behaviour. Another interesting thing is the average size of the row written to disk. This stat could be helpful in cost estimation, though I don't see its relevance to cardinality estimation. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
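The uniform-distribution idea mentioned above can be sketched as follows (hypothetical names, not Derby's actual code): for a predicate like col <= v, linearly interpolate between the collected min and max to estimate the matching fraction.

```java
// Hypothetical sketch: selectivity of "col <= v" assuming column values are
// uniformly distributed between the collected minimum and maximum — the
// "primitive distribution bucket" described above.
public class RangeSelectivity {
    static double lessOrEqual(double min, double max, double v) {
        if (v >= max) return 1.0;            // everything qualifies
        if (v < min)  return 0.0;            // nothing qualifies
        if (max == min) return 1.0;          // degenerate single-value column
        return (v - min) / (max - min);      // linear interpolation
    }

    public static void main(String[] args) {
        // min=0, max=100, v=25 -> a quarter of the rows are expected to match.
        System.out.println(lessOrEqual(0, 100, 25)); // 0.25
    }
}
```

Real histogram buckets (as in Postgres's pg_statistic) refine this by interpolating within per-bucket boundaries instead of a single global [min, max] range, which handles skewed distributions much better.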
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056086#comment-16056086 ] Harshvardhan Gupta commented on DERBY-6940: --- I am looking at object serialization as a possible solution but would like to be sure whether it makes sense to store the entire object or just its string representation. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056079#comment-16056079 ] Harshvardhan Gupta commented on DERBY-6940: --- Another thing: I am able to write and read minVal and maxVal as strings successfully using their toString methods, but when trying to store the original DataValueDescriptor object, I get the following error while trying to read the statistics -

ERROR XSDA8: Exception during restore of a serializable or SQLData object of class
ERROR XJ001: Java exception: 'java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.derby.iapi.types.DataValueDescriptor: java.io.StreamCorruptedException'.
ERROR XJ001: Java exception: 'java.lang.String cannot be cast to org.apache.derby.iapi.types.DataValueDescriptor: java.lang.ClassCastException'.

Since storing minVal and maxVal as Strings is not preferred for the later comparisons, I am trying to store the original object. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6940: -- Attachment: DERBY-6940_2.diff I am attaching an updated diff which now collects extra statistics whenever an index is created / altered. There are 3 ways statistics get updated - 1) Creating a new index. 2) Altering an index. 3) Explicitly calling the SYSCS_UPDATE_STATISTICS system procedure. The current patch collects statistics in all of the above cases. I am currently working on the upgrade logic, after which we can look at ways to utilise the extra statistics in selectivity estimates. A soft upgrade can be performed by turning off the extra statistics collection and disabling reading and writing of the new statistics to the store. However, there are multiple options for the hard upgrade strategy: 1) We update all the existing indexes and add the extra statistics at upgrade time. 2) We wait for the user to explicitly call the UPDATE_STATISTICS / alter index system procedures before collecting the extra statistics and writing them to the store. I would like to know what the community thinks is the best way; also please share if you have any other strategy in mind. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: DERBY-6940_2.diff, derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
[ https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054502#comment-16054502 ] Harshvardhan Gupta commented on DERBY-6905: --- I can see that dblook_test and others have not been added to the tools package's _Suite.java triggered by junit-all. I was under the impression that all tests were added; for future reference, what is the alternative way to trigger all the available junit tests? > Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column > --- > > Key: DERBY-6905 > URL: https://issues.apache.org/jira/browse/DERBY-6905 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: dblook_tests.diff > > > DERBY-6542 changed IDENTITY columns to build upon the underlying > SEQUENCE objects. SEQUENCE objects can have a CYCLE option > set for them, and as of DERBY-6852, the CYCLE option controls the > behavior of the IDENTITY column that uses that SEQUENCE. > It would be nice if SYS.SYSCOLUMNS would report the CYCLE option > that is in effect for a IDENTITY column in that column's row in SYSCOLUMNS. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
[ https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054256#comment-16054256 ] Harshvardhan Gupta commented on DERBY-6905: --- Sorry I meant the junit-all target. > Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column > --- > > Key: DERBY-6905 > URL: https://issues.apache.org/jira/browse/DERBY-6905 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: dblook_tests.diff > > > DERBY-6542 changed IDENTITY columns to build upon the underlying > SEQUENCE objects. SEQUENCE objects can have a CYCLE option > set for them, and as of DERBY-6852, the CYCLE option controls the > behavior of the IDENTITY column that uses that SEQUENCE. > It would be nice if SYS.SYSCOLUMNS would report the CYCLE option > that is in effect for a IDENTITY column in that column's row in SYSCOLUMNS. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
[ https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054252#comment-16054252 ] Harshvardhan Gupta commented on DERBY-6905: --- Hi Bryan, The patch looks good to me but I am trying to understand why the ant all target passes without this patch. Vardhan > Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column > --- > > Key: DERBY-6905 > URL: https://issues.apache.org/jira/browse/DERBY-6905 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: dblook_tests.diff > > > DERBY-6542 changed IDENTITY columns to build upon the underlying > SEQUENCE objects. SEQUENCE objects can have a CYCLE option > set for them, and as of DERBY-6852, the CYCLE option controls the > behavior of the IDENTITY column that uses that SEQUENCE. > It would be nice if SYS.SYSCOLUMNS would report the CYCLE option > that is in effect for a IDENTITY column in that column's row in SYSCOLUMNS. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053198#comment-16053198 ] Harshvardhan Gupta edited comment on DERBY-6940 at 6/18/17 1:56 PM: Attaching an initial patch which adds the following statistics information - 1) The minimum value of each column appearing in an index. 2) The maximum value of each column appearing in an index. 3) The null counts for each column appearing in an index. I am currently storing the column-level values for minValue and maxValue as DataValueDescriptor objects. I am not entirely sure if this is the best way to store them for later use in comparisons when determining selectivity estimates. To invoke the extraStatistics procedure, call SYSCS_UTIL.SYSCS_UPDATE_STATISTICS(). Currently the extraStatistics procedure is not invoked by the CREATE INDEX statement; we will need to write a variant for "Create Index" statements to collect the extra statistics during index creation time as well. was (Author: harshvardhan145): Attaching an initial patch which add the following statistics information - 1) The minimum value of each column appearing in an index. 2) The maximum value of each column appearing in an index. 3) The null counts for each column appearing in an index. I am currently storing column level values as DataValueDescriptor object. I am not entirely sure if this is the best way to store it for later use in comparison for determining selectivity estimates. To invoke the procedure extraStatistics call SYSCS_UTIL.SYSCS_UPDATE_STATISTICS(). Currently the extraStatistics procedure is not invoked by the create Index statement, we will need to write a variant for "Create Index" statements to collect the extra statistics during index creation time as well. > Enhance derby statistics for more accurate selectivity estimates. 
> - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053201#comment-16053201 ] Harshvardhan Gupta edited comment on DERBY-6940 at 6/18/17 1:53 PM: Additionally, we need to work on upgrade procedures: when trying to access the statistics info of an index created before this patch, Derby will throw an NPE due to the absence of the new statistics in those indexes. was (Author: harshvardhan145): Additionally we need to work on upgrade procedures, while trying to access statistics info of an index created before this patch, Derby will throw a NPE due to absence of the new statistics. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053201#comment-16053201 ] Harshvardhan Gupta commented on DERBY-6940: --- Additionally, we need to work on upgrade procedures: when trying to access the statistics info of an index created before this patch, Derby will throw an NPE due to the absence of the new statistics. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6940: -- Attachment: derby-6940.diff Attaching an initial patch which adds the following statistics information - 1) The minimum value of each column appearing in an index. 2) The maximum value of each column appearing in an index. 3) The null counts for each column appearing in an index. I am currently storing the column-level values as DataValueDescriptor objects. I am not entirely sure if this is the best way to store them for later use in comparisons when determining selectivity estimates. To invoke the extraStatistics procedure, call SYSCS_UTIL.SYSCS_UPDATE_STATISTICS(). Currently the extraStatistics procedure is not invoked by the CREATE INDEX statement; we will need to write a variant for "Create Index" statements to collect the extra statistics during index creation time as well. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: derby-6940.diff > > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
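The three per-column statistics in the patch description above can be pictured with a small, self-contained sketch. This is purely illustrative: the class below is hypothetical and does not use Derby's actual DataValueDescriptor machinery, but it shows the single-pass min/max/null-count bookkeeping performed per index column.

```java
import java.util.Arrays;

// Hypothetical accumulator for one index column's statistics (not Derby's
// actual implementation, which stores values as DataValueDescriptor objects).
class ColumnStats<T extends Comparable<T>> {
    T min;           // smallest non-null value seen so far
    T max;           // largest non-null value seen so far
    long nullCount;  // how many null values were seen

    // Fold one column value from an index row into the statistics.
    void update(T value) {
        if (value == null) { nullCount++; return; }
        if (min == null || value.compareTo(min) < 0) min = value;
        if (max == null || value.compareTo(max) > 0) max = value;
    }

    public static void main(String[] args) {
        ColumnStats<Integer> stats = new ColumnStats<>();
        for (Integer v : Arrays.asList(42, null, 20, 100, null, 57)) {
            stats.update(v);
        }
        // min=20, max=100, nullCount=2
        System.out.println(stats.min + " " + stats.max + " " + stats.nullCount);
    }
}
```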
[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6940: -- Description: Derby should collect extra statistics during index build time, statistics refresh time which will help optimizer make more precise selectivity estimates and chose better execution paths. > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > > Derby should collect extra statistics during index build time, statistics > refresh time which will help optimizer make more precise selectivity > estimates and chose better execution paths. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053197#comment-16053197 ] Harshvardhan Gupta commented on DERBY-6938: --- Tracking work on statistics enhancement at https://issues.apache.org/jira/browse/DERBY-6940 > Obtain cardinality estimates and true estimates for base tables as well as > for intermediate results for queries involving multiple joins. > --- > > Key: DERBY-6938 > URL: https://issues.apache.org/jira/browse/DERBY-6938 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > Attachments: explain.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
Harshvardhan Gupta created DERBY-6940: - Summary: Enhance derby statistics for more accurate selectivity estimates. Key: DERBY-6940 URL: https://issues.apache.org/jira/browse/DERBY-6940 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6940: - Assignee: Harshvardhan Gupta > Enhance derby statistics for more accurate selectivity estimates. > - > > Key: DERBY-6940 > URL: https://issues.apache.org/jira/browse/DERBY-6940 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048798#comment-16048798 ] Harshvardhan Gupta commented on DERBY-6938: --- The specific approach I am considering is to keep the minimum and maximum value of each column and the number of NULL values in the statistics; these could be utilised for operators such as <, >, <=, >=, IS NULL, IS NOT NULL, etc. For example, let's say we have an int column whose minimum and maximum values are 20 and 100 respectively. Then a query predicate on that column with the condition >= 80 should ideally return 25% of all rows. This approach obviously assumes a uniform distribution but should be good to get started with. We should be able to make it more accurate by taking the actual distribution into account later on. > Obtain cardinality estimates and true estimates for base tables as well as > for intermediate results for queries involving multiple joins. > --- > > Key: DERBY-6938 > URL: https://issues.apache.org/jira/browse/DERBY-6938 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > Attachments: explain.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
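The 25% figure in the example above follows from linear interpolation between the column's minimum and maximum. Here is a worked sketch of that uniform-distribution estimate; the method name and clamping behaviour are illustrative assumptions, not Derby's actual estimator.

```java
// Illustrative uniform-distribution selectivity estimate for a
// "column >= bound" predicate, given the column's observed min and max.
class RangeSelectivity {
    static double geSelectivity(double min, double max, double bound) {
        if (bound <= min) return 1.0; // every row qualifies
        if (bound > max) return 0.0;  // no row qualifies
        return (max - bound) / (max - min);
    }

    public static void main(String[] args) {
        // min = 20, max = 100, predicate col >= 80:
        // (100 - 80) / (100 - 20) = 0.25, i.e. 25% of the rows
        System.out.println(geSelectivity(20, 100, 80)); // prints 0.25
    }
}
```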
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048793#comment-16048793 ] Harshvardhan Gupta commented on DERBY-6938: --- Bryan, Regarding my doubt earlier, one thing that was particularly useful for diving deep into the optimizer was enabling optimizer tracing. https://wiki.apache.org/db-derby/OptimizerTracing The trace output is quite verbose and helps in understanding the various choices the optimizer is making. A few observations and potential improvements that I would like to point out - 1) Derby falls back to nested loops more often than we would like, particularly in the case of large tables. Currently the hash table resides entirely in memory, and Derby rules out the HASHJOIN approach if it suspects that the table is going to be too large (the default limit is 1048576). Nested loops do not seem to be a good option, especially when joining relatively large tables (similar to the IMDB dataset we are using) across more than 4 joins. It is also documented in the optimizer paper that creating hash tables that spill to disk is a potential improvement, and my experiments confirm that. 2) Another potential improvement concerns cardinality estimates. Derby currently uses hard-wired selectivity numbers for every operator other than the equality operator. https://db.apache.org/derby/docs/10.0/manuals/tuning/perf56.html In the case of an equality operator with a known value at compile time, it utilises statistics and makes selectivity assumptions using the number of unique values. I think we can enhance the statistics to be able to make better cardinality estimates. > Obtain cardinality estimates and true estimates for base tables as well as > for intermediate results for queries involving multiple joins. 
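The asymmetry described in point 1) can be seen from the shape of the two cost formulas. The numbers and constants below are illustrative assumptions, not Derby's actual cost model; they only show why a per-outer-row probe cost makes nested loops expensive for large outer tables, while a spill-capable hash join would pay the inner-side build cost once.

```java
// Back-of-the-envelope join cost shapes (illustrative only, not Derby's model).
class JoinCostSketch {
    // Nested loop: every outer row pays the cost of probing the inner side.
    static double nestedLoopCost(double outerRows, double innerProbeCost) {
        return outerRows * innerProbeCost;
    }

    // Hash join: one build pass over the inner side, then a cheap
    // hash-table probe per outer row.
    static double hashJoinCost(double outerRows, double innerRows,
                               double buildCostPerRow, double probeCost) {
        return innerRows * buildCostPerRow + outerRows * probeCost;
    }

    public static void main(String[] args) {
        // 1M outer rows, index probe costing 3 "row units" each vs a
        // 500K-row build side with cheap hash probes:
        double nl = nestedLoopCost(1_000_000, 3.0);             // 3,000,000 units
        double hj = hashJoinCost(1_000_000, 500_000, 1.0, 0.1); // ~600,000 units
        System.out.println(nl + " vs " + hj);
    }
}
```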
> --- > > Key: DERBY-6938 > URL: https://issues.apache.org/jira/browse/DERBY-6938 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > Attachments: explain.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048780#comment-16048780 ] Harshvardhan Gupta commented on DERBY-6938: --- To view and compare the estimated row counts and the true row counts for base tables and for intermediate results, the following queries can be used on the XPLAIN tables; the value of OP_IDENTIFIER can be changed to get data for nodes of a particular operation such as HASHJOIN, NLJOIN, etc.

select SEEN_ROWS, SEEN_ROWS_RIGHT, RETURNED_ROWS, EST_ROW_COUNT
from SYSXPLAIN_RESULTSETS, SYSXPLAIN_STATEMENTS
where OP_IDENTIFIER = 'HASHJOIN'
and SYSXPLAIN_STATEMENTS.STMT_ID = SYSXPLAIN_RESULTSETS.STMT_ID;

To couple in the scan information for nodes involved in scans -

select SEEN_ROWS, RETURNED_ROWS, EST_ROW_COUNT, OP_IDENTIFIER
from SYSXPLAIN_STATEMENTS, SYSXPLAIN_RESULTSETS, SYSXPLAIN_SCAN_PROPS
where SYSXPLAIN_STATEMENTS.STMT_ID = SYSXPLAIN_RESULTSETS.STMT_ID
and SYSXPLAIN_RESULTSETS.SCAN_RS_ID = SYSXPLAIN_SCAN_PROPS.SCAN_RS_ID
and OP_IDENTIFIER like '%SCAN';

> Obtain cardinality estimates and true estimates for base tables as well as > for intermediate results for queries involving multiple joins. > --- > > Key: DERBY-6938 > URL: https://issues.apache.org/jira/browse/DERBY-6938 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > Attachments: explain.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DERBY-6936) Documentation for changes made as part of DERBY-6904
[ https://issues.apache.org/jira/browse/DERBY-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046060#comment-16046060 ] Harshvardhan Gupta commented on DERBY-6936: --- The documentation changes seem to cover all the issues we worked on. Looks great to me! > Documentation for changes made as part of DERBY-6904 > > > Key: DERBY-6936 > URL: https://issues.apache.org/jira/browse/DERBY-6936 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Bryan Pendleton >Priority: Minor > Attachments: docs.diff, rrefsistabs22441.html, rrefsqlj37836.html, > rrefsqlj81859.html > > > Here are the changes I believe we need for this task: > 1) For DERBY-6904, we need to update > https://db.apache.org/derby/docs/10.13/ref/rrefsqlj81859.html to document the > syntax and behavior of using ALTER TABLE to change the CYCLE option for a > generated column. > 2) For DERBY-6905, we need to update > https://db.apache.org/derby/docs/10.13/ref/rrefsistabs22441.html to document > the new AUTOINCREMENTCYCLE column that appears in SYSCOLUMNS > 3) For DERBY-6906, we need to update > https://db.apache.org/derby/docs/10.13/ref/rrefsqlj37836.html to document > that a generated column spec can now explicitly state NO CYCLE > I believe these are all the documentation changes that are necessary. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta resolved DERBY-6935. --- Resolution: Fixed > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: moreCycleTests.diff, simpleTests_2.diff, simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046056#comment-16046056 ] Harshvardhan Gupta commented on DERBY-6935: --- Yes, the latest commit looks good. I am resolving the issue. > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: moreCycleTests.diff, simpleTests_2.diff, simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046055#comment-16046055 ] Harshvardhan Gupta commented on DERBY-6938: --- Hi Bryan, I am trying to understand the cardinality estimates the Derby optimizer makes in the presence of joins. I am currently doing more experiments, starting from joins of 2 tables and then moving to more levels of joins, similar to those present in the Join Order Benchmark query dataset. Along with this I am examining the code and will share the information that I obtain here. Regards, Vardhan > Obtain cardinality estimates and true estimates for base tables as well as > for intermediate results for queries involving multiple joins. > --- > > Key: DERBY-6938 > URL: https://issues.apache.org/jira/browse/DERBY-6938 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > Attachments: explain.txt > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045702#comment-16045702 ] Harshvardhan Gupta commented on DERBY-6935: --- Bryan, Yes the 'moreCycleTests.diff' contains the missing tests, I had not synced from remote before adding more tests to IdentitySequenceTests.java, maybe that could have caused problems with the usual patch command. > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: moreCycleTests.diff, simpleTests_2.diff, simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
[ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6938: -- Attachment: explain.txt Hi Bryan, I am attaching the explain plan of one of the queries from the dataset. I have doubts with regard to the scans performed for INFO_TYPE and TITLE tables and optimizer's estimated row count for those scans. How in general is the estimated row count for scans as part of joins determined? I went through the link below but they do not cover the case when a join is specified http://db.apache.org/derby/docs/10.13/tuning/ctunstats849203.html > Obtain cardinality estimates and true estimates for base tables as well as > for intermediate results for queries involving multiple joins. > --- > > Key: DERBY-6938 > URL: https://issues.apache.org/jira/browse/DERBY-6938 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta > Attachments: explain.txt > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045556#comment-16045556 ] Harshvardhan Gupta commented on DERBY-6935: --- No, the recent changes to IdentitySequenceTests.java do not seem to be included in 1797283. > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: simpleTests_2.diff, simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6936) Documentation for changes made as part of DERBY-6904
[ https://issues.apache.org/jira/browse/DERBY-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045554#comment-16045554 ] Harshvardhan Gupta commented on DERBY-6936: --- Sure Bryan, please go ahead with the changes. > Documentation for changes made as part of DERBY-6904 > > > Key: DERBY-6936 > URL: https://issues.apache.org/jira/browse/DERBY-6936 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > > Here are the changes I believe we need for this task: > 1) For DERBY-6904, we need to update > https://db.apache.org/derby/docs/10.13/ref/rrefsqlj81859.html to document the > syntax and behavior of using ALTER TABLE to change the CYCLE option for a > generated column. > 2) For DERBY-6905, we need to update > https://db.apache.org/derby/docs/10.13/ref/rrefsistabs22441.html to document > the new AUTOINCREMENTCYCLE column that appears in SYSCOLUMNS > 3) For DERBY-6906, we need to update > https://db.apache.org/derby/docs/10.13/ref/rrefsqlj37836.html to document > that a generated column spec can now explicitly state NO CYCLE > I believe these are all the documentation changes that are necessary. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045551#comment-16045551 ] Harshvardhan Gupta commented on DERBY-6935: --- Hi Bryan, I added some catalog tests too in IdentitySequenceTests.java which are not included in this commit but included in the patch I attached earlier. Thanks > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Fix For: 10.14.0.0 > > Attachments: simpleTests_2.diff, simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044865#comment-16044865 ] Harshvardhan Gupta commented on DERBY-6935: --- Bryan, I do think that we should go ahead and commit them. > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: simpleTests_2.diff, simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6935: -- Attachment: simpleTests_2.diff Please find attached patch containing updated catalog tests and tests related to DERBY-6903. > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: simpleTests_2.diff, simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
[ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6937: -- Attachment: derby_script.sql imdb.diff Please find the attached files. 'derby_script.sql' contains the exact script I used to set up the tables. The other file contains the changes in ImportReadData.java. There are 2 major changes in ImportReadData - 1) Handling NULL values, as discussed earlier. 2) Handling escape characters. Errors I saw related to 2) are discussed here as well - http://apache-database.10148.n7.nabble.com/Data-found-after-the-stop-delimiter-td100312.html. I handled the escape characters in Derby only; other solutions exist, such as pre-processing the data externally, as in the case of handling NULL values. 'schema_derby.sql' and the 'schematext.sql' that came with the dataset are mostly the same, other than the fields whose data type is just 'character varying' without a specified max length. For those columns, I have used the 'CLOB' data type in Derby as semantically equivalent to 'character varying' of unspecified length in Postgres. You should be able to apply the imdb.diff patch and import the data without any problems now. Thanks, Vardhan > Load the IMDB data set in Derby, obtain and adapt Join order Benchmark > queries for use in derby > > > Key: DERBY-6937 > URL: https://issues.apache.org/jira/browse/DERBY-6937 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: derby_script.sql, imdb.diff, schema_derby.sql > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
[ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036043#comment-16036043 ] Harshvardhan Gupta commented on DERBY-6937: --- Hi Bryan, I modified the Derby code to check for the "NULL" string in the file-reading logic and ended up creating a variant of SYSCS_IMPORT_TABLE with minimal changes. Yes, there are a lot of ways to get around this problem; I found this approach quicker and cleaner in my current environment. Specifically, in ImportReadData.java, I checked the parsed column value in the readNextDelimitedRow procedure and set it to null if it matched the string "NULL". I agree that there are known workarounds for this problem and that it is not the focus of our project; hopefully our documentation will help someone in the future trying to set up the environment for analyzing Derby's optimizer. > Load the IMDB data set in Derby, obtain and adapt Join order Benchmark > queries for use in derby > > > Key: DERBY-6937 > URL: https://issues.apache.org/jira/browse/DERBY-6937 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: schema_derby.sql > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
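As a rough illustration of the check described above, the snippet below maps the literal token "NULL" in a parsed column value to a SQL null. The class and method names are hypothetical; they are not the actual signatures inside Derby's ImportReadData.java, where the check lives in the readNextDelimitedRow logic.

```java
// Hypothetical stand-in for the NULL-token check added to the import path
// (illustrative names; not Derby's real ImportReadData API).
class NullTokenFilter {
    static final String NULL_TOKEN = "NULL";

    // Return null (SQL NULL) when the parsed field is the literal token
    // "NULL"; otherwise pass the parsed value through unchanged.
    static String filterColumn(String parsed) {
        return NULL_TOKEN.equals(parsed) ? null : parsed;
    }

    public static void main(String[] args) {
        System.out.println(filterColumn("NULL"));        // null
        System.out.println(filterColumn("Casablanca"));  // Casablanca
    }
}
```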
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036035#comment-16036035 ] Harshvardhan Gupta commented on DERBY-6935: --- Hey Bryan, Yes the tests look good to me and capture the expected behaviour, I'll take your patch and add some more, particularly tests related to DERBY-6903 and DERBY-6905. thanks, vardhan > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: simpleTests.diff > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
[ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6937: -- Attachment: schema_derby.sql I have completed work on this by ingesting the same data used in the original paper into my local Derby environment. I am also attaching the schema (adapted for Derby) that is going to be used to run the queries. IMDB data snapshot used in the original VLDB paper - http://homepages.cwi.nl/~boncz/job/imdb.tgz The license and links to the current version of the IMDB data set can be found at http://www.imdb.com/interfaces While using SYSCS_IMPORT_TABLE to ingest the data, I ran into a problem with handling NULL values: Derby expects null values to be blank in the CSV, but when the keyword "NULL" is present, it reads the character sequence in as-is for character data types and throws an error for non-character data types such as INTEGER, NUMERIC, etc. I got around the problem using a quick hack. Should we create a new variant of SYSCS_IMPORT_TABLE that takes an additional argument specifying the string used to represent NULL in the CSV dump? Or should we instead work on making the existing procedures var-args rather than adding a new procedure? I tried making the existing procedure var-args but ran into the same problem as discussed in this JIRA - https://issues.apache.org/jira/browse/DERBY-4555 > Load the IMDB data set in Derby, obtain and adapt Join order Benchmark > queries for use in derby > > > Key: DERBY-6937 > URL: https://issues.apache.org/jira/browse/DERBY-6937 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: schema_derby.sql > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
Harshvardhan Gupta created DERBY-6938: - Summary: Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins. Key: DERBY-6938 URL: https://issues.apache.org/jira/browse/DERBY-6938 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta Assignee: Harshvardhan Gupta -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
Harshvardhan Gupta created DERBY-6937: - Summary: Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby Key: DERBY-6937 URL: https://issues.apache.org/jira/browse/DERBY-6937 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta Assignee: Harshvardhan Gupta Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031210#comment-16031210 ] Harshvardhan Gupta commented on DERBY-6904: --- Sure, I think it is good to be committed. I will be adding the test cases this week. > Enhance ALTER TABLE to allow CYCLE option to be changed. > > > Key: DERBY-6904 > URL: https://issues.apache.org/jira/browse/DERBY-6904 > Project: Derby > Issue Type: Improvement > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, > columnMismatch, compareDbName > > > It would be nice to have a variant of the ALTER TABLE statement, > similar to the SET INCREMENT BY or SET RESTART WITH variants, > that allowed a user to change the CYCLE option on an IDENTITY column. > I think that the code flow for this could be closely modelled on the > code flow for the SET INCREMENT BY variant: after the statement > is parsed, we simply need to drop and recreate the underlying SEQUENCE > object with the desired CYCLE option (and preserving all the other > aspects of the underlying SEQUENCE object. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028473#comment-16028473 ] Harshvardhan Gupta edited comment on DERBY-6904 at 5/29/17 3:55 PM: Thanks for reviewing the patch; your comments are much appreciated. As you rightly pointed out, the cycling feature requires only a boolean for storage and processing. The new changes are attached in 6904_4.diff, which uses the boolean data type. I have also updated IdentitySequenceTest to reflect the new expected behaviour. Regarding SYSSEQUENCES, yes, it is mostly untouched; a new sequence is created when we change the cycling behaviour (it also copies other autoincrement info, such as the increment value and current value, from the existing sequence). Beyond the two test cases above, we should add new test cases to completely cover all the functionality introduced with this patch. I am creating new subtasks for tracking test coverage and documentation. For test coverage, let us discuss on the following subtask - https://issues.apache.org/jira/browse/DERBY-6935 was (Author: harshvardhan145): Thanks for reviewing the patch. Your comments are much appreciated. As you rightly pointed out, cycling feature requires only boolean for storage and processing. The new changes are attached in 6904_4.diff which uses boolean data type. Also I have made changes to IdentitySequenceTest by changing them to reflect the new expected behaviour. Other than two of the above test cases, we should add new test cases to completely cover all the functionality introduced with this patch. I am creating new subtasks for tracking test coverage and documentation. For test coverage, let us discuss on the following subtask - https://issues.apache.org/jira/browse/DERBY-6935 > Enhance ALTER TABLE to allow CYCLE option to be changed. 
> > > Key: DERBY-6904 > URL: https://issues.apache.org/jira/browse/DERBY-6904 > Project: Derby > Issue Type: Improvement > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, > columnMismatch, compareDbName > > > It would be nice to have a variant of the ALTER TABLE statement, > similar to the SET INCREMENT BY or SET RESTART WITH variants, > that allowed a user to change the CYCLE option on an IDENTITY column. > I think that the code flow for this could be closely modelled on the > code flow for the SET INCREMENT BY variant: after the statement > is parsed, we simply need to drop and recreate the underlying SEQUENCE > object with the desired CYCLE option (and preserving all the other > aspects of the underlying SEQUENCE object. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028482#comment-16028482 ] Harshvardhan Gupta commented on DERBY-6935: --- Hey Bryan, Since most of the functionality we are testing is closely knit together, I think it is better if we write tests independently rather than trying to divide the work and later merge it. How about I do the first iteration of all the tests, and then you can add some trickier tests or refine them in a second iteration if required? Or vice versa, if that suits. Thanks > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6935) Test Coverage for added features.
[ https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028480#comment-16028480 ] Harshvardhan Gupta commented on DERBY-6935: --- The following functionality should be covered when writing tests for the patch: 1) SYS.SYSCOLUMNS now reports the CYCLE value for an IDENTITY column. 2) The ALTER TABLE ALTER COLUMN command does not affect the cycle column. (Two of the tests fixed as part of the patch already cover this; we can add some trickier test cases.) 3) Cycling can be explicitly turned off with NO CYCLE in CREATE TABLE or ALTER TABLE statements. > Test Coverage for added features. > - > > Key: DERBY-6935 > URL: https://issues.apache.org/jira/browse/DERBY-6935 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028473#comment-16028473 ] Harshvardhan Gupta edited comment on DERBY-6904 at 5/29/17 3:40 PM: Thanks for reviewing the patch. Your comments are much appreciated. As you rightly pointed out, cycling feature requires only boolean for storage and processing. The new changes are attached in 6904_4.diff which uses boolean data type. Also I have made changes to IdentitySequenceTest by changing them to reflect the new expected behaviour. Other than two of the above test cases, we should add new test cases to completely cover all the functionality introduced with this patch. I am creating new subtasks for tracking test coverage and documentation. For test coverage, let us discuss on the following subtask - https://issues.apache.org/jira/browse/DERBY-6935 was (Author: harshvardhan145): Thanks for reviewing the patch. Your comments are much appreciated. As you rightly pointed out, cycling feature requires only boolean for storage and processing. The new changes are attached in 6904_4.diff which uses boolean data type. Also I have made changes to IdentitySequenceTest by changing them to reflect the new expected behaviour. Other than two of the above test cases, we should add new test cases to completely cover all the functionality introduced with this patch. I am creating new subtasks for tracking test coverage and documentation. > Enhance ALTER TABLE to allow CYCLE option to be changed. 
> > > Key: DERBY-6904 > URL: https://issues.apache.org/jira/browse/DERBY-6904 > Project: Derby > Issue Type: Improvement > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, > columnMismatch, compareDbName > > > It would be nice to have a variant of the ALTER TABLE statement, > similar to the SET INCREMENT BY or SET RESTART WITH variants, > that allowed a user to change the CYCLE option on an IDENTITY column. > I think that the code flow for this could be closely modelled on the > code flow for the SET INCREMENT BY variant: after the statement > is parsed, we simply need to drop and recreate the underlying SEQUENCE > object with the desired CYCLE option (and preserving all the other > aspects of the underlying SEQUENCE object. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DERBY-6936) Documentation for changes made as part of DERBY-6904
[ https://issues.apache.org/jira/browse/DERBY-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6936: - Assignee: Harshvardhan Gupta > Documentation for changes made as part of DERBY-6904 > > > Key: DERBY-6936 > URL: https://issues.apache.org/jira/browse/DERBY-6936 > Project: Derby > Issue Type: Sub-task > Components: SQL >Reporter: Harshvardhan Gupta >Assignee: Harshvardhan Gupta >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DERBY-6936) Documentation for changes made as part of DERBY-6904
Harshvardhan Gupta created DERBY-6936: - Summary: Documentation for changes made as part of DERBY-6904 Key: DERBY-6936 URL: https://issues.apache.org/jira/browse/DERBY-6936 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DERBY-6935) Test Coverage for added features.
Harshvardhan Gupta created DERBY-6935: - Summary: Test Coverage for added features. Key: DERBY-6935 URL: https://issues.apache.org/jira/browse/DERBY-6935 Project: Derby Issue Type: Sub-task Reporter: Harshvardhan Gupta Assignee: Harshvardhan Gupta Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6904: -- Attachment: 6904_4.diff Thanks for reviewing the patch. Your comments are much appreciated. As you rightly pointed out, cycling feature requires only boolean for storage and processing. The new changes are attached in 6904_4.diff which uses boolean data type. Also I have made changes to IdentitySequenceTest by changing them to reflect the new expected behaviour. Other than two of the above test cases, we should add new test cases to completely cover all the functionality introduced with this patch. I am creating new subtasks for tracking test coverage and documentation. > Enhance ALTER TABLE to allow CYCLE option to be changed. > > > Key: DERBY-6904 > URL: https://issues.apache.org/jira/browse/DERBY-6904 > Project: Derby > Issue Type: Improvement > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, > columnMismatch, compareDbName > > > It would be nice to have a variant of the ALTER TABLE statement, > similar to the SET INCREMENT BY or SET RESTART WITH variants, > that allowed a user to change the CYCLE option on an IDENTITY column. > I think that the code flow for this could be closely modelled on the > code flow for the SET INCREMENT BY variant: after the statement > is parsed, we simply need to drop and recreate the underlying SEQUENCE > object with the desired CYCLE option (and preserving all the other > aspects of the underlying SEQUENCE object. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DERBY-6903) ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column
[ https://issues.apache.org/jira/browse/DERBY-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6903: - Assignee: Harshvardhan Gupta > ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column > --- > > Key: DERBY-6903 > URL: https://issues.apache.org/jira/browse/DERBY-6903 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > > Using ALTER TABLE with the SET INCREMENT BY or SET RESTART WITH > options to make a change to a GENERATED ... BY IDENTITY column resets the > CYCLE option of that column to "off". > Some test cases demonstrating this behavior were added as part of > the DERBY-6852 change revision 1756287 > The problem occurs when the ALTER TABLE code drops and recreates > the underlying SEQUENCE object, and does not preserve the CYCLE > option of the underlying SEQUENCE object. > I believe the correct behavior would be for the ALTER TABLE code > to read the current CYCLE option of the underlying SEQUENCE object > first, and then create the new SEQUENCE object with the same value > for the CYCLE option. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DERBY-6906) Allow a GENERATED column to explicitly specify NO CYCLE
[ https://issues.apache.org/jira/browse/DERBY-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6906: - Assignee: Harshvardhan Gupta > Allow a GENERATED column to explicitly specify NO CYCLE > --- > > Key: DERBY-6906 > URL: https://issues.apache.org/jira/browse/DERBY-6906 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > > DERBY-6852 added the ability for a GENERATED IDENTITY column > to specify that it should have CYCLE behavior, utilizing the underlying > CYCLE behavior of the SEQUENCE object. > A column which does not wish to have CYCLE behavior is specified > by omitting the CYCLE keyword. > It would be nice to allow a GENERATED IDENTITY column to > explicitly specify that it should have NO CYCLE behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
[ https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6905: - Assignee: Harshvardhan Gupta > Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column > --- > > Key: DERBY-6905 > URL: https://issues.apache.org/jira/browse/DERBY-6905 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > > DERBY-6542 changed IDENTITY columns to build upon the underlying > SEQUENCE objects. SEQUENCE objects can have a CYCLE option > set for them, and as of DERBY-6852, the CYCLE option controls the > behavior of the IDENTITY column that uses that SEQUENCE. > It would be nice if SYS.SYSCOLUMNS would report the CYCLE option > that is in effect for an IDENTITY column in that column's row in SYSCOLUMNS. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6906) Allow a GENERATED column to explicitly specify NO CYCLE
[ https://issues.apache.org/jira/browse/DERBY-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6906: -- Issue Type: Sub-task (was: Improvement) Parent: DERBY-6904 > Allow a GENERATED column to explicitly specify NO CYCLE > --- > > Key: DERBY-6906 > URL: https://issues.apache.org/jira/browse/DERBY-6906 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Priority: Minor > > DERBY-6852 added the ability for a GENERATED IDENTITY column > to specify that it should have CYCLE behavior, utilizing the underlying > CYCLE behavior of the SEQUENCE object. > A column which does not wish to have CYCLE behavior is specified > by omitting the CYCLE keyword. > It would be nice to allow a GENERATED IDENTITY column to > explicitly specify that it should have NO CYCLE behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
[ https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6905: -- Issue Type: Sub-task (was: Improvement) Parent: DERBY-6904 > Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column > --- > > Key: DERBY-6905 > URL: https://issues.apache.org/jira/browse/DERBY-6905 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Priority: Minor > > DERBY-6542 changed IDENTITY columns to build upon the underlying > SEQUENCE objects. SEQUENCE objects can have a CYCLE option > set for them, and as of DERBY-6852, the CYCLE option controls the > behavior of the IDENTITY column that uses that SEQUENCE. > It would be nice if SYS.SYSCOLUMNS would report the CYCLE option > that is in effect for an IDENTITY column in that column's row in SYSCOLUMNS. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6903) ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column
[ https://issues.apache.org/jira/browse/DERBY-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6903: -- Issue Type: Sub-task (was: Bug) Parent: DERBY-6904 > ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column > --- > > Key: DERBY-6903 > URL: https://issues.apache.org/jira/browse/DERBY-6903 > Project: Derby > Issue Type: Sub-task > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Priority: Minor > > Using ALTER TABLE with the SET INCREMENT BY or SET RESTART WITH > options to make a change to a GENERATED ... BY IDENTITY column resets the > CYCLE option of that column to "off". > Some test cases demonstrating this behavior were added as part of > the DERBY-6852 change revision 1756287 > The problem occurs when the ALTER TABLE code drops and recreates > the underlying SEQUENCE object, and does not preserve the CYCLE > option of the underlying SEQUENCE object. > I believe the correct behavior would be for the ALTER TABLE code > to read the current CYCLE option of the underlying SEQUENCE object > first, and then create the new SEQUENCE object with the same value > for the CYCLE option. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6904: -- Attachment: 6904_3.diff I have completed work on the hard upgrade strategy with the attached patch 6904_3.diff. All the test suites now pass with soft upgrades and hard upgrades. I also manually tested the expected behaviour using ij. I now have to add test cases for the newly added functionality. The patch resolves the following related issues : DERBY-6903, DERBY-6904, DERBY-6905 and DERBY-6906 > Enhance ALTER TABLE to allow CYCLE option to be changed. > > > Key: DERBY-6904 > URL: https://issues.apache.org/jira/browse/DERBY-6904 > Project: Derby > Issue Type: Improvement > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Attachments: 6904_2.diff, 6904_3.diff, 6904.diff, columnMismatch, > compareDbName > > > It would be nice to have a variant of the ALTER TABLE statement, > similar to the SET INCREMENT BY or SET RESTART WITH variants, > that allowed a user to change the CYCLE option on an IDENTITY column. > I think that the code flow for this could be closely modelled on the > code flow for the SET INCREMENT BY variant: after the statement > is parsed, we simply need to drop and recreate the underlying SEQUENCE > object with the desired CYCLE option (and preserving all the other > aspects of the underlying SEQUENCE object. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DERBY-6921) How good is the Derby Query Optimizer, really
[ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta reassigned DERBY-6921: - Assignee: Harshvardhan Gupta > How good is the Derby Query Optimizer, really > - > > Key: DERBY-6921 > URL: https://issues.apache.org/jira/browse/DERBY-6921 > Project: Derby > Issue Type: Improvement > Components: SQL >Reporter: Bryan Pendleton >Assignee: Harshvardhan Gupta >Priority: Minor > Labels: database, gsoc2017, java, optimizer > Original Estimate: 2,016h > Remaining Estimate: 2,016h > > At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich > Technical University introduced a new benchmark suite for evaluating > database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf > The benchmark test suite is publically available: > http://db.in.tum.de/people/sites/leis/qo/job.tgz > The data set for running the benchmark is publically available: > ftp://ftp.fu-berlin.de/pub/misc/movies/database/ > As part of Google Summer of Code 2017, I am volunteering to mentor > a Summer of Code intern who is interested in using these tools to > improve the Derby query optimizer. > My suggestion for the overall process is this: > 1) Acquire the benchmark tools, and the data set > 2) Run the benchmark. > 2a) Some of the benchmark queries may reveal bugs in Derby. > For each such bug, we need to isolate the bug and fix it. > 3) Once we are able to run the entire benchmark, we need to >analyze the results. > 3a) Some of the benchmark queries may reveal opportunities >for Derby to improve the query plans that it chooses for >various classes of queries (this is explained in detail in the >VLDB paper and other information available at Dr. 
Leis's site) >For each such improvement, we need to isolate the issue, >report it as a separable improvement, and fix it (if we can) > While the benchmark is an interesting exercise in and of itself, > the overall goal of the project is to find-and-fix problems in the > Derby query optimizer, specifically in the 3 areas which are > the focus of the benchmark tool: > 1) How good is the Derby cardinality estimator and when does >it lead to slow queries? > 2) How good is the Derby cost model, and how well is it guiding >the overall query optimization process? > 3) How large is the Derby enumerated plan space, and is it >appropriately-sized? > While other Derby issues have been filed against these questions > in the past, the intent of this specific project is to use the concrete > tools provided by the VLDB paper to make this effort rigorous and > successful at making concrete improvements to the Derby query > optimizer. > If you are interested in pursuing this project, please keep these > considerations in mind: > 1) This is NOT an introductory project. You must be quite familiar >with DBMS systems, and with SQL, and in particular with >cost-based query optimization. If terms such as "cardinality >estimation", "correlated query predicates", or "bushy trees" >aren't comfortable terms for you, this probably isn't the >project you're interested in. > 2) If you are new to Derby, that is fine, but please take advantage >of the extensive body of introductory material on Derby to >become familiar with it: read the Derby Getting Started manual, >download the software and follow the tutorials, read the documentation, >download the source code and learn how to build and run the >test suites, etc. > 3) All I have presented here is an **outline** of the project. You will >need to read the paper(s), study the benchmark queries, and >propose a detailed plan for how to use this benchmark as a tool >for improving the Derby query optimizer. 
> If these sorts of tasks sound like exciting things to do, then please > let us know! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really
[ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943436#comment-15943436 ] Harshvardhan Gupta commented on DERBY-6921: --- Thanks for reviewing the proposal. I'll make sure to thoroughly go into the above mentioned resources and past GSoC proposals to further refine my proposal. > How good is the Derby Query Optimizer, really > - -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really
[ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942086#comment-15942086 ] Harshvardhan Gupta commented on DERBY-6921: --- Hi Kathey, please find the draft proposal at https://docs.google.com/document/d/18UzZt7GGuYX_ww_o9kPwdZH0eftYIMRxfJQ3fcKe0ek/edit?usp=sharing I'll make sure to update the Derby Wiki once I submit the final proposal. > How good is the Derby Query Optimizer, really > - > > Key: DERBY-6921 > URL: https://issues.apache.org/jira/browse/DERBY-6921 > Project: Derby > Issue Type: Improvement > Components: SQL >Reporter: Bryan Pendleton >Priority: Minor > Labels: database, gsoc2017, java, optimizer > Original Estimate: 2,016h > Remaining Estimate: 2,016h > > At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich > Technical University introduced a new benchmark suite for evaluating > database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf > The benchmark test suite is publically available: > http://db.in.tum.de/people/sites/leis/qo/job.tgz > The data set for running the benchmark is publically available: > ftp://ftp.fu-berlin.de/pub/misc/movies/database/ > As part of Google Summer of Code 2017, I am volunteering to mentor > a Summer of Code intern who is interested in using these tools to > improve the Derby query optimizer. > My suggestion for the overall process is this: > 1) Acquire the benchmark tools, and the data set > 2) Run the benchmark. > 2a) Some of the benchmark queries may reveal bugs in Derby. > For each such bug, we need to isolate the bug and fix it. > 3) Once we are able to run the entire benchmark, we need to >analyze the results. > 3a) Some of the benchmark queries may reveal opportunities >for Derby to improve the query plans that it chooses for >various classes of queries (this is explained in detail in the >VLDB paper and other information available at Dr. 
Leis's site) >For each such improvement, we need to isolate the issue, >report it as a separable improvement, and fix it (if we can) > While the benchmark is an interesting exercise in and of itself, > the overall goal of the project is to find-and-fix problems in the > Derby query optimizer, specifically in the 3 areas which are > the focus of the benchmark tool: > 1) How good is the Derby cardinality estimator and when does >it lead to slow queries? > 2) How good it the Derby cost model, and how well is it guiding >the overall query optimization process? > 3) How large is the Derby enumerated plan space, and is it >appropriately-sized? > While other Derby issues have been filed against these questions > in the past, the intent of this specific project is to use the concrete > tools provided by the VLDB paper to make this effort rigorous and > successful at making concrete improvements to the Derby query > optimizer. > If you are interested in pursuing this project, please take these > considerations into mind: > 1) This is NOT an introductory project. You must be quite familiar >with DBMS systems, and with SQL, and in particular with >cost-based query optimization. If terms such as "cardinality >estimation", "correlated query predicates", or "bushy trees" >aren't comfortable terms for you ,this probably isn't the >project you're interested in. > 2) If you are new to Derby, that is fine, but please take advantage >of the extensive body of introductory material on Derby to >become familiar with it: read the Derby Getting Started manual, >download the software and follow the tutorials, read the documentation, >download the source code and learn how to build and run the >test suites, etc. > 3) All I have presented here is an **outline** of the project. You will >need to read the paper(s), study the benchmark queries, and >propose a detailed plan for how to use this benchmark as a tool >for improving the Derby query optimizer. 
> If these sorts of tasks sound like exciting things to do, then please > let us know! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
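For context, the Join Order Benchmark referenced above consists of multi-join SQL queries over the IMDB data set. The following is an illustrative sketch in the style of those queries, not a verbatim benchmark query; the table and column names follow the IMDB schema that the benchmark uses:

```sql
-- Illustrative JOB-style query: several joins plus correlated-looking
-- filter predicates, the shape that stresses cardinality estimation.
SELECT MIN(t.title)
FROM title t,
     movie_companies mc,
     company_name cn
WHERE t.id = mc.movie_id
  AND mc.company_id = cn.id
  AND cn.country_code = '[us]'
  AND t.production_year > 2000;
```

Queries of this shape expose how estimation errors compound across joins, which is the core measurement of the benchmark.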
[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really
[ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942035#comment-15942035 ] Harshvardhan Gupta commented on DERBY-6921: --- Hi Derby community, I have submitted a draft proposal for this project through Google. Please review it and help me further refine it. Please comment if you are not able to access it through Google. Regards, Harshvardhan Gupta -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907861#comment-15907861 ] Harshvardhan Gupta edited comment on DERBY-6904 at 3/13/17 5:05 PM: Thanks Bryan. Soft upgrade strategy - In the attached updated patch, I tweaked SYSCOLUMNSRowFactory to serve all columns except the newly introduced one while running a database in soft upgrade mode (i.e. both forward and backward compatible). The code flow was modelled after DERBY-534, which introduced a new column in SYSTRIGGERS. Since most test suites use only soft upgrade mode, I passed all the test suites except the upgradeTests package, which explicitly boots old databases in hard upgrade mode in addition to soft upgrade mode. I have yet to implement a hard upgrade strategy. Should we look at the SYSSEQUENCES column and populate the value in SYSCOLUMNS as part of an explicit hard upgrade? The updated patch passed all the test suites other than the upgradeTests suite, where some tests require a hard upgrade strategy. I also tested soft upgrade using ij for a database created with 10.13; the created databases are both backward and forward compatible with 10.14. Additionally, all the upgrade logic is contained in DD_Version. Please let me know if the current soft upgrade strategy I am using is not a desired one. > Enhance ALTER TABLE to allow CYCLE option to be changed. > > > Key: DERBY-6904 > URL: https://issues.apache.org/jira/browse/DERBY-6904 > Project: Derby > Issue Type: Improvement > Components: SQL >Affects Versions: 10.13.1.0 >Reporter: Bryan Pendleton >Priority: Minor > Attachments: 6904_2.diff, 6904.diff, columnMismatch, compareDbName > > > It would be nice to have a variant of the ALTER TABLE statement, > similar to the SET INCREMENT BY or SET RESTART WITH variants, > that allowed a user to change the CYCLE option on an IDENTITY column. > I think that the code flow for this could be closely modelled on the > code flow for the SET INCREMENT BY variant: after the statement > is parsed, we simply need to drop and recreate the underlying SEQUENCE > object with the desired CYCLE option (and preserving all the other > aspects of the underlying SEQUENCE object). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
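For illustration, the enhancement requested in DERBY-6904 could be exercised roughly as follows. This is a sketch only; the exact ALTER TABLE grammar accepted by the patch (the SET CYCLE / SET NO CYCLE forms) is an assumption:

```sql
-- Hypothetical ij session; the SET CYCLE / SET NO CYCLE syntax is assumed.
CREATE TABLE t (
    id INT GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1),
    v VARCHAR(10)
);
-- Proposed: change only the CYCLE option, preserving START WITH,
-- INCREMENT BY, and the current value of the backing sequence.
ALTER TABLE t ALTER COLUMN id SET CYCLE;
ALTER TABLE t ALTER COLUMN id SET NO CYCLE;
```

This mirrors the existing SET INCREMENT BY / SET RESTART WITH variants, which likewise drop and recreate the underlying SEQUENCE object.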
[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6904: -- Attachment: 6904_2.diff -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906477#comment-15906477 ] Harshvardhan Gupta edited comment on DERBY-6904 at 3/12/17 10:14 AM: - Update on previous comment - I tested various scenarios by creating databases with and without the new changes using ij. Whenever I try to access any of the databases created without my changes through the patched version, I get this error: "XSCH5: In a base table there was a mismatch between the requested column number 9 and the maximum number of columns 9." Note that we increased the number of columns in SYSCOLUMNS from 9 to 10 in this patch. Should we incorporate an upgrade strategy for when older databases are accessed by this patched version? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906294#comment-15906294 ] Harshvardhan Gupta edited comment on DERBY-6904 at 3/11/17 7:21 PM: Some of the existing tests are failing for one of the following reasons: 1) Engine shutdown: Expected XJ015 but was XSCH5 in method assertSQLState. Referring to https://db.apache.org/derby/docs/10.2/ref/rrefexcept71493.html , I think this error is mostly due to the addition of a new column in SYSCOLUMNS which is not there in existing DB jars when they are brought into memory in test packages. (XSCH5 is a heap class SQLState.) 2) compareDatabaseNames throws a mismatch error while building a connection. I think these errors are a direct result of the above error when shutdown is not clean. I am attaching both error stacks for reference. Please take a look at them and give your inputs. Also, it would be helpful if you could clarify the process of reading a jar/disk database and its relationship with the heap structures of Derby. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6904: -- Attachment: compareDbName columnMismatch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902720#comment-15902720 ] Harshvardhan Gupta commented on DERBY-6904: --- Thanks Bryan for the inputs. I will be adding the test cases and also the patch for DERBY-6906 (Allow a GENERATED column to explicitly specify NO CYCLE) in a couple of days. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.
[ https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshvardhan Gupta updated DERBY-6904: -- Attachment: 6904.diff The attached patch covers the following issues - DERBY-6904 Enhance ALTER TABLE to allow CYCLE option to be changed. DERBY-6905 Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column. DERBY-6903 ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column. The idea was to make the CYCLE property a fundamental property associated with a column, similar to the current increment value, increment step, etc. - SYS.SYSCOLUMNS extended to hold the cycle value of an identity column. - Extended the grammar to accept "cycle"/"no cycle" in the ALTER clause and changed the underlying column descriptor and sequencing objects. - An ALTER clause not involving the cycle option (i.e. increment and restart) won't affect the cycle option. I have manually tested different queries for all the above scenarios. I will be writing test cases over the coming days and fixing older ones that are expected to break due to changes in the SYSCOLUMNS view. Also, while testing I encountered the following behaviour in the latest Derby release: if the increment step is changed, the current value in SYSSEQUENCES is not changed, but the current value in SYSCOLUMNS is. Interestingly, this only affects the SYSCOLUMNS view and not the generation of sequences, since SYSSEQUENCES is consistent. I found DERBY-6579, which addressed the bug and made SYSSEQUENCES consistent. Please clarify whether the SYSCOLUMNS behaviour is deliberate or should be filed as a bug. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
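The DERBY-6905 portion of such a patch could then be observed by querying the catalog. A sketch: the existing AUTOINCREMENT* columns of SYS.SYSCOLUMNS are real, but the name of the new cycle column (here AUTOINCREMENTCYCLE) is an assumption about the patch:

```sql
-- Inspect identity-column metadata, including the cycle flag added
-- by the patch (the column name AUTOINCREMENTCYCLE is assumed).
SELECT COLUMNNAME, AUTOINCREMENTSTART, AUTOINCREMENTINC,
       AUTOINCREMENTVALUE, AUTOINCREMENTCYCLE
FROM SYS.SYSCOLUMNS
WHERE AUTOINCREMENTVALUE IS NOT NULL;
```

The filter on AUTOINCREMENTVALUE restricts the result to identity columns, since those AUTOINCREMENT* columns are NULL for ordinary columns.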