[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-08-25 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141740#comment-16141740
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

Bryan,

I am summarizing the remaining work required to release the changes made as 
part of DERBY-6940 and DERBY-6942.

Test cases are required for the addition of the new statistics and for the 
integration of the integer data type with them. The patch also does not contain 
the documentation changes that need to be made for each of these new pieces of 
functionality.

Also, what do you think about integrating additional data types such as VARCHAR 
and releasing all the changes together, versus doing it iteratively over time?

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942_2.diff, DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DERBY-6949) Analysis of Derby Cost Estimation

2017-08-03 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113150#comment-16113150
 ] 

Harshvardhan Gupta edited comment on DERBY-6949 at 8/3/17 5:36 PM:
---

By cost we generally mean any number proportional to execution time; in Derby, 
cardinality estimates are also folded into the same larger cost structure.

Cost modelling for the join strategies works roughly as follows:

1) Nested Loop - After determining the cost of a single probe into the inner 
result set, the nested loop join multiplies that cost by the number of rows in 
the outer result set. The costs of these result sets come from cost modelling 
by the Optimizables, which also serve as entities in cost estimation. For 
example, two base tables joined via a loop join serve as Optimizables: the 
outer table reports its scanning cost and the inner table reports the cost of 
looking up a single row matching the outer row. Note that even though the cost 
is multiplied, the cardinality estimates of the two base tables (or 
intermediate results) are not simply multiplied.

2) Hash Join - Unlike nested loop joins, there is no fixed per-outer-row cost 
for hash joins; their cost depends only on the base tables / intermediate 
results (i.e. the Optimizables). Thus no further cost adjustments are made at 
the join level.
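
To make the two rules above concrete, here is a minimal sketch (not Derby's 
actual code; the class and parameter names are made up for illustration) of how 
the two strategies combine per-table costs:

// A minimal sketch of the two costing rules described above; the inputs are
// hypothetical and would, in Derby, come from the Optimizables' own estimates.
final class JoinCostSketch {
    // Nested loop: one inner probe per outer row.
    static double nestedLoopCost(double outerScanCost,
                                 double outerRowCount,
                                 double innerProbeCost) {
        return outerScanCost + outerRowCount * innerProbeCost;
    }

    // Hash join: no per-outer-row cost is added at the join level; the cost is
    // what the two Optimizables already reported (building the hash table is
    // part of the inner side's cost).
    static double hashJoinCost(double outerScanCost, double innerBuildCost) {
        return outerScanCost + innerBuildCost;
    }

    public static void main(String[] args) {
        System.out.println(nestedLoopCost(100.0, 1000.0, 0.5)); // 600.0
        System.out.println(hashJoinCost(100.0, 250.0));         // 350.0
    }
}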


was (Author: harshvardhan145):
By cost we generally mean any number proportional to execution time; in Derby, 
cardinality estimates are also folded into the same larger cost structure.

Cost modelling for the join strategies works roughly as follows:

1) Nested Loop - After determining the cost of a single probe into the inner 
result set, the nested loop join multiplies that cost by the number of rows in 
the outer result set. The costs of these result sets come from cost modelling 
by the Optimizables, which also serve as entities in cost estimation. For 
example, two base tables joined via a loop join serve as Optimizables: the 
outer table reports its scanning cost and the inner table reports the cost of 
looking up a single row matching the outer row. Note that even though the cost 
is multiplied, the cardinality estimates of the two base tables are not simply 
multiplied.

2) Hash Join - Unlike nested loop joins, there is no fixed per-outer-row cost 
for hash joins; their cost depends only on the base tables / intermediate 
results (i.e. the Optimizables). Thus no further cost adjustments are made at 
the join level.

> Analysis of Derby Cost Estimation
> -
>
> Key: DERBY-6949
> URL: https://issues.apache.org/jira/browse/DERBY-6949
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6949) Analysis of Derby Cost Estimation

2017-08-03 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113150#comment-16113150
 ] 

Harshvardhan Gupta commented on DERBY-6949:
---

By cost we generally mean any number proportional to execution time; in Derby, 
cardinality estimates are also folded into the same larger cost structure.

Cost modelling for the join strategies works roughly as follows:

1) Nested Loop - After determining the cost of a single probe into the inner 
result set, the nested loop join multiplies that cost by the number of rows in 
the outer result set. The costs of these result sets come from cost modelling 
by the Optimizables, which also serve as entities in cost estimation. For 
example, two base tables joined via a loop join serve as Optimizables: the 
outer table reports its scanning cost and the inner table reports the cost of 
looking up a single row matching the outer row. Note that even though the cost 
is multiplied, the cardinality estimates of the two base tables are not simply 
multiplied.

2) Hash Join - Unlike nested loop joins, there is no fixed per-outer-row cost 
for hash joins; their cost depends only on the base tables / intermediate 
results (i.e. the Optimizables). Thus no further cost adjustments are made at 
the join level.

> Analysis of Derby Cost Estimation
> -
>
> Key: DERBY-6949
> URL: https://issues.apache.org/jira/browse/DERBY-6949
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6949) Analysis of Derby Cost Estimation

2017-08-03 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113109#comment-16113109
 ] 

Harshvardhan Gupta commented on DERBY-6949:
---

The whole lifecycle of Query Optimization in Derby is as follows - 

The Optimizer interface's implementation, OptimizerImpl.java, is responsible 
for iterating through all permitted semantically equivalent execution plans and 
for orchestrating calls to the various entities that provide their 'estimated 
cost' of execution, given the set of constraints that will be associated with 
them in that particular plan.

Before getting into the details of the above-mentioned entities and their 
estimated cost, it is important to note that the search space of all possible 
semantically equivalent plans is limited to left-deep trees only. The selection 
of join order also influences how we want to constrain the execution. For 
example, a left-deep hash join has to wait for each join to finish completely 
so that its result set can be hashed before the next step can probe it; in a 
right-deep hash join, on the other hand, it is the base tables rather than the 
intermediate results that are hashed, so right-deep hash join trees use 
substantially more memory and cause more disk spillover.

It has been shown in the [VLDB Optimizer 
Paper|http://www.vldb.org/pvldb/vol9/p204-leis.pdf] (Section 6.2) that 
left-deep trees perform quite reasonably while restricting the search space and 
keeping the optimizer simple and fast, as opposed to right-deep trees, where 
disk spills eat away a lot of performance. The authors further emphasize 
cardinality estimates over enlarging the search space, which may not offer 
significant performance benefits. This area definitely needs to be explored 
further in Derby's context.

Coming back to the entities for cost estimation, they are the join strategy, 
the Optimizables and the ordering. Each of these implements an 'estimateCost' 
function and reports its estimated cost of execution to the optimizer. The 
optimizer permutes through the various combinations and produces the best 
execution plan it is able to find.
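
As a rough illustration of that orchestration (a hedged sketch only; the 
hypothetical CostedEntity interface below stands in for Derby's richer 
Optimizable/JoinStrategy interfaces and is not the real API):

import java.util.List;

// Hypothetical stand-in for the entities described above.
interface CostedEntity {
    double estimateCost(long outerRows);   // each entity reports its own cost
    long estimateRowCount(long outerRows); // and its cardinality estimate
}

final class OptimizerLoopSketch {
    // Cost one left-deep join order by accumulating the estimates reported by
    // each entity, the way the comment above describes OptimizerImpl driving
    // the estimateCost calls; the optimizer would evaluate every permitted
    // permutation like this and keep the cheapest plan it finds.
    static double costOfJoinOrder(List<CostedEntity> joinOrder) {
        double totalCost = 0.0;
        long rowsSoFar = 1;
        for (CostedEntity entity : joinOrder) {
            totalCost += entity.estimateCost(rowsSoFar);
            rowsSoFar = entity.estimateRowCount(rowsSoFar);
        }
        return totalCost;
    }

    public static void main(String[] args) {
        CostedEntity outer = new CostedEntity() {
            public double estimateCost(long outerRows) { return 100.0; }
            public long estimateRowCount(long outerRows) { return 1000; }
        };
        CostedEntity inner = new CostedEntity() {
            public double estimateCost(long outerRows) { return 0.5 * outerRows; }
            public long estimateRowCount(long outerRows) { return outerRows / 10; }
        };
        System.out.println(costOfJoinOrder(List.of(outer, inner))); // 600.0
    }
}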

> Analysis of Derby Cost Estimation
> -
>
> Key: DERBY-6949
> URL: https://issues.apache.org/jira/browse/DERBY-6949
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DERBY-6954) Allow plans to execute where HASH tables are estimated to spill to disk.

2017-08-02 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6954:
-

Assignee: Harshvardhan Gupta

> Allow plans to execute where HASH tables are estimated to spill to disk.
> 
>
> Key: DERBY-6954
> URL: https://issues.apache.org/jira/browse/DERBY-6954
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>
> Derby Optimizer currently rejects plans where it estimates that the intermediate HASH tables in HASH joins will spill to disk. We need to model the cost associated with a HASH join disk spill in order to make such plans candidates for the optimal plan. DERBY-6938, DERBY-1259 and DERBY-106 contain some useful details around this behaviour.
> One possible workaround for this behaviour is to explicitly specify optimizer hints, overriding the default optimizer behaviour.
> https://db.apache.org/derby/docs/10.13/tuning/ctunoptimzoverride.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DERBY-6954) Allow plans to execute where HASH tables are estimated to spill to disk.

2017-08-02 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6954:
-

 Summary: Allow plans to execute where HASH tables are estimated to 
spill to disk.
 Key: DERBY-6954
 URL: https://issues.apache.org/jira/browse/DERBY-6954
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta


Derby Optimizer currently rejects plans where it estimates that the 
intermediate HASH tables in HASH joins will spill to disk. We need to model the 
cost associated with a HASH join disk spill in order to make such plans 
candidates for the optimal plan. DERBY-6938, DERBY-1259 and DERBY-106 contain 
some useful details around this behaviour.

One possible workaround for this behaviour is to explicitly specify optimizer 
hints, overriding the default optimizer behaviour.
https://db.apache.org/derby/docs/10.13/tuning/ctunoptimzoverride.html
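
As a sketch of that workaround (hedged: the table, column and database names 
below are made up, and the exact override syntax should be checked against the 
ctunoptimzoverride.html page linked above), a hash join can be requested from 
JDBC by placing a --DERBY-PROPERTIES comment in the FROM clause:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HashJoinHintSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:derby:sampleDb");
             Statement stmt = conn.createStatement()) {
            // The --DERBY-PROPERTIES comment runs to the end of the line, so
            // the remainder of the statement continues on the next line.
            String sql =
                "SELECT * FROM t1, t2 --DERBY-PROPERTIES joinStrategy=HASH\n"
                + "WHERE t1.c1 = t2.c1";
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}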




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-08-02 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110605#comment-16110605
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

Bryan,

I agree with your observations. For the production of efficient query plans, 
cardinality estimates, cost estimation and the plan space are all 
interdependent. I am filing another issue under DERBY-6949 to track the effort 
on modelling the HASH join's spill behaviour. I will also update the cost 
modelling details under DERBY-6949 shortly; I had to take a detour to analyze 
the HASH join's behaviour.

Thanks,
Vardhan

>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt, traceout.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-07-22 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097272#comment-16097272
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

Summarizing my analysis of HASH-based joins so far:

Derby has the capability to perform HASH joins that spill to disk, but it has 
long ignored such plans due to the absence of a cost model for that behaviour. 
DERBY-1259 also discusses the history of HASH joins and the changes made as 
part of DERBY-106.

>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt, traceout.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-07-22 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6938:
--
Attachment: traceout.txt

Following my previous comment, I am also attaching a sample optimizer trace of 
query 1a of the JOB dataset, where the optimizer ignores the HASH tables that 
it predicts will spill to disk.

Trace output string: "Skipping access path due to excess memory usage, maximum 
is 1048576".

>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt, traceout.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-07-22 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097260#comment-16097260
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

Hi Bryan,

The optimizer produces memory and cost estimates for all the possible access 
paths that it can build. Derby will only let those HASH joins go forward which 
it _predicts_ will fit into memory and rejects all others (including some that 
would be far more efficient for certain queries than loop-based joins).

However, Derby's memory predictions may not match the memory requirement at 
execution time, and the HASH table may become larger than what Derby predicted. 
DERBY-106 was an effort to mitigate the OOM exceptions arising from this 
behaviour. So HASH tables do spill to disk, but only those access paths for 
which Derby predicts that the HASH tables will not spill go forward to 
execution.

These observations of mine match the comment on DERBY-1259 that was written a 
decade ago.

Coming to the resolution of this problem, we need to investigate how to measure 
the cost of execution once Derby eventually allows those HASH-based access 
paths that it predicts will spill to disk, instead of simply ignoring them.

Also interesting is the observation that Derby currently allows hash-based 
joins when users specify their join strategies via query hints.


>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-07-22 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097260#comment-16097260
 ] 

Harshvardhan Gupta edited comment on DERBY-6938 at 7/22/17 11:41 AM:
-

Hi Bryan,

The optimizer produces memory and cost estimates for all the possible access 
paths that it can build. Derby will only let those HASH joins go forward which 
it _predicts_ will fit into memory and rejects all others (including some that 
would be far more efficient for certain queries than loop-based joins).

However, Derby's memory predictions may not match the memory requirement at 
execution time, and the HASH table may become larger than what Derby predicted. 
DERBY-106 was an effort to mitigate the OOM exceptions arising from this 
behaviour. So HASH tables do spill to disk, but only those access paths for 
which Derby predicts that the HASH tables will not spill go forward to 
execution.

These observations of mine match the comment on DERBY-1259 that was written a 
decade ago.

Coming to the resolution of this problem, we need to investigate how to measure 
the cost of execution once Derby eventually allows those HASH-based access 
paths that it predicts will spill to disk, instead of simply ignoring them.

Also interesting is the observation that Derby currently allows hash-based 
joins when users specify their join strategies via query hints.



was (Author: harshvardhan145):
Hi Bryan,

The optimizer produces memory and cost estimates for all the possible access 
paths that it can build. Derby will only let those HASH joins go forward which 
it _predicts_ will fit into memory and rejects all others (including some that 
would be far more efficient for certain queries than loop-based joins).

However, Derby's memory predictions may not match the memory requirement at 
execution time, and the HASH table may become larger than what Derby predicted. 
DERBY-106 was an effort to mitigate the OOM exceptions arising from this 
behaviour. So HASH tables do spill to disk, but only those access paths for 
which Derby predicts that the HASH tables will not spill go forward to 
execution.

These observations of mine match the comment on DERBY-1259 that was written a 
decade ago.

Coming to the resolution of this problem, we need to investigate how to measure 
the cost of execution once Derby eventually allows those HASH-based access 
paths that it predicts will spill to disk, instead of simply ignoring them.

Also interesting is the observation that Derby currently allows hash-based 
joins when users specify their join strategies via query hints.


>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-11 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082789#comment-16082789
 ] 

Harshvardhan Gupta edited comment on DERBY-6942 at 7/11/17 7:16 PM:


In the local tests I did, I verified the selectivity outputs for relational 
operators of the following types (>=, <=, >, <) with the integer data type and 
the latest patch.

Regarding the examples Bryan mentioned, I am not entirely sure about this, but 
I believe the 'between' operator is internally transformed into a pair of these 
relational ops; in that case the latest patch covers the 'between' op as well.

UPDATE: the 'between' transformation does seem to work with the patch, and the 
selectivity is the product of the individual selectivities of the LESS_THAN and 
GREATER_THAN operator types.
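
In other words (a small hypothetical sketch, not the patch code), the combined 
estimate is just the product of the two range selectivities:

// BETWEEN low AND high is treated as (col >= low) AND (col <= high), and the
// combined selectivity is the product of the two estimates (hypothetical names).
final class BetweenSelectivitySketch {
    static double betweenSelectivity(double greaterOrEqualsSel,
                                     double lessOrEqualsSel) {
        return greaterOrEqualsSel * lessOrEqualsSel;
    }

    public static void main(String[] args) {
        // e.g. 0.5 of the rows are >= low and 0.25 are <= high
        System.out.println(betweenSelectivity(0.5, 0.25)); // 0.125
    }
}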


was (Author: harshvardhan145):
In the local tests I did, I verified the selectivity outputs for relational 
operators of the following types (>=, <=, >, <) with the integer data type and 
the latest patch.

Regarding the examples Bryan mentioned, I am not entirely sure about this, but 
I believe the 'between' operator is internally transformed into a pair of these 
relational ops; in that case the latest patch covers the 'between' op as well.

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942_2.diff, DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-11 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082789#comment-16082789
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

In the local tests I did, I verified the selectivity outputs for relational 
operators of the following types (>=, <=, >, <) with the integer data type and 
the latest patch.

Regarding the examples Bryan mentioned, I am not entirely sure about this, but 
I believe the 'between' operator is internally transformed into a pair of these 
relational ops; in that case the latest patch covers the 'between' op as well.

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942_2.diff, DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-11 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6942:
--
Attachment: DERBY-6942_2.diff

Hi Bryan,

Attaching the new code based on your suggestions. Currently, estimateProportion 
is implemented for SQLINTEGER; other data types' selectivity will be calculated 
using the existing hardwired values unless we choose to override the 
estimateProportion function for each of them.
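
For reference, the integer case boils down to interpolating between the 
recorded min and max; here is a hypothetical sketch (not the patch's actual 
estimateProportion code) for a 'col <= value' predicate, assuming a uniform 
distribution of values:

// Hypothetical min/max interpolation for an integer column, in the spirit of
// the estimateProportion change described above (not the actual patch code).
final class IntegerProportionSketch {
    // Estimated fraction of rows satisfying (col <= value).
    static double proportionLessOrEquals(int minVal, int maxVal, int value) {
        if (value < minVal) return 0.0;
        if (value >= maxVal) return 1.0;
        return (value - minVal + 1.0) / (maxVal - minVal + 1.0);
    }

    public static void main(String[] args) {
        // min = 1, max = 100: roughly half the rows are expected to be <= 50
        System.out.println(proportionLessOrEquals(1, 100, 50)); // 0.5
    }
}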

Vardhan



> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942_2.diff, DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-08 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079256#comment-16079256
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

I experimented with integrating max and min values in the same way as the null 
values. Unlike the straightforward null count, the primary challenge with this 
problem is that we are going to need to define a distance metric for each of 
the SQL types in Derby.

For example, we need a procedure associated with the VARCHAR data type which 
will tell Derby that a string starting with 'L' is approximately equally far 
away from strings starting with 'A' and 'Z' (and thus yields a selectivity of 
50% instead of relying on hardwired estimates).

For some data types, such as int and long, this is going to be quite 
straightforward. Nevertheless, extensive testing is going to be required for 
each data type that is integrated into the above process.
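
As a purely hypothetical illustration of such a distance metric for VARCHAR 
(not part of any patch), one could interpolate on the first character, so that 
'L...' lands roughly halfway between 'A...' and 'Z...':

// Hypothetical first-character distance metric for VARCHAR; a real
// implementation would have to handle collation, case, empty strings and
// longer shared prefixes.
final class VarcharDistanceSketch {
    // Fraction of the [min, max] range covered up to 'value', based on the
    // first character only.
    static double proportionUpTo(String minVal, String maxVal, String value) {
        double lo = minVal.charAt(0);
        double hi = maxVal.charAt(0);
        double v  = value.charAt(0);
        if (hi <= lo) return 1.0; // degenerate range
        return Math.min(1.0, Math.max(0.0, (v - lo) / (hi - lo)));
    }

    public static void main(String[] args) {
        // 'L' sits close to halfway between 'A' and 'Z' -> 0.44
        System.out.println(proportionUpTo("Apple", "Zebra", "Lemon"));
    }
}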

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6949) Analysis of Derby Cost Estimation

2017-07-07 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078310#comment-16078310
 ] 

Harshvardhan Gupta commented on DERBY-6949:
---

As part of my work on GSoC 2017, this issue will track the analysis of, and 
proposed improvements to, cost estimation in the Derby query optimizer.

> Analysis of Derby Cost Estimation
> -
>
> Key: DERBY-6949
> URL: https://issues.apache.org/jira/browse/DERBY-6949
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DERBY-6949) Analysis of Derby Cost Estimation

2017-07-07 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6949:
-

 Summary: Analysis of Derby Cost Estimation
 Key: DERBY-6949
 URL: https://issues.apache.org/jira/browse/DERBY-6949
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Assignee: Harshvardhan Gupta






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-01 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071413#comment-16071413
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

Yes, the cardinality and cost estimates are both going to be more accurate; 
typically, in the case of loop joins, cost is estimated as (number of rows in 
the outer table * cost of a single-row match in the inner table).

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-01 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071405#comment-16071405
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

Let me emphasize this improvement in the case of multiple joins: although in 
the above example my changes didn't change the cost (they shouldn't, since 
during execution that conglomerate needs to do the same work), the improvement 
in cardinality estimates will help us avoid cascading errors during cost 
estimation of joins.
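
A tiny numeric illustration of the cascading effect (the numbers are made up): 
if each per-step cardinality estimate is off by a factor of two, the error 
compounds multiplicatively along the join order.

// Hypothetical arithmetic showing how per-step cardinality errors compound
// across a left-deep join order.
final class CascadingErrorSketch {
    public static void main(String[] args) {
        double perStepError = 2.0; // each estimate off by a factor of 2
        int joins = 3;
        // 2^3 = 8: after three joins the intermediate cardinality estimate
        // can already be off by a factor of 8.
        System.out.println(Math.pow(perStepError, joins)); // 8.0
    }
}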

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-01 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071404#comment-16071404
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

Bryan, you are on point!

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-01 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071383#comment-16071383
 ] 

Harshvardhan Gupta edited comment on DERBY-6942 at 7/1/17 8:02 PM:
---

I was able to solve my problem after spending some more time on it. Two files 
are attached: one is the patch, which uses the additional null-count statistics 
to generate accurate selectivity estimates. It builds upon DERBY-6940 and makes 
changes to FromBaseTable.java and TableDescriptor.java to consume the 
additional statistics.

The other file is the optimizer trace after application of the patch. The extra 
qualifier selectivity changes to 0.5, representing the fraction of null values 
in my dataset at the last statistics update.

I also tested with multi-column indexes and was able to obtain accurate 
selectivity estimates for queries of this class. I encourage you to download 
the patch and test the behaviour. Some more work is required to integrate the 
minimum and maximum column values, which were also collected as part of the 
additional statistics.
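
Put differently (a hedged sketch, not the patch itself), the IS NULL qualifier 
selectivity can now be derived from the collected statistics instead of being 
hardwired:

// Hypothetical sketch of deriving IS NULL selectivity from the null-count
// statistic (not the actual patch code).
final class NullSelectivitySketch {
    static double isNullSelectivity(long nullCount, long rowCount) {
        if (rowCount <= 0) return 0.0;
        return (double) nullCount / (double) rowCount;
    }

    public static void main(String[] args) {
        // e.g. 500 nulls out of 1000 rows -> 0.5, matching the trace above
        System.out.println(isNullSelectivity(500, 1000)); // 0.5
    }
}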



was (Author: harshvardhan145):
I was able to solve my problem after spending some more time on it. Two files 
are attached: one is the patch, which uses the additional null-count statistics 
to generate accurate selectivity estimates. It builds upon DERBY-6940 and makes 
changes to FromBaseTable.java and TableDescriptor.java to consume the 
additional statistics.

The other file is the optimizer trace after application of the patch. The extra 
qualifier selectivity changes to 0.5, representing the fraction of null values 
in my dataset at the last statistics update.

I also tested with multi-column indexes and was able to obtain accurate 
selectivity estimates for queries of this class. I encourage you to download 
the patch and test the behaviour. Some more work is required to integrate the 
minimum and maximum column values, which were also collected as part of the 
additional statistics.


> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-01 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6942:
--
Attachment: DERBY-6942.diff
z13.txt

I was able to solve my problem after spending some more time on it. Two files 
are attached: one is the patch, which uses the additional null-count statistics 
to generate accurate selectivity estimates. It builds upon DERBY-6940 and makes 
changes to FromBaseTable.java and TableDescriptor.java to consume the 
additional statistics.

The other file is the optimizer trace after application of the patch. The extra 
qualifier selectivity changes to 0.5, representing the fraction of null values 
in my dataset at the last statistics update.

I also tested with multi-column indexes and was able to obtain accurate 
selectivity estimates for queries of this class. I encourage you to download 
the patch and test the behaviour. Some more work is required to integrate the 
minimum and maximum column values, which were also collected as part of the 
additional statistics.


> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6942.diff, z12.txt, z13.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-01 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071337#comment-16071337
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

In the code, I am trying to query the statistics of those columns which are 
backed by indexes (and thus have statistics).

I'm stuck on this problem: how do we take a predicate and find the relevant 
conglomerate among the list of all conglomerates associated with that table? 
From a predicate we can find the ColumnReference, which holds the column name, 
so I am trying to use this to locate the relevant conglomerate. I would 
appreciate help on this blocker.

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: z12.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-07-01 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6942:
--
Attachment: z12.txt

Please find attached the optimizer trace of a specimen query.

Table DDL -

create table abcd(id integer);
create index idx on abcd(id);

Query issued - 

select * from ABCD where id is NULL

As the optimizer trace shows, there are two possible access paths: one via the 
table scan and one via the conglomerate backing the index. Each access path can 
be executed via two lookup strategies (or join strategies in the case of 
joins), in our case a loop or a hash lookup.

During the index scan, the predicate (ID IS NULL) acts as a valid start and 
stop predicate, and the store provides an accurate cardinality estimate from 
the index lookup. In the case of the table scan, the predicate acts as a 
qualifier (a predicate that restricts or filters the result set obtained via 
the scan but, unlike an index lookup, does not help locate the starting or 
ending position). As is evident in the trace output, a hardwired selectivity of 
0.1 is used for the cardinality estimate in the table scan case. We have an 
opportunity to use statistics in such cases to remove our dependency on 
hardwired estimates.

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: z12.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-06-28 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067124#comment-16067124
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

In particular, when estimating the cost for a particular conglomerate, it would 
be good if we started looking at other conglomerates' statistics for 
cardinality estimates. For example, during a table scan, we can start adjusting 
the cardinality estimates using the statistics for an index.

Note that cost and cardinality are two different concepts; cost will always be 
determined by the number and size of pages / rows that a particular access path 
has to read to do its work. The table scan's cost will remain unaffected in 
this case by changing the cardinality estimates. (However, the index scan's 
cost and cardinality estimates will be correlated in our particular example.)

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-06-28 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067117#comment-16067117
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

The example I discussed above was quite simple; when there are multiple joins, 
the cardinality estimates become important. Since the result of a particular 
table node acts as the outer table for the nested join, errors will propagate 
outwards and small errors will become significant. (The cost of scans is 
determined by multiplying by the total number of rows of the outer table if 
table locks are being used.)

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-06-28 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067113#comment-16067113
 ] 

Harshvardhan Gupta commented on DERBY-6942:
---

The estimateCost function in FromBaseTable.java is the guts of the cost and 
cardinality estimation logic for base table predicates.

The function is called for each conglomerate (i.e. each access path possible on 
the table, such as an index or a full table scan). For predicates that can be 
used as start or stop predicates for the given conglomerate (refer to 
http://db.apache.org/derby/docs/10.13/tuning/ctunoptimz24840.html), row 
estimates are obtained from the store by supplying the actual constant keys 
specified in the query; this row estimate is quite accurate, as the store 
determines the fraction of values between the start and stop keys.

For all remaining predicates, one of two things happens:

1) If an equality op is specified and statistics exist for the conglomerate 
(and it is not a valid start/stop predicate), we query the statistics for the 
selectivity (which take the number of unique values into account).

2) Hardwired selectivities are used in all other cases (a condensed sketch of 
this decision order follows after the example below).


As part of DERBY-6940, we started collecting extra statistics such as the null 
count and the min and max value for each column of an index. We would like to 
utilise them to eliminate the use of hardwired selectivities whenever possible.

For example - 

Let us say we have a table and an associated index created using the following 
statements:

create table abcd(id integer);
create index idx on abcd(id);

Now, suppose a query is issued - 

select * from abcd where id is NULL;

There are two possible access paths for the query and both are considered by 
the query optimizer - 

1) Index scan - The store is able to provide fairly accurate row estimates, as 
NULL is a valid start as well as stop key.

2) Table scan - Here, hardwired selectivity estimates are used for the row 
estimates, as discussed above.
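
The condensed sketch promised above (hypothetical code, not the real 
FromBaseTable.estimateCost; the 1.0/numUnique formula and the 0.1 default are 
illustrative stand-ins for the statistics lookup and the hardwired value):

// Condensed, hypothetical view of the decision order described above:
// start/stop predicates ask the store, equality predicates with statistics use
// the unique-value count, everything else falls back to a hardwired selectivity.
final class PredicateSelectivitySketch {
    static double selectivity(boolean isStartStopPredicate,
                              boolean isEquality,
                              boolean statisticsExist,
                              double storeEstimate, // fraction between the keys
                              long numUnique) {
        if (isStartStopPredicate) {
            return storeEstimate;       // row estimate supplied by the store
        }
        if (isEquality && statisticsExist && numUnique > 0) {
            return 1.0 / numUnique;     // statistics-based selectivity
        }
        return 0.1;                     // hardwired fallback
    }

    public static void main(String[] args) {
        System.out.println(selectivity(false, true, true, 0.0, 200));  // 0.005
        System.out.println(selectivity(false, false, true, 0.0, 200)); // 0.1
    }
}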

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-26 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063622#comment-16063622
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

Since I am performing a deep copy of the objects, I don't think (min == max) 
will work in our case. So I had to use the equality function that is 
implemented by each data type; the problem is that I need to catch the 
StandardException thrown by those functions, and I can't let writeExternal 
throw it because the interface definition won't allow that to compile.

I too believe that the code in its present form is a little clunky and open to 
potential improvements.

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, 
> EOFException_derby.log, EOFException.txt
>
>
> Derby should collect extra statistics at index build time and statistics refresh time, which will help the optimizer make more precise selectivity estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity estimates / cost estimates that will help find the best query plan. Currently Derby keeps two types of stats: the total row count and the number of unique values.
> We are initially extending the stats to include the null count and the minimum and maximum values associated with each column of an index. This would be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, >= ], all of which currently rely on hardwired selectivity estimates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-06-23 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6942:
-

 Summary: Utilise additional statistics for selectivity estimates.
 Key: DERBY-6942
 URL: https://issues.apache.org/jira/browse/DERBY-6942
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DERBY-6942) Utilise additional statistics for selectivity estimates.

2017-06-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6942:
-

Assignee: Harshvardhan Gupta

> Utilise additional statistics for selectivity estimates.
> 
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-23 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061215#comment-16061215
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

Also, I am creating another issue that will track the progress of integrating 
the new statistics into selectivity estimates. Meanwhile, I would like to 
invite your thoughts on other statistics we should consider. Most common values 
and distribution buckets are at the top of my mind right now; I'd appreciate 
your thoughts on them. The average size of a row written to disk is important 
for cost estimation, and a quick examination of the Derby optimizer's code 
reveals that Derby is already using it in some form; nevertheless, I will 
revisit it during my analysis of cost estimation over the course of this 
project.

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, 
> EOFException_derby.log, EOFException.txt
>
>
> Derby should collect extra statistics at index build time and statistics refresh time, which will help the optimizer make more precise selectivity estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity estimates / cost estimates that will help find the best query plan. Currently Derby keeps two types of stats: the total row count and the number of unique values.
> We are initially extending the stats to include the null count and the minimum and maximum values associated with each column of an index. This would be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, >= ], all of which currently rely on hardwired selectivity estimates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-23 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061206#comment-16061206
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

Hi Bryan,

I thought of a workaround and was successful. In particular, I compare maxVal 
and minVal, and if they are equal I first write an indicator boolean and then 
write only one DataValueDescriptor object. In all other cases, I first write 
maxVal and then minVal, so that the problematic object is always written last.

public void writeExternal(ObjectOutput out)
        throws IOException
{
    FormatableHashtable fh = new FormatableHashtable();
    fh.putLong("numRows", numRows);
    fh.putLong("numUnique", numUnique);
    fh.putLong("nullCount", nullCount);
    out.writeObject(fh);

    boolean singleValue = false;
    try {
        // DataValueDescriptor.equals() may throw StandardException, which
        // writeExternal() is not allowed to propagate.
        singleValue = maxVal.equals(maxVal, minVal).getBoolean();
    } catch (StandardException e) {
        // fall back to writing both values
    }

    if (singleValue) {
        // indicator: only one DataValueDescriptor follows
        out.writeBoolean(true);
        out.writeObject(minVal);
    } else {
        out.writeBoolean(false);
        out.writeObject(maxVal);
        out.writeObject(minVal);
    }
}



public void readExternal(ObjectInput in)
        throws IOException, ClassNotFoundException
{
    FormatableHashtable fh = (FormatableHashtable) in.readObject();
    numRows = fh.getLong("numRows");
    numUnique = fh.getLong("numUnique");
    nullCount = fh.getLong("nullCount");
    if (in.readBoolean()) {
        maxVal = (DataValueDescriptor) in.readObject();
        minVal = maxVal.cloneValue(true);
    } else {
        maxVal = (DataValueDescriptor) in.readObject();
        minVal = (DataValueDescriptor) in.readObject();
    }
}

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, 
> EOFException_derby.log, EOFException.txt
>
>
> Derby should collect extra statistics at index build time and statistics refresh time, which will help the optimizer make more precise selectivity estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity estimates / cost estimates that will help find the best query plan. Currently Derby keeps two types of stats: the total row count and the number of unique values.
> We are initially extending the stats to include the null count and the minimum and maximum values associated with each column of an index. This would be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, >= ], all of which currently rely on hardwired selectivity estimates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-22 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060091#comment-16060091
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

While debugging the failed test cases I seem to have reproduced a previously 
know issue. Please take a look at my comment on DERBY-3219.

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff
>
>
> Derby should collect extra statistics at index build time and statistics refresh time, which will help the optimizer make more precise selectivity estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity estimates / cost estimates that will help find the best query plan. Currently Derby keeps two types of stats: the total row count and the number of unique values.
> We are initially extending the stats to include the null count and the minimum and maximum values associated with each column of an index. This would be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, >= ], all of which currently rely on hardwired selectivity estimates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-3219) Group by query with many aggregate columns and case statements fails with: ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted to read more da

2017-06-22 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060086#comment-16060086
 ] 

Harshvardhan Gupta commented on DERBY-3219:
---

While examining the failed test cases for the DERBY-6940 patch, I get this 
error from exactly the same root cause, i.e. the value I am trying to store is 
a SQLVarChar whose value is "" (that is, a string of length 0).

To reproduce the error with the patch attached to DERBY-6940, run 

ant 
-Dderby.junit.testclass=org.apache.derbyTesting.functionTests.tests.lang.OrderByAndSortAvoidance
 junit-single

Since I am trying to collect the min and max values from the database, a string 
of length 0 turns out to be the min value for the test cases in 
OrderByAndSortAvoidance.

> Group by query with many aggregate columns and case statements fails with: 
> ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted 
> to read more data than was originally stored
> -
>
> Key: DERBY-3219
> URL: https://issues.apache.org/jira/browse/DERBY-3219
> Project: Derby
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 10.3.1.4
>Reporter: Stan Bradbury
>Assignee: Bryan Pendleton
> Fix For: 10.5.1.1
>
> Attachments: maxminPatch.diff, patchWithTest.diff, pivotView.zip, 
> repro.java, sanityTest.diff, ten_four_patch.diff, testWithMemControls.diff
>
>
> using the attached database (v10.3) - " select * from pivotview " fails with 
> the stack trace below.  A view (pivotview_ok) created on a subset of the 
> columns in pivotview executes fine.  Adding one column back into pivotview_ok 
> causes failures most of the time.  See attached for view definitions.
> 2007-11-21 00:58:49.421 GMT Thread[main,5,main] (XID = 2734422), (SESSIONID = 
> 0), (DATABASE = pivotview), (DRDAID = null), Failed Statement is: select * 
> from pivotview
> ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted 
> to read more data than was originally stored
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.store.raw.data.StreamFileContainer.fetchNext(Unknown 
> Source)
>   at 
> org.apache.derby.impl.store.raw.data.StreamFileContainerHandle.fetchNext(Unknown
>  Source)
>   at org.apache.derby.impl.store.access.sort.MergeScan.mergeARow(Unknown 
> Source)
>   at org.apache.derby.impl.store.access.sort.MergeScan.init(Unknown 
> Source)
>   at 
> org.apache.derby.impl.store.access.sort.MergeSort.openSortScan(Unknown Source)
>   at 
> org.apache.derby.impl.store.access.RAMTransaction.openSortScan(Unknown Source)
>   at 
> org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.loadSorter(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.execute.GroupedAggregateResultSet.openCore(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.openCore(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(Unknown Source)
>   at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>   at org.apache.derby.impl.tools.ij.ij.executeImmediate(Unknown Source)
>   at org.apache.derby.impl.tools.ij.utilMain.doCatch(Unknown Source)
>   at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(Unknown Source)
>   at org.apache.derby.impl.tools.ij.utilMain.go(Unknown Source)
>   at org.apache.derby.impl.tools.ij.Main.go(Unknown Source)
>   at org.apache.derby.impl.tools.ij.Main.mainCore(Unknown Source)
>   at org.apache.derby.impl.tools.ij.Main14.main(Unknown Source)
>   at org.apache.derby.tools.ij.main(Unknown Source)
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readBoolean(DataInputStream.java:248)
>   at 
> org.apache.derby.impl.sql.execute.MaxMinAggregator.readExternal(Unknown 
> Source)
>   at 
> org.apache.derby.iapi.services.io.FormatIdInputStream.readObject(Unknown 
> Source)
>   at org.apache.derby.iapi.types.UserType.readExternal(Unknown Source)
>   ... 22 more
> = begin nested exception, level (1) ===
> java.io.EOFException
>   at java.io.DataInputStream.readBoolean(DataInputStream.java:248)
>   at 
> org.apache.derby.impl.sql.execute.MaxMinAggregator.readExternal(Unknown 
> Source)
>   at 
> 

[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-21 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6940:
--
Description: 
Derby should collect extra statistics at index build time and statistics 
refresh time, which will help the optimizer make more precise selectivity 
estimates and choose better execution paths.

We eventually want to utilize the new statistics to make better selectivity 
estimates / cost estimates that will help find the best query plan. Currently 
Derby keeps two types of stats: the total row count and the number of unique 
values.

We are initially extending the stats to include the null count and the minimum 
and maximum values associated with each column of an index. This would be 
useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, 
>= ], all of which currently rely on hardwired selectivity estimates.

  was:Derby should collect extra statistics during index build time, statistics 
refresh time which will help optimizer make more precise selectivity estimates 
and chose better execution paths.


> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity 
> estimates / cost estimates that will help find the best query plan. Currently 
> Derby keeps two type of stats - the total row count and the number of unique 
> values.
> We are initially extending the stats to include null count, the minimum value 
> and maximum value associated with each of the columns of an index. This would 
> be useful in selectivity estimates for operators such as [ IS NULL, <, <=, >, 
> >= ] , all of which currently rely on hardwired selectivity estimates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-21 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058041#comment-16058041
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

Another related link I missed earlier - 
https://www.postgresql.org/docs/9.1/static/planner-stats-details.html

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-21 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6940:
--
Attachment: DERBY-6940_3.diff

Bryan,

Thanks for the help. I was able to solve my problem by following your 
direction. Attached is the new patch. 

Regarding the Derby statistics, information about the Postgres statistics can 
be found at - 

https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT#Using_ANALYZE_to_optimize_PostgreSQL_queries

https://www.postgresql.org/docs/9.0/static/catalog-pg-statistic.html

Postgres uses a null count and distribution buckets. We are currently doing a 
primitive version of the distribution buckets, assuming a uniform distribution 
between our min and max values, but we can certainly incorporate that behaviour 
later.

Another interesting stat is the average size of the row written to disk. It 
could be helpful in cost estimation, though I don't see its relevance to 
cardinality estimation.
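
For illustration only, here is a minimal Java sketch of how equi-width 
distribution buckets could refine the uniform min/max assumption when 
estimating a predicate such as col >= value. This is not Derby code; the class 
name, bucket layout and method are assumptions.

public final class IntHistogram {
    private final long[] bucketCounts;  // rows per equi-width bucket
    private final int min;              // column minimum
    private final int max;              // column maximum
    private final long totalRows;

    public IntHistogram(long[] bucketCounts, int min, int max, long totalRows) {
        this.bucketCounts = bucketCounts;
        this.min = min;
        this.max = max;
        this.totalRows = totalRows;
    }

    /** Estimated fraction of rows satisfying col >= value. */
    public double selectivityGreaterEquals(int value) {
        if (value <= min) return 1.0;
        if (value > max) return 0.0;
        double bucketWidth = (double) (max - min + 1) / bucketCounts.length;
        int bucket = (int) ((value - min) / bucketWidth);
        long qualifying = 0;
        // whole buckets strictly above the one containing 'value'
        for (int b = bucket + 1; b < bucketCounts.length; b++) {
            qualifying += bucketCounts[b];
        }
        // assume a uniform distribution inside the partially qualifying bucket
        double bucketStart = min + bucket * bucketWidth;
        double bucketFraction = 1.0 - (value - bucketStart) / bucketWidth;
        qualifying += Math.round(bucketCounts[bucket] * bucketFraction);
        return (double) qualifying / totalRows;
    }
}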

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-20 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056086#comment-16056086
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

I am looking at object serialization as a possible solution but would like to 
be sure whether it makes sense to store the entire object or just its string 
representation.

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-20 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056079#comment-16056079
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

Another thing: I am able to write and read minVal and maxVal as strings 
successfully using their toString methods, but when I try to store the original 
DataValueDescriptor object, I get the following error while reading the 
statistics back -

ERROR XSDA8: Exception during restore of a serializable or SQLData object of class
ERROR XJ001: Java exception: 'java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.derby.iapi.types.DataValueDescriptor: java.io.StreamCorruptedException'.
ERROR XJ001: Java exception: 'java.lang.String cannot be cast to org.apache.derby.iapi.types.DataValueDescriptor: java.lang.ClassCastException'.

Storing minVal and maxVal as Strings is not preferred for the comparisons 
needed later on, so I am trying to store the original object.
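
Purely to illustrate the read/write symmetry involved (this is not Derby's 
statistics class; MinMaxStats and its simplified long fields are invented), a 
java.io.Externalizable object must read back exactly the fields it wrote, in 
the same order and with the same types, or restore fails with errors like the 
ones above:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical sketch, not Derby code.
public class MinMaxStats implements Externalizable {
    private long nullCount;
    private long minValue;   // simplified: a long instead of a DataValueDescriptor
    private long maxValue;

    public MinMaxStats() { }  // public no-arg constructor required by Externalizable

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(nullCount);
        out.writeLong(minValue);
        out.writeLong(maxValue);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must mirror writeExternal exactly; a mismatch shows up at restore
        // time as an EOFException or a ClassCastException.
        nullCount = in.readLong();
        minValue = in.readLong();
        maxValue = in.readLong();
    }
}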

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-20 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6940:
--
Attachment: DERBY-6940_2.diff

I am attaching an updated diff which now collects extra statistics whenever an 
index is created or altered. Statistics get updated in 3 ways - 

1) Creating a new index.
2) Altering an Index.
3) Explicitly calling SYSCS_UPDATE_STATISTICS system procedure.

The current patch collects statistics in all the above cases. I am currently 
working on the upgrade logic, after which we can look at ways to utilise the 
extra statistics in selectivity estimates.

A soft upgrade can be handled by turning off the extra statistics collection 
and turning off reading them from and writing them to the store. However, there 
are multiple ways we could go about the hard upgrade strategy.

Options - 
1) We update all the existing indexes and add extra statistics during upgrade 
time.
2) We wait for the user to explicitly call the UPDATE_STATISTICS system 
procedure or ALTER INDEX before collecting the extra statistics and writing 
them to the store.

I would like to know what the community thinks is the best way; please also 
share if you have any other strategy in mind.
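
For reference, option 2) corresponds to driving the documented 
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS procedure explicitly; a minimal JDBC sketch 
is below (the JDBC URL, schema and table names are placeholders):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class UpdateStats {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
             CallableStatement cs = conn.prepareCall(
                 "CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS(?, ?, ?)")) {
            cs.setString(1, "APP");                 // schema name
            cs.setString(2, "MY_TABLE");            // table name
            cs.setNull(3, java.sql.Types.VARCHAR);  // null => refresh all indexes
            cs.execute();
        }
    }
}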

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: DERBY-6940_2.diff, derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column

2017-06-19 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054502#comment-16054502
 ] 

Harshvardhan Gupta commented on DERBY-6905:
---

I can see that dblook_test and others have not been added to the tools 
package's _Suite.java triggered by junit-all. I was under the impression that 
all tests were added; for future reference, what is the alternative way to 
trigger all the available JUnit tests?

> Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
> ---
>
> Key: DERBY-6905
> URL: https://issues.apache.org/jira/browse/DERBY-6905
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: dblook_tests.diff
>
>
> DERBY-6542 changed IDENTITY columns to build upon the underlying
> SEQUENCE objects. SEQUENCE objects can have a CYCLE option
> set for them, and as of DERBY-6852, the CYCLE option controls the
> behavior of the IDENTITY column that uses that SEQUENCE.
> It would be nice if SYS.SYSCOLUMNS would report the CYCLE option
> that is in effect for a IDENTITY column in that column's row in SYSCOLUMNS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column

2017-06-19 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054256#comment-16054256
 ] 

Harshvardhan Gupta commented on DERBY-6905:
---

Sorry I meant the junit-all target.

> Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
> ---
>
> Key: DERBY-6905
> URL: https://issues.apache.org/jira/browse/DERBY-6905
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: dblook_tests.diff
>
>
> DERBY-6542 changed IDENTITY columns to build upon the underlying
> SEQUENCE objects. SEQUENCE objects can have a CYCLE option
> set for them, and as of DERBY-6852, the CYCLE option controls the
> behavior of the IDENTITY column that uses that SEQUENCE.
> It would be nice if SYS.SYSCOLUMNS would report the CYCLE option
> that is in effect for a IDENTITY column in that column's row in SYSCOLUMNS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column

2017-06-19 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054252#comment-16054252
 ] 

Harshvardhan Gupta commented on DERBY-6905:
---

Hi Bryan,

The patch looks good to me but I am trying to understand why the ant all target 
passes without this patch.

Vardhan

> Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
> ---
>
> Key: DERBY-6905
> URL: https://issues.apache.org/jira/browse/DERBY-6905
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: dblook_tests.diff
>
>
> DERBY-6542 changed IDENTITY columns to build upon the underlying
> SEQUENCE objects. SEQUENCE objects can have a CYCLE option
> set for them, and as of DERBY-6852, the CYCLE option controls the
> behavior of the IDENTITY column that uses that SEQUENCE.
> It would be nice if SYS.SYSCOLUMNS would report the CYCLE option
> that is in effect for a IDENTITY column in that column's row in SYSCOLUMNS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-18 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053198#comment-16053198
 ] 

Harshvardhan Gupta edited comment on DERBY-6940 at 6/18/17 1:56 PM:


Attaching an initial patch which adds the following statistics information - 

1) The minimum value of each column appearing in an index. 
2) The maximum value of each column appearing in an index.
3) The null counts for each column appearing in an index.

I am currently storing the column-level minValue and maxValue as 
DataValueDescriptor objects. I am not entirely sure this is the best way to 
store them for later use in the comparisons that determine selectivity 
estimates.

To invoke the extraStatistics procedure, call 
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS().
Currently the extraStatistics procedure is not invoked by the CREATE INDEX 
statement; we will need to write a variant for CREATE INDEX so that the extra 
statistics are also collected at index creation time.




was (Author: harshvardhan145):
Attaching an initial patch which add the following statistics information - 

1) The minimum value of each column appearing in an index. 
2) The maximum value of each column appearing in an index.
3) The null counts for each column appearing in an index.

I am currently storing column level values as DataValueDescriptor object. I am 
not entirely sure if this is the best way to store it for later use in 
comparison for determining selectivity estimates.

To invoke the procedure extraStatistics call 
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS().
Currently the extraStatistics procedure is not invoked by the create Index 
statement, we will need to write a variant for "Create Index" statements to 
collect the extra statistics during index creation time as well.



> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-18 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053201#comment-16053201
 ] 

Harshvardhan Gupta edited comment on DERBY-6940 at 6/18/17 1:53 PM:


Additionally, we need to work on upgrade procedures: when trying to access the 
statistics info of an index created before this patch, Derby will throw an NPE 
due to the absence of the new statistics in those indexes.


was (Author: harshvardhan145):
Additionally we need to work on upgrade procedures, while trying to access 
statistics info of an index created before this patch, Derby will throw a NPE 
due to absence of the new statistics. 

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-18 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053201#comment-16053201
 ] 

Harshvardhan Gupta commented on DERBY-6940:
---

Additionally, we need to work on upgrade procedures: when trying to access the 
statistics info of an index created before this patch, Derby will throw an NPE 
due to the absence of the new statistics. 

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-18 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6940:
--
Attachment: derby-6940.diff

Attaching an initial patch which adds the following statistics information - 

1) The minimum value of each column appearing in an index. 
2) The maximum value of each column appearing in an index.
3) The null counts for each column appearing in an index.

I am currently storing the column-level values as DataValueDescriptor objects. 
I am not entirely sure this is the best way to store them for later use in the 
comparisons that determine selectivity estimates.

To invoke the extraStatistics procedure, call 
SYSCS_UTIL.SYSCS_UPDATE_STATISTICS().
Currently the extraStatistics procedure is not invoked by the CREATE INDEX 
statement; we will need to write a variant for CREATE INDEX so that the extra 
statistics are also collected at index creation time.



> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: derby-6940.diff
>
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-18 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6940:
--
Description: Derby should collect extra statistics during index build time and 
statistics refresh time, which will help the optimizer make more precise 
selectivity estimates and choose better execution paths.

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>
> Derby should collect extra statistics during index build time, statistics 
> refresh time which will help optimizer make more precise selectivity 
> estimates and chose better execution paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-06-18 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053197#comment-16053197
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

Tracking work on statistics enhancement at 
https://issues.apache.org/jira/browse/DERBY-6940

>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-18 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6940:
-

 Summary: Enhance derby statistics for more accurate selectivity 
estimates.
 Key: DERBY-6940
 URL: https://issues.apache.org/jira/browse/DERBY-6940
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.

2017-06-18 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6940:
-

Assignee: Harshvardhan Gupta

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-06-14 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048798#comment-16048798
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

The specific approach I am thinking of is to keep the minimum and maximum 
value of columns and the number of NULL values in the statistics; this could be 
utilised for operators such as <, >, <=, >=, IS NOT NULL and IS NULL.

For example, let's say we have an int column whose minimum and maximum values 
are 20 and 100 respectively. Then a query predicate on that column with the 
condition >= 80 should ideally return 25% of all rows. This approach obviously 
assumes a uniform distribution, but it should be good to get started with; we 
should be able to make it more accurate by taking the actual distribution into 
account later on.
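
A minimal Java sketch of that uniform-distribution estimate (not Derby code; 
the class name and the null-count adjustment are assumptions for illustration):

public final class RangeSelectivity {
    /** Estimated fraction of rows satisfying col >= value under a uniform distribution. */
    public static double greaterEquals(double min, double max, double value,
                                       long rowCount, long nullCount) {
        double notNullFraction =
            rowCount == 0 ? 0.0 : 1.0 - (double) nullCount / rowCount;
        if (value <= min) return notNullFraction;
        if (value > max) return 0.0;
        // fraction of the [min, max] range that satisfies col >= value
        return notNullFraction * (max - value) / (max - min);
    }

    public static void main(String[] args) {
        // min 20, max 100, predicate >= 80, no nulls -> prints 0.25
        System.out.println(greaterEquals(20, 100, 80, 1000, 0));
    }
}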

>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-06-14 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048793#comment-16048793
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

Bryan,
Regarding my question earlier, one thing that was particularly useful for 
diving deep into the optimizer was enabling optimizer tracing.
https://wiki.apache.org/db-derby/OptimizerTracing

The trace output is quite verbose and helps to understand the various choices 
the optimizer is making.

A few observations and potential improvements that I would like to point out - 

1) Derby falls back to nested loops more often than we would like, 
particularly in the case of large tables. Currently the hash table resides 
entirely in memory, and Derby rules out the HASHJOIN approach if it suspects 
the table is going to be too large (the default limit is 1048576); a simplified 
sketch of this feasibility check follows these observations.
Nested loops do not seem to be a good option, especially when joining 
relatively large tables (similar to the IMDB dataset we are using) across more 
than 4 joins.

It is also documented in the optimizer paper that creating hash tables that 
spill to disk is a potential improvement and my experiments confirm that.

2) Another potential improvement concerns cardinality estimates. For 
selectivity, Derby currently uses hardwired numbers for every operator other 
than the equality operator.
https://db.apache.org/derby/docs/10.0/manuals/tuning/perf56.html

For an equality operator with a value known at compile time, it utilises 
statistics and makes selectivity estimates using the number of unique values. I 
think we can enhance the statistics to be able to make better cardinality 
estimates.
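
The simplified feasibility check referred to in 1) is essentially the 
following rule (this is not Derby's actual JoinStrategy code; the size formula 
and parameter names are assumptions made for the sketch):

public final class HashJoinFeasibility {
    /** Hash join is only considered when the estimated build side fits the limit. */
    public static boolean feasible(double estimatedRowCount,
                                   double estimatedRowSize,
                                   long maxCapacity /* e.g. the 1048576 default */) {
        double estimatedBuildSize = estimatedRowCount * estimatedRowSize;
        return estimatedBuildSize <= maxCapacity;  // otherwise fall back to nested loop
    }
}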


>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-06-14 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048780#comment-16048780
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

To view and compare the estimated row counts and true row counts for base 
tables and for intermediate results, the following queries can be used on the 
XPLAIN tables; the value of OP_IDENTIFIER can be changed to get data for nodes 
performing a particular operation such as HASHJOIN, NLJOIN, etc.

select SEEN_ROWS, SEEN_ROWS_RIGHT, RETURNED_ROWS, EST_ROW_COUNT  from 
SYSXPLAIN_RESULTSETS,SYSXPLAIN_STATEMENTS where OP_IDENTIFIER = 'HASHJOIN' and 
SYSXPLAIN_STATEMENTS.STMT_ID = SYSXPLAIN_RESULTSETS.STMT_ID;

To include the scan properties for the nodes that perform scans - 

select SEEN_ROWS, RETURNED_ROWS, EST_ROW_COUNT , OP_IDENTIFIER from 
SYSXPLAIN_STATEMENTS, SYSXPLAIN_RESULTSETS, SYSXPLAIN_SCAN_PROPS where 
SYSXPLAIN_STATEMENTS.STMT_ID = SYSXPLAIN_RESULTSETS.STMT_ID and 
SYSXPLAIN_RESULTSETS.SCAN_RS_ID = SYSXPLAIN_SCAN_PROPS.SCAN_RS_ID and 
OP_IDENTIFIER like '%SCAN';
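
For completeness, runtime statistics have to be captured into an XPLAIN schema 
before these queries return anything. A minimal JDBC sketch using the 
documented SYSCS_SET_RUNTIMESTATISTICS and SYSCS_SET_XPLAIN_SCHEMA procedures 
is below; the JDBC URL, the MYSTATS schema name and the sample query are 
placeholders, and the SYSXPLAIN_* tables then live in the chosen schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class XplainCapture {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
             Statement st = conn.createStatement()) {
            st.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");
            st.execute("CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('MYSTATS')");

            // run the workload to be analysed
            try (ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM TITLE")) {
                rs.next();
            }

            // stop capturing; MYSTATS.SYSXPLAIN_* now holds the plan data
            st.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0)");
            st.execute("CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('')");
        }
    }
}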






>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DERBY-6936) Documentation for changes made as part of DERBY-6904

2017-06-11 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046060#comment-16046060
 ] 

Harshvardhan Gupta commented on DERBY-6936:
---

The documentation changes seem to cover all the issues we worked on. Looks 
great to me!


> Documentation for changes made as part of DERBY-6904
> 
>
> Key: DERBY-6936
> URL: https://issues.apache.org/jira/browse/DERBY-6936
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Bryan Pendleton
>Priority: Minor
> Attachments: docs.diff, rrefsistabs22441.html, rrefsqlj37836.html, 
> rrefsqlj81859.html
>
>
> Here are the changes I believe we need for this task:
> 1) For DERBY-6904, we need to update 
> https://db.apache.org/derby/docs/10.13/ref/rrefsqlj81859.html to document the 
> syntax and behavior of using ALTER TABLE to change the CYCLE option for a 
> generated column.
> 2) For DERBY-6905, we need to update 
> https://db.apache.org/derby/docs/10.13/ref/rrefsistabs22441.html to document 
> the new AUTOINCREMENTCYCLE column that appears in SYSCOLUMNS
> 3) For DERBY-6906, we need to update 
> https://db.apache.org/derby/docs/10.13/ref/rrefsqlj37836.html to document 
> that a generated column spec can now explicitly state NO CYCLE
> I believe these are all the documentation changes that are necessary.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (DERBY-6935) Test Coverage for added features.

2017-06-11 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta resolved DERBY-6935.
---
Resolution: Fixed

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: moreCycleTests.diff, simpleTests_2.diff, simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-06-11 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046056#comment-16046056
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

Yes, the latest commit looks good. I am resolving the issue.

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: moreCycleTests.diff, simpleTests_2.diff, simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-06-11 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046055#comment-16046055
 ] 

Harshvardhan Gupta commented on DERBY-6938:
---

Hi Bryan,

I am trying to understand the cardinality estimates Derby Optimizer makes in 
the presence of joins, I am currently doing more experiments starting from 
joins for 2 tables and then moving to more levels of joins similar to those 
present in job query dataset. Along with this I am examining the code and will 
share the information that I obtain here.

Regards,
Vardhan,

>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-06-10 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045702#comment-16045702
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

Bryan, yes, 'moreCycleTests.diff' contains the missing tests. I had not synced 
from the remote before adding more tests to IdentitySequenceTests.java; maybe 
that caused the problems with the usual patch command.

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: moreCycleTests.diff, simpleTests_2.diff, simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-06-10 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6938:
--
Attachment: explain.txt

Hi Bryan,

I am attaching the explain plan of one of the queries from the dataset. I have 
doubts about the scans performed for the INFO_TYPE and TITLE tables and the 
optimizer's estimated row count for those scans.

In general, how is the estimated row count determined for scans that are part 
of joins?

I went through the link below, but it does not cover the case where a join is 
specified -
http://db.apache.org/derby/docs/10.13/tuning/ctunstats849203.html


>  Obtain cardinality estimates and true estimates for base tables as well as 
> for intermediate results for queries involving multiple joins. 
> ---
>
> Key: DERBY-6938
> URL: https://issues.apache.org/jira/browse/DERBY-6938
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
> Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-06-10 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045556#comment-16045556
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

No, the recent changes to IdentitySequenceTests.java do not seem to be included 
in 1797283.

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: simpleTests_2.diff, simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6936) Documentation for changes made as part of DERBY-6904

2017-06-10 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045554#comment-16045554
 ] 

Harshvardhan Gupta commented on DERBY-6936:
---

Sure Bryan, please go ahead with the changes.

> Documentation for changes made as part of DERBY-6904
> 
>
> Key: DERBY-6936
> URL: https://issues.apache.org/jira/browse/DERBY-6936
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>
> Here are the changes I believe we need for this task:
> 1) For DERBY-6904, we need to update 
> https://db.apache.org/derby/docs/10.13/ref/rrefsqlj81859.html to document the 
> syntax and behavior of using ALTER TABLE to change the CYCLE option for a 
> generated column.
> 2) For DERBY-6905, we need to update 
> https://db.apache.org/derby/docs/10.13/ref/rrefsistabs22441.html to document 
> the new AUTOINCREMENTCYCLE column that appears in SYSCOLUMNS
> 3) For DERBY-6906, we need to update 
> https://db.apache.org/derby/docs/10.13/ref/rrefsqlj37836.html to document 
> that a generated column spec can now explicitly state NO CYCLE
> I believe these are all the documentation changes that are necessary.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-06-10 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045551#comment-16045551
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

Hi Bryan,

I also added some catalog tests in IdentitySequenceTests.java which are not 
included in this commit but are included in the patch I attached earlier.

Thanks

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Fix For: 10.14.0.0
>
> Attachments: simpleTests_2.diff, simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-06-09 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044865#comment-16044865
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

Bryan, I do think that we should go ahead and commit them.

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: simpleTests_2.diff, simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6935) Test Coverage for added features.

2017-06-08 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6935:
--
Attachment: simpleTests_2.diff

Please find the attached patch containing updated catalog tests and tests 
related to DERBY-6903.

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: simpleTests_2.diff, simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-05 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6937:
--
Attachment: derby_script.sql
imdb.diff

Please find the attached files. 'derby_script.sql' contains the exact script I 
used to set up the tables. The other file contains the changes to 
ImportReadData.java. 

There are 2 major changes in ImportReadData - 
1) Handling NULL values as discussed by me earlier.
2) Handling the escape characters.

Errors I saw related to 2) are discussed here as well - 
http://apache-database.10148.n7.nabble.com/Data-found-after-the-stop-delimiter-td100312.html.

I handled the escape character inside Derby only; other solutions, such as 
pre-processing the data externally, exist, as in the case of handling NULL 
values.

'schema_derby.sql' and the 'schematext.sql' that came with the dataset are 
mostly the same, other than the fields whose data type is just 'character 
varying' without a specified max length. For those columns, I have used the 
'CLOB' data type in Derby as semantically equivalent to 'character varying' of 
unbounded length in Postgres.

You should be able to apply the imdb.diff patch and import the data without any 
problems now.

Thanks,
Vardhan





> Load the IMDB data set in Derby, obtain and adapt Join order Benchmark 
> queries for use in derby 
> 
>
> Key: DERBY-6937
> URL: https://issues.apache.org/jira/browse/DERBY-6937
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: derby_script.sql, imdb.diff, schema_derby.sql
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-03 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036043#comment-16036043
 ] 

Harshvardhan Gupta commented on DERBY-6937:
---

Hi Bryan,

I modified the Derby code to check for the "NULL" string during the file 
reading logic and ended up creating a variant of SYSCS_IMPORT_TABLE with 
minimal changes. Yes, there are a lot of ways to get around this problem; I 
found this approach quicker and cleaner in my current environment.

Specifically, in ImportReadData.java I checked the parsed column value in the 
readNextDelimitedRow procedure and set it to null if it matched the string 
"NULL".

I agree that there are known workarounds for this problem and that it is not 
the focus of our project; still, let our documentation help someone in the 
future trying to set up the environment for analyzing Derby's optimizer.
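
The change described above is roughly of this shape (a hypothetical sketch, 
not the actual imdb.diff hunk; the helper class and method names are invented):

// Hypothetical sketch of the NULL-token handling, not the actual patch.
public final class NullTokenFilter {
    /** Treat a parsed column value equal to the literal string "NULL" as SQL NULL. */
    public static String normalize(String parsedColumnValue) {
        return "NULL".equals(parsedColumnValue) ? null : parsedColumnValue;
    }
}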

> Load the IMDB data set in Derby, obtain and adapt Join order Benchmark 
> queries for use in derby 
> 
>
> Key: DERBY-6937
> URL: https://issues.apache.org/jira/browse/DERBY-6937
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: schema_derby.sql
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-06-03 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036035#comment-16036035
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

Hey Bryan,

Yes the tests look good to me and capture the expected behaviour, I'll take 
your patch and add some more, particularly tests related to DERBY-6903 and 
DERBY-6905.

thanks,
vardhan

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: simpleTests.diff
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-03 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6937:
--
Attachment: schema_derby.sql

I have completed the work on this by ingesting the same data used in the 
original paper into my local Derby environment. I am also attaching the schema 
(adapted for Derby) that is going to be used to run the queries.

IMDB Data Snapshot used in the original VLDB paper -
http://homepages.cwi.nl/~boncz/job/imdb.tgz

The license and links to the current version IMDB data set can be
found at 
http://www.imdb.com/interfaces

While using SYSCS_IMPORT_TABLE to ingest the data, I ran into a problem with 
handling NULL values. Derby expects null values to be blank in the CSV; when 
the keyword "NULL" is present instead, it reads the character sequence as-is 
for char data types and throws an error for non-character data types such as 
INTEGER, NUMERIC, etc. I got around the problem using a quick hack.

Should we create a new variant of SYSCS_IMPORT_TABLE that takes an additional 
argument specifying the string used to represent NULL in the CSV dump? Or 
should we instead work on making the existing procedure var-args rather than 
adding a new procedure?

I tried making the existing procedure var-args but ran into the same problem 
as discussed in this JIRA -
https://issues.apache.org/jira/browse/DERBY-4555
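
For context, the existing seven-argument SYSCS_UTIL.SYSCS_IMPORT_TABLE call 
used for the ingest looks like the sketch below over JDBC; it has no parameter 
for a NULL marker string, which is the gap the proposed variant would fill. The 
JDBC URL, schema, table and file path are placeholders.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ImportImdbTable {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:imdb");
             CallableStatement cs = conn.prepareCall(
                 "CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE(?, ?, ?, ?, ?, ?, ?)")) {
            cs.setString(1, "APP");                 // schema name
            cs.setString(2, "TITLE");               // table name
            cs.setString(3, "/data/title.csv");     // input file
            cs.setString(4, ",");                   // column delimiter
            cs.setString(5, "\"");                  // character delimiter
            cs.setNull(6, java.sql.Types.VARCHAR);  // codeset (null = platform default)
            cs.setShort(7, (short) 0);              // 0 = append, non-zero = replace
            cs.execute();
        }
    }
}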




> Load the IMDB data set in Derby, obtain and adapt Join order Benchmark 
> queries for use in derby 
> 
>
> Key: DERBY-6937
> URL: https://issues.apache.org/jira/browse/DERBY-6937
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: schema_derby.sql
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

2017-06-01 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6938:
-

 Summary:  Obtain cardinality estimates and true estimates for base 
tables as well as for intermediate results for queries involving multiple 
joins. 
 Key: DERBY-6938
 URL: https://issues.apache.org/jira/browse/DERBY-6938
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Assignee: Harshvardhan Gupta






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-01 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6937:
-

 Summary: Load the IMDB data set in Derby, obtain and adapt Join 
order Benchmark queries for use in derby 
 Key: DERBY-6937
 URL: https://issues.apache.org/jira/browse/DERBY-6937
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Assignee: Harshvardhan Gupta
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-05-31 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031210#comment-16031210
 ] 

Harshvardhan Gupta commented on DERBY-6904:
---

Sure, I think it is good to be committed. I will be adding the test cases this 
week. 

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, 
> columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-05-29 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028473#comment-16028473
 ] 

Harshvardhan Gupta edited comment on DERBY-6904 at 5/29/17 3:55 PM:


Thanks for reviewing the patch. Your comments are much appreciated.

As you rightly pointed out, the cycling feature requires only a boolean for 
storage and processing. The new changes are attached in 6904_4.diff, which uses 
the boolean data type. I have also updated IdentitySequenceTest so that the 
affected tests reflect the new expected behaviour.

Regarding SYSSEQUENCES, yes, it is mostly untouched; a new sequence is created 
when we change the cycling behaviour (which also copies the other autoincrement 
info, such as the increment value, current value, etc., from the existing 
sequence).

Other than the two test cases above, we should add new test cases to completely 
cover all the functionality introduced with this patch. I am creating new 
subtasks for tracking test coverage and documentation.

For test coverage, let us discuss it on the following subtask - 
https://issues.apache.org/jira/browse/DERBY-6935


was (Author: harshvardhan145):
Thanks for reviewing the patch. Your comments are much appreciated.

As you rightly pointed out, cycling feature requires only boolean for storage 
and processing. The new changes are attached in 6904_4.diff which uses boolean 
data type. Also I have made changes to IdentitySequenceTest by changing them to 
reflect the new expected behaviour.

Other than two of the above test cases, we should add new test cases to 
completely cover all the functionality introduced with this patch. I am 
creating new subtasks for tracking test coverage and documentation.

For test coverage, let us discuss on the following subtask - 
https://issues.apache.org/jira/browse/DERBY-6935

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, 
> columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-05-29 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028482#comment-16028482
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

Hey Bryan,

Since most of the functionality we are testing is closely knit together, I 
think it is better if we write tests independently rather than trying to divide 
the work and merge it later. How about I do the first iteration of all the 
tests, and then you can perhaps add some trickier tests or refine them in a 
second iteration if required? Or vice versa, if that suits.

Thanks

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6935) Test Coverage for added features.

2017-05-29 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028480#comment-16028480
 ] 

Harshvardhan Gupta commented on DERBY-6935:
---

The following functionality should be covered as part of writing tests for the 
patch - 

1) SYS.SYSCOLUMNS now reports the CYCLE value for the identity column.
2) The ALTER TABLE ALTER COLUMN command doesn't have any effect on the cycle 
setting. (Two of the tests fixed as part of the patch already cover this; I am 
sure we can add some trickier test cases.)
3) Explicitly turning off cycling using the NO CYCLE clause in CREATE TABLE or 
ALTER TABLE statements.

> Test Coverage for added features.
> -
>
> Key: DERBY-6935
> URL: https://issues.apache.org/jira/browse/DERBY-6935
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-05-29 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028473#comment-16028473
 ] 

Harshvardhan Gupta edited comment on DERBY-6904 at 5/29/17 3:40 PM:


Thanks for reviewing the patch. Your comments are much appreciated.

As you rightly pointed out, cycling feature requires only boolean for storage 
and processing. The new changes are attached in 6904_4.diff which uses boolean 
data type. Also I have made changes to IdentitySequenceTest by changing them to 
reflect the new expected behaviour.

Other than two of the above test cases, we should add new test cases to 
completely cover all the functionality introduced with this patch. I am 
creating new subtasks for tracking test coverage and documentation.

For test coverage, let us discuss on the following subtask - 
https://issues.apache.org/jira/browse/DERBY-6935


was (Author: harshvardhan145):
Thanks for reviewing the patch. Your comments are much appreciated.

As you rightly pointed out, cycling feature requires only boolean for storage 
and processing. The new changes are attached in 6904_4.diff which uses boolean 
data type. Also I have made changes to IdentitySequenceTest by changing them to 
reflect the new expected behaviour.

Other than two of the above test cases, we should add new test cases to 
completely cover all the functionality introduced with this patch. I am 
creating new subtasks for tracking test coverage and documentation.

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, 
> columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DERBY-6936) Documentation for changes made as part of DERBY-6904

2017-05-29 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6936:
-

Assignee: Harshvardhan Gupta

> Documentation for changes made as part of DERBY-6904
> 
>
> Key: DERBY-6936
> URL: https://issues.apache.org/jira/browse/DERBY-6936
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Harshvardhan Gupta
>Assignee: Harshvardhan Gupta
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DERBY-6936) Documentation for changes made as part of DERBY-6904

2017-05-29 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6936:
-

 Summary: Documentation for changes made as part of DERBY-6904
 Key: DERBY-6936
 URL: https://issues.apache.org/jira/browse/DERBY-6936
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DERBY-6935) Test Coverage for added features.

2017-05-29 Thread Harshvardhan Gupta (JIRA)
Harshvardhan Gupta created DERBY-6935:
-

 Summary: Test Coverage for added features.
 Key: DERBY-6935
 URL: https://issues.apache.org/jira/browse/DERBY-6935
 Project: Derby
  Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Assignee: Harshvardhan Gupta
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-05-29 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6904:
--
Attachment: 6904_4.diff

Thanks for reviewing the patch. Your comments are much appreciated.

As you rightly pointed out, the cycling feature requires only a boolean for storage 
and processing. The new changes are attached in 6904_4.diff, which uses a boolean 
data type. I have also updated IdentitySequenceTest to reflect the new expected 
behaviour.

Beyond the two test cases mentioned above, we should add new test cases to 
completely cover all the functionality introduced with this patch. I am 
creating new subtasks for tracking test coverage and documentation.

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: 6904_2.diff, 6904_3.diff, 6904_4.diff, 6904.diff, 
> columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DERBY-6903) ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column

2017-05-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6903:
-

Assignee: Harshvardhan Gupta

> ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column
> ---
>
> Key: DERBY-6903
> URL: https://issues.apache.org/jira/browse/DERBY-6903
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
>
> Using ALTER TABLE with the SET INCREMENT BY or SET RESTART WITH
> options to make a change to a GENERATED ... BY IDENTITY column resets the 
> CYCLE option of that column to "off".
> Some test cases demonstrating this behavior were added as part of
> the DERBY-6852 change revision 1756287
> The problem occurs when the ALTER TABLE code drops and recreates
> the underlying SEQUENCE object, and does not preserve the CYCLE
> option of the underlying SEQUENCE object.
> I believe the correct behavior would be for the ALTER TABLE code
> to read the current CYCLE option of the underlying SEQUENCE object
> first, and then create the new SEQUENCE object with the same value
> for the CYCLE option.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DERBY-6906) Allow a GENERATED column to explicitly specify NO CYCLE

2017-05-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6906:
-

Assignee: Harshvardhan Gupta

> Allow a GENERATED column to explicitly specify NO CYCLE
> ---
>
> Key: DERBY-6906
> URL: https://issues.apache.org/jira/browse/DERBY-6906
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
>
> DERBY-6852 added the ability for a GENERATED IDENTITY column
> to specify that it should have CYCLE behavior, utilizing the underlying
> CYCLE behavior of the SEQUENCE object.
> A column which does not wish to have CYCLE behavior is specified
> by omitting the CYCLE keyword.
> It would be nice to allow a GENERATED IDENTITY column to
> explicitly specify that it should have NO CYCLE behavior.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column

2017-05-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6905:
-

Assignee: Harshvardhan Gupta

> Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
> ---
>
> Key: DERBY-6905
> URL: https://issues.apache.org/jira/browse/DERBY-6905
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
>
> DERBY-6542 changed IDENTITY columns to build upon the underlying
> SEQUENCE objects. SEQUENCE objects can have a CYCLE option
> set for them, and as of DERBY-6852, the CYCLE option controls the
> behavior of the IDENTITY column that uses that SEQUENCE.
> It would be nice if SYS.SYSCOLUMNS would report the CYCLE option
> that is in effect for an IDENTITY column in that column's row in SYSCOLUMNS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6906) Allow a GENERATED column to explicitly specify NO CYCLE

2017-05-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6906:
--
Issue Type: Sub-task  (was: Improvement)
Parent: DERBY-6904

> Allow a GENERATED column to explicitly specify NO CYCLE
> ---
>
> Key: DERBY-6906
> URL: https://issues.apache.org/jira/browse/DERBY-6906
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
>
> DERBY-6852 added the ability for a GENERATED IDENTITY column
> to specify that it should have CYCLE behavior, utilizing the underlying
> CYCLE behavior of the SEQUENCE object.
> A column which does not wish to have CYCLE behavior is specified
> by omitting the CYCLE keyword.
> It would be nice to allow a GENERATED IDENTITY column to
> explicitly specify that it should have NO CYCLE behavior.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6905) Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column

2017-05-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6905:
--
Issue Type: Sub-task  (was: Improvement)
Parent: DERBY-6904

> Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY column
> ---
>
> Key: DERBY-6905
> URL: https://issues.apache.org/jira/browse/DERBY-6905
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
>
> DERBY-6542 changed IDENTITY columns to build upon the underlying
> SEQUENCE objects. SEQUENCE objects can have a CYCLE option
> set for them, and as of DERBY-6852, the CYCLE option controls the
> behavior of the IDENTITY column that uses that SEQUENCE.
> It would be nice if SYS.SYSCOLUMNS would report the CYCLE option
> that is in effect for an IDENTITY column in that column's row in SYSCOLUMNS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6903) ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column

2017-05-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6903:
--
Issue Type: Sub-task  (was: Bug)
Parent: DERBY-6904

> ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column
> ---
>
> Key: DERBY-6903
> URL: https://issues.apache.org/jira/browse/DERBY-6903
> Project: Derby
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
>
> Using ALTER TABLE with the SET INCREMENT BY or SET RESTART WITH
> options to make a change to a GENERATED ... BY IDENTITY column resets the 
> CYCLE option of that column to "off".
> Some test cases demonstrating this behavior were added as part of
> the DERBY-6852 change revision 1756287
> The problem occurs when the ALTER TABLE code drops and recreates
> the underlying SEQUENCE object, and does not preserve the CYCLE
> option of the underlying SEQUENCE object.
> I believe the correct behavior would be for the ALTER TABLE code
> to read the current CYCLE option of the underlying SEQUENCE object
> first, and then create the new SEQUENCE object with the same value
> for the CYCLE option.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-05-23 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6904:
--
Attachment: 6904_3.diff

I have completed work on the hard upgrade strategy with the attached patch 
6904_3.diff. All the test suites now pass with both soft and hard upgrades. 
I also manually tested the expected behaviour using ij.

I now have to add test cases for the newly added functionality. The patch 
resolves the following related issues: DERBY-6903, DERBY-6904, DERBY-6905 and 
DERBY-6906.


> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
> Attachments: 6904_2.diff, 6904_3.diff, 6904.diff, columnMismatch, 
> compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DERBY-6921) How good is the Derby Query Optimizer, really

2017-05-22 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta reassigned DERBY-6921:
-

Assignee: Harshvardhan Gupta

> How good is the Derby Query Optimizer, really
> -
>
> Key: DERBY-6921
> URL: https://issues.apache.org/jira/browse/DERBY-6921
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Bryan Pendleton
>Assignee: Harshvardhan Gupta
>Priority: Minor
>  Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publically available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publically available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>  For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>for Derby to improve the query plans that it chooses for
>various classes of queries (this is explained in detail in the
>VLDB paper and other information available at Dr. Leis's site)
>For each such improvement, we need to isolate the issue,
>report it as a separable improvement, and fix it (if we can)
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please take these
> considerations into mind:
> 1) This is NOT an introductory project. You must be quite familiar
>with DBMS systems, and with SQL, and in particular with
>cost-based query optimization. If terms such as "cardinality
>estimation", "correlated query predicates", or "bushy trees"
>aren't comfortable terms for you, this probably isn't the
>project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>of the extensive body of introductory material on Derby to
>become familiar with it: read the Derby Getting Started manual,
>download the software and follow the tutorials, read the documentation,
>download the source code and learn how to build and run the
>test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>need to read the paper(s), study the benchmark queries, and
>propose a detailed plan for how to use this benchmark as a tool
>for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really

2017-03-27 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943436#comment-15943436
 ] 

Harshvardhan Gupta commented on DERBY-6921:
---

Thanks for reviewing the proposal. I'll make sure to thoroughly go through the 
above-mentioned resources and past GSoC proposals to further refine my proposal.

> How good is the Derby Query Optimizer, really
> -
>
> Key: DERBY-6921
> URL: https://issues.apache.org/jira/browse/DERBY-6921
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Bryan Pendleton
>Priority: Minor
>  Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publically available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publically available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>  For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>for Derby to improve the query plans that it chooses for
>various classes of queries (this is explained in detail in the
>VLDB paper and other information available at Dr. Leis's site)
>For each such improvement, we need to isolate the issue,
>report it as a separable improvement, and fix it (if we can)
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please take these
> considerations into mind:
> 1) This is NOT an introductory project. You must be quite familiar
>with DBMS systems, and with SQL, and in particular with
>cost-based query optimization. If terms such as "cardinality
>estimation", "correlated query predicates", or "bushy trees"
>aren't comfortable terms for you, this probably isn't the
>project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>of the extensive body of introductory material on Derby to
>become familiar with it: read the Derby Getting Started manual,
>download the software and follow the tutorials, read the documentation,
>download the source code and learn how to build and run the
>test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>need to read the paper(s), study the benchmark queries, and
>propose a detailed plan for how to use this benchmark as a tool
>for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really

2017-03-25 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942086#comment-15942086
 ] 

Harshvardhan Gupta commented on DERBY-6921:
---

Hi Kathey, please find the draft proposal at 

https://docs.google.com/document/d/18UzZt7GGuYX_ww_o9kPwdZH0eftYIMRxfJQ3fcKe0ek/edit?usp=sharing

I'll make sure to update the Derby Wiki once I submit the final proposal.

> How good is the Derby Query Optimizer, really
> -
>
> Key: DERBY-6921
> URL: https://issues.apache.org/jira/browse/DERBY-6921
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Bryan Pendleton
>Priority: Minor
>  Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publically available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publically available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>  For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>for Derby to improve the query plans that it chooses for
>various classes of queries (this is explained in detail in the
>VLDB paper and other information available at Dr. Leis's site)
>For each such improvement, we need to isolate the issue,
>report it as a separable improvement, and fix it (if we can)
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please take these
> considerations into mind:
> 1) This is NOT an introductory project. You must be quite familiar
>with DBMS systems, and with SQL, and in particular with
>cost-based query optimization. If terms such as "cardinality
>estimation", "correlated query predicates", or "bushy trees"
>aren't comfortable terms for you, this probably isn't the
>project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>of the extensive body of introductory material on Derby to
>become familiar with it: read the Derby Getting Started manual,
>download the software and follow the tutorials, read the documentation,
>download the source code and learn how to build and run the
>test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>need to read the paper(s), study the benchmark queries, and
>propose a detailed plan for how to use this benchmark as a tool
>for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really

2017-03-25 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942035#comment-15942035
 ] 

Harshvardhan Gupta commented on DERBY-6921:
---

Hi Derby community,

I have submitted a draft proposal for this project through Google. Please 
review it and help me further refine it. Please comment if you are not able to 
access it through Google.

Regards,
Harshvardhan Gupta

> How good is the Derby Query Optimizer, really
> -
>
> Key: DERBY-6921
> URL: https://issues.apache.org/jira/browse/DERBY-6921
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Bryan Pendleton
>Priority: Minor
>  Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publically available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publically available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>  For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>for Derby to improve the query plans that it chooses for
>various classes of queries (this is explained in detail in the
>VLDB paper and other information available at Dr. Leis's site)
>For each such improvement, we need to isolate the issue,
>report it as a separable improvement, and fix it (if we can)
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please take these
> considerations into mind:
> 1) This is NOT an introductory project. You must be quite familiar
>with DBMS systems, and with SQL, and in particular with
>cost-based query optimization. If terms such as "cardinality
>estimation", "correlated query predicates", or "bushy trees"
>aren't comfortable terms for you, this probably isn't the
>project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>of the extensive body of introductory material on Derby to
>become familiar with it: read the Derby Getting Started manual,
>download the software and follow the tutorials, read the documentation,
>download the source code and learn how to build and run the
>test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>need to read the paper(s), study the benchmark queries, and
>propose a detailed plan for how to use this benchmark as a tool
>for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-13 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907861#comment-15907861
 ] 

Harshvardhan Gupta edited comment on DERBY-6904 at 3/13/17 5:05 PM:


Thanks Bryan, 

Soft Upgrade Strategy - 
In the attached updated patch, I tweaked SYSColumnsRowFactory to serve all 
columns except the newly introduced one while running a database in soft 
upgrade mode (i.e. both forward and backward compatible). The code flow was 
modelled after DERBY-534, which introduced a new column in SYSTRIGGERS.

Since most test suites use only soft upgrade mode, they all pass except the 
upgradeTests package, which explicitly boots the old databases in hard upgrade 
mode in addition to soft upgrade mode. I am yet to implement a hard upgrade 
strategy.

Should we look at the SYSSEQUENCES column and populate the value in SYSCOLUMNS 
as part of an explicit hard upgrade?

The updated patch passed all the test suites other than the upgradeTests suite, 
where some tests require a hard upgrade strategy. I also tested soft upgrades 
using ij for a database created with 10.13; the created databases are both 
backward and forward compatible with 10.14.

Additionally, all the upgrade logic is contained in the DD_VERSION file. Please 
let me know if the current soft upgrade strategy I am using is not the desired 
one.
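
A minimal ij sketch of the kind of check I have been doing by hand; the new 
SYSCOLUMNS column name (AUTOINCREMENTCYCLE below) and the table name are 
assumptions for illustration, not final names from the patch.

  -- Soft upgrade: booting the 10.13-created database with the patched engine,
  -- the original SYSCOLUMNS columns must still be served correctly.
  SELECT COLUMNNAME, AUTOINCREMENTVALUE, AUTOINCREMENTSTART, AUTOINCREMENTINC
  FROM SYS.SYSCOLUMNS
  WHERE REFERENCEID = (SELECT TABLEID FROM SYS.SYSTABLES
                       WHERE TABLENAME = 'T_CYCLE');

  -- Hard upgrade: after the explicit upgrade, the new column should be visible
  -- and populated (assumed column name).
  SELECT COLUMNNAME, AUTOINCREMENTCYCLE
  FROM SYS.SYSCOLUMNS
  WHERE REFERENCEID = (SELECT TABLEID FROM SYS.SYSTABLES
                       WHERE TABLENAME = 'T_CYCLE');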


was (Author: harshvardhan145):
Thanks Bryan, 

Soft Upgrade Strategy - 
In the attached updated patch, I tweaked the SYSColumnsRowFactory to serve all 
columns except the newly introduced one while running a database in soft 
upgrade mode. (i.e both forward and backward compatible). The code flow was 
modelled after Derby-534 for introducing a new in column SYSTRIGGERS..

Since most test suites used only soft upgrade mode, I passed all the test 
suites except the upgradeTests package which explicitly boots the old databases 
in hard upgrade mode other than soft upgrade mode. I am yet to implement a hard 
upgrade strategy.

Should we look at the SYSSEQUENCES column and populated the value in SYSCOLUMNS 
as part of explicit hard upgrade?
 
The updated patch passed all the test suites other than upgradeTests test suite 
where some tests require a hard update strategy. I also tested soft updates 
using IJ for database created using 10.13, the created databases are both 
backward and forward compatible with 10.14.

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904_2.diff, 6904.diff, columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-13 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6904:
--
Attachment: 6904_2.diff

Thanks Bryan, 

Soft Upgrade Strategy - 
In the attached updated patch, I tweaked SYSColumnsRowFactory to serve all 
columns except the newly introduced one while running a database in soft 
upgrade mode (i.e. both forward and backward compatible). The code flow was 
modelled after DERBY-534, which introduced a new column in SYSTRIGGERS.

Since most test suites use only soft upgrade mode, they all pass except the 
upgradeTests package, which explicitly boots the old databases in hard upgrade 
mode in addition to soft upgrade mode. I am yet to implement a hard upgrade 
strategy.

Should we look at the SYSSEQUENCES column and populate the value in SYSCOLUMNS 
as part of an explicit hard upgrade?

The updated patch passed all the test suites other than the upgradeTests suite, 
where some tests require a hard upgrade strategy. I also tested soft upgrades 
using ij for a database created with 10.13; the created databases are both 
backward and forward compatible with 10.14.

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904_2.diff, 6904.diff, columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-12 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906477#comment-15906477
 ] 

Harshvardhan Gupta edited comment on DERBY-6904 at 3/12/17 10:14 AM:
-

Update on previous comment - 

I tested various scenarios by creating databases with and without the new 
changes using ij. Whenever I try to access any of the databases created without 
my changes through the patched version, I get this error -

"XSCH5: In a base table there was a mismatch between the requested column 
number 9 and the maximum number of columns 9." Note that this patch increases 
the number of columns in SYSCOLUMNS from 9 to 10.

Should we incorporate an upgrade strategy for when older databases are accessed 
by this patched version?
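
For reference, a minimal way to reproduce what I am seeing (the database path is 
hypothetical, and whether the failure surfaces at connect time or on the first 
catalog read is an assumption on my part):

  -- ij session: patched engine, database created by an unpatched engine.
  CONNECT 'jdbc:derby:/tmp/oldDb';
  -- Reading the catalog expects ten SYSCOLUMNS columns while the stored rows
  -- only have nine, which triggers the XSCH5 mismatch.
  SELECT COLUMNNAME FROM SYS.SYSCOLUMNS;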


was (Author: harshvardhan145):
Update on previous comment - 

I tested with various scenarios by creating database without and with the new 
changes using IJ. Whenever I try to access any of the databases created without 
my changes through the patched version I get this error -

"In a base table there was a mismatch between the requested column number 9 and 
the maximum number of columns 9.". Note that we increased the number of columns 
in SYSCOLUMNs from 9 to 10 in this patch.

Should we incorporate a update strategy when older databases are accessed by 
this patched version?

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904.diff, columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-12 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906477#comment-15906477
 ] 

Harshvardhan Gupta commented on DERBY-6904:
---

Update on previous comment - 

I tested various scenarios by creating databases with and without the new 
changes using ij. Whenever I try to access any of the databases created without 
my changes through the patched version, I get this error -

"In a base table there was a mismatch between the requested column number 9 
and the maximum number of columns 9." Note that this patch increases the number 
of columns in SYSCOLUMNS from 9 to 10.

Should we incorporate an upgrade strategy for when older databases are accessed 
by this patched version?

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904.diff, columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-11 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906294#comment-15906294
 ] 

Harshvardhan Gupta edited comment on DERBY-6904 at 3/11/17 7:21 PM:


Some of the existing tests are failing for one of the following reasons - 

1) Engine Shutdown: Expected XJ015 but was XSCH5 in method assertSQLState.
Referring to https://db.apache.org/derby/docs/10.2/ref/rrefexcept71493.html , I 
think this error is mostly due to the addition of a new column in SYSCOLUMNS 
which is not present in the existing DB jars when they are brought into memory 
in the test packages. (XSCH5 is a heap class SQLState.)

2) compareDatabaseNames throws a mismatch error while building a connection. I 
think these errors are a direct result of the above error when shutdown is not 
clean.

I am attaching both error stacks for reference. Please take a look at them and 
give your inputs. It would also be helpful if you could clarify the process of 
reading a jar/disk database and its relationship with the heap structures of 
Derby.




was (Author: harshvardhan145):
Some of the existing tests are failing with one of the following reasons - 

1)Engine Shutdown: Expected XJ015 but was XSCH5 in method assertSQLState
Referring to https://db.apache.org/derby/docs/10.2/ref/rrefexcept71493.html , I 
think this error is mostly due to addition of a new column in SYSCOLUMNS which 
is not there in existing DB jars when they are brought to memory in test 
packages. (HSCH5 is a heap class SQLState)

2) compareDatabaseNames throws mismatch error while building a connection. I 
think these errors are a direct result of the above error when shutdown is not 
clean.

I am attaching both the error stacks for reference. Please take a look at the 
error stacks and give your inputs.  Also it would be helpful if you clarify the 
process of reading a jar/disk database and its relationship with the heap 
structures of derby.



> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904.diff, columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-11 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906294#comment-15906294
 ] 

Harshvardhan Gupta edited comment on DERBY-6904 at 3/11/17 7:20 PM:


Some of the existing tests are failing for one of the following reasons - 

1) Engine Shutdown: Expected XJ015 but was XSCH5 in method assertSQLState.
Referring to https://db.apache.org/derby/docs/10.2/ref/rrefexcept71493.html , I 
think this error is mostly due to the addition of a new column in SYSCOLUMNS 
which is not present in the existing DB jars when they are brought into memory 
in the test packages. (XSCH5 is a heap class SQLState.)

2) compareDatabaseNames throws a mismatch error while building a connection. I 
think these errors are a direct result of the above error when shutdown is not 
clean.

I am attaching both error stacks for reference. Please take a look at them and 
give your inputs. It would also be helpful if you could clarify the process of 
reading a jar/disk database and its relationship with the heap structures of 
Derby.




was (Author: harshvardhan145):
Some of the existing tests are failing with one of the following reasons - 

1)Engine Shutdown: Expected XJ015 but was XSCH5 in method assertSQLState
Referring to https://db.apache.org/derby/docs/10.2/ref/rrefexcept71493.html , I 
think this error is mostly due to addition of a new column in SYSCOLUMNS which 
is not there in existing DB jars when they are brought to memory in test 
packages. (HSCH5 is a heap class SQLState)

2) compareDatabaseNames throws mismatch error while building a connection. I 
think these errors are a direct result of the above error when shutdown is not 
clean. 

I am attaching both the error stacks for reference. Please take a look at the 
error stacks and give your inputs.

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904.diff, columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-11 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6904:
--
Attachment: compareDbName
columnMismatch

Some of the existing tests are failing for one of the following reasons - 

1) Engine Shutdown: Expected XJ015 but was XSCH5 in method assertSQLState.
Referring to https://db.apache.org/derby/docs/10.2/ref/rrefexcept71493.html , I 
think this error is mostly due to the addition of a new column in SYSCOLUMNS 
which is not present in the existing DB jars when they are brought into memory 
in the test packages. (XSCH5 is a heap class SQLState.)

2) compareDatabaseNames throws a mismatch error while building a connection. I 
think these errors are a direct result of the above error when shutdown is not 
clean.

I am attaching both error stacks for reference. Please take a look at them and 
give your inputs.

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904.diff, columnMismatch, compareDbName
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-09 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902720#comment-15902720
 ] 

Harshvardhan Gupta commented on DERBY-6904:
---

Thanks Bryan for the inputs. I will be adding the test cases and also the patch 
for DERBY-6906 (Allow a GENERATED column to explicitly specify NO CYCLE) in a 
couple of days.

> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904.diff
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DERBY-6904) Enhance ALTER TABLE to allow CYCLE option to be changed.

2017-03-08 Thread Harshvardhan Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/DERBY-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshvardhan Gupta updated DERBY-6904:
--
Attachment: 6904.diff

The attached patch covers the following issues -

DERBY-6904 Enhance ALTER TABLE to allow CYCLE option to be changed.

DERBY-6905 Enhance SYS.SYSCOLUMNS to report the CYCLE value for an IDENTITY 
column

DERBY-6903 ALTER TABLE ALTER COLUMN resets CYCLE option of IDENTITY column

The idea was to add the CYCLE property as a fundamental property associated 
with a column, similar to the current increment value, increment step, etc.

- SYS.SYSCOLUMNS is extended to hold the cycle value of an identity column.
- The grammar is extended to accept "cycle"/"no cycle" in the ALTER clause, and 
the underlying column descriptor and sequencing objects are changed accordingly 
(see the sketch below).
- An ALTER clause not involving the cycle option (i.e. increment and restart) 
does not affect the cycle option.
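
A minimal ij sketch of the statements this grammar change is meant to accept; the 
table and column names are hypothetical, and the exact SET CYCLE / SET NO CYCLE 
spelling is assumed, modelled on the existing SET INCREMENT BY variant.

  -- Change only the CYCLE option of an existing identity column.
  ALTER TABLE t_cycle ALTER COLUMN id SET CYCLE;
  ALTER TABLE t_cycle ALTER COLUMN id SET NO CYCLE;

  -- Changing increment or restart should leave the cycle option untouched.
  ALTER TABLE t_cycle ALTER COLUMN id SET INCREMENT BY 5;
  ALTER TABLE t_cycle ALTER COLUMN id RESTART WITH 1;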

I have manually tested different queries for all the above scenarios. I will 
be writing test cases over the coming days and fixing older ones which are 
expected to break due to the changes in the SYSCOLUMNS view.

Also, while testing I encountered the following behaviour in the latest Derby 
release:

If the increment step is changed, the current value in SYSSEQUENCES is not 
changed, but the current value in SYSCOLUMNS is. Interestingly, this only 
affects the SYSCOLUMNS view and not the generation of sequence values, since 
SYSSEQUENCES stays consistent.
I found DERBY-6579, which addressed the bug and made SYSSEQUENCES consistent. 
Please clarify whether the SYSCOLUMNS behaviour is deliberate or should be 
filed as a bug.
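
A small ij check that makes the observation concrete (the table and column names 
are hypothetical; the catalog columns used are the existing documented ones):

  -- Change only the increment step of an identity column.
  ALTER TABLE t_cycle ALTER COLUMN id SET INCREMENT BY 5;

  -- Current value as recorded for the underlying sequence object.
  SELECT SEQUENCENAME, CURRENTVALUE, INCREMENT
  FROM SYS.SYSSEQUENCES;

  -- Current value and increment as reported by SYSCOLUMNS for the same column;
  -- in the behaviour described above, this value moves while SYSSEQUENCES
  -- does not.
  SELECT COLUMNNAME, AUTOINCREMENTVALUE, AUTOINCREMENTINC
  FROM SYS.SYSCOLUMNS
  WHERE COLUMNNAME = 'ID';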


 




> Enhance ALTER TABLE to allow CYCLE option to be changed.
> 
>
> Key: DERBY-6904
> URL: https://issues.apache.org/jira/browse/DERBY-6904
> Project: Derby
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 10.13.1.0
>Reporter: Bryan Pendleton
>Priority: Minor
> Attachments: 6904.diff
>
>
> It would be nice to have a variant of the ALTER TABLE statement,
> similar to the SET INCREMENT BY or SET RESTART WITH variants,
> that allowed a user to change the CYCLE option on an IDENTITY column.
> I think that the code flow for this could be closely modelled on the
> code flow for the SET INCREMENT BY variant: after the statement
> is parsed, we simply need to drop and recreate the underlying SEQUENCE
> object with the desired CYCLE option (and preserving all the other
> aspects of the underlying SEQUENCE object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


  1   2   >