[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-05 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038085#comment-16038085
 ] 

Bryan Pendleton commented on DERBY-6937:


Yay! Your patch worked great for me, as did your updated script, and I've 
successfully loaded the sample data into my Derby database. Thanks!


> Load the IMDB data set in Derby, obtain and adapt Join order Benchmark 
> queries for use in derby 
> 
>
> Key: DERBY-6937
> URL: https://issues.apache.org/jira/browse/DERBY-6937
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Harshvardhan Gupta
> Assignee: Harshvardhan Gupta
> Priority: Minor
> Attachments: derby_script.sql, imdb.diff, schema_derby.sql
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-04 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036425#comment-16036425
 ] 

Bryan Pendleton commented on DERBY-6937:


Could you perhaps attach (if you have it) your exact script of 
SYSCS_IMPORT_TABLE calls? I'd like to follow along with your experiments, and 
am hoping to keep my environment pretty close to yours...
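
For reference, a script of import calls could be generated along the following lines. This is only a sketch: it uses the standard seven-argument SYSCS_UTIL.SYSCS_IMPORT_TABLE procedure rather than the modified variant discussed in this issue, and the class name, table names, data directory, delimiters and codeset are all assumptions, not the contents of the actual script.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that emits one import call per table. */
final class ImportScriptSketch {

    /**
     * Builds a call to the standard Derby import procedure:
     * (schema, table, file, column delimiter, character delimiter, codeset, replace).
     * Schema and table names are passed through unchanged; Derby generally
     * expects them in upper case unless they were created as quoted identifiers.
     */
    static String importCall(String schema, String table, String csvPath) {
        return String.format(
            "CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE('%s', '%s', '%s', ',', '\"', 'UTF-8', 0);",
            schema, table, csvPath);
    }

    /** Assumes each table's data lives in dataDir/<lower-case table name>.csv. */
    static List<String> script(String schema, String dataDir, List<String> tables) {
        List<String> calls = new ArrayList<>();
        for (String table : tables) {
            String path = dataDir + "/" + table.toLowerCase() + ".csv";
            calls.add(importCall(schema, table, path));
        }
        return calls;
    }
}
```

Each generated line can be pasted into ij; the final argument 0 appends to the table instead of replacing its contents.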

Also, maybe you could attach the diff you made to ImportReadData.java, for 
future reference?

Lastly, I compared the 'schema_derby.sql' file that you attached with the 
'schematext.sql' that came with the IMDB sample data. They seem close, but not 
identical. Could you share what changes you had to make to get the data to 
import?








[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-03 Thread Harshvardhan Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036043#comment-16036043
 ] 

Harshvardhan Gupta commented on DERBY-6937:

Hi Bryan,

I modified the Derby code to check for the string "NULL" in the file-reading 
logic, and ended up creating a variant of SYSCS_IMPORT_TABLE with minimal 
changes. Yes, there are many ways to get around this problem; I found this 
approach quicker and cleaner in my current environment.

Specifically, in ImportReadData.java, I check each parsed column value in the 
readNextDelimitedRow method and set it to null if it matches the string "NULL".
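
The check amounts to something like the sketch below. It is a standalone illustration, not the actual change to ImportReadData.java: the class and method names are hypothetical, and it assumes the parsed row is available as an array of column strings.

```java
/** Hypothetical standalone version of the check; not Derby's actual code. */
final class NullTokenSketch {

    /** Placeholder that the data set uses for missing values. */
    static final String NULL_TOKEN = "NULL";

    /**
     * Replaces, in place, every column whose value is exactly the
     * placeholder with a Java null, and returns the same array.
     * The comparison is case-sensitive, so "null" is left alone.
     */
    static String[] mapNullTokens(String[] columns) {
        for (int i = 0; i < columns.length; i++) {
            if (NULL_TOKEN.equals(columns[i])) {
                columns[i] = null;
            }
        }
        return columns;
    }
}
```

One caveat with this approach: a column whose real value is the four-character string NULL is also turned into a SQL NULL, because the check only sees the parsed value.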

I agree that there are known workarounds for this problem and that it is not 
the focus of our project. Let our documentation help anyone who later tries to 
set up an environment for analyzing Derby's optimizer.







[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby

2017-06-03 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036038#comment-16036038
 ] 

Bryan Pendleton commented on DERBY-6937:


Great progress!

What was your "quick hack" for the NULL problem in the data? Was it:
1) You imported the data literally as-is, then ran a SQL UPDATE statement to 
change "NULL" to a proper NULL value afterwards?
2) Or, you ran a Perl command or other tool over the data ahead-of-time to 
replace "NULL" with an empty string prior to the import?
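
Option 2 does not need Perl; a rough Java equivalent is sketched below. It assumes comma-separated fields with double-quote character delimiters, and that the placeholder appears as an unquoted NULL field. The class name is hypothetical, and backslash-escaped quotes are not handled.

```java
/** Hypothetical pre-processing helper for option 2. */
final class NullFieldCleaner {

    /**
     * Returns the line with every unquoted field that is exactly NULL
     * replaced by an empty field. Quoted fields, including a quoted
     * "NULL" and fields containing commas, are copied unchanged.
     */
    static String blankNullFields(String line) {
        StringBuilder out = new StringBuilder(line.length());
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // A doubled quote toggles twice, so the state stays correct.
                inQuotes = !inQuotes;
                field.append(c);
            } else if (c == ',' && !inQuotes) {
                appendField(out, field);
                out.append(',');
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        appendField(out, field);
        return out.toString();
    }

    /** Copies the field to the output unless it is the bare placeholder. */
    private static void appendField(StringBuilder out, StringBuilder field) {
        if (!"NULL".contentEquals(field)) {
            out.append(field);
        }
    }
}
```

Applying blankNullFields to each line of a data file before the import leaves empty fields where the placeholder was.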

Either way seems fine to me; I was just wondering because I will try to 
reproduce your findings.

I agree with you: the var-args problem that we encountered in DERBY-4555 was 
rather fundamental and hard to avoid.

If it's easy enough to work around this NULL placeholder in the input dataset, 
perhaps we can just disregard this minor inconvenience once we've documented 
our approach, since our primary goal is to study the query optimizer, not to 
work on the import/export tools?





