[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
[ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038085#comment-16038085 ]

Bryan Pendleton commented on DERBY-6937:

Yay! Your patch worked great for me, as did your updated script, and I've successfully loaded the sample data into my Derby database.

Thanks!

> Load the IMDB data set in Derby, obtain and adapt Join order Benchmark
> queries for use in derby
>
>
>                 Key: DERBY-6937
>                 URL: https://issues.apache.org/jira/browse/DERBY-6937
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>         Attachments: derby_script.sql, imdb.diff, schema_derby.sql
>
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
[ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036425#comment-16036425 ]

Bryan Pendleton commented on DERBY-6937:

Could you perhaps attach (if you have it) your exact script of SYSCS_IMPORT_TABLE calls? I'd like to follow along with your experiments, and am hoping to keep my environment pretty close to yours...

Also, maybe you could attach the diff you made to ImportReadData.java, for future reference?

Lastly, I was looking at the differences between the 'schema_derby.sql' file that you attached, and the 'schematext.sql' that came with the IMDB sample data, and I was wondering if you had any thoughts to share about the differences between the two files? They seem close, but not identical, and I was wondering what you could share about the changes you had to make to get the data to import.
[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
[ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036043#comment-16036043 ]

Harshvardhan Gupta commented on DERBY-6937:

Hi Bryan,

I modified the Derby code to check for the string "NULL" in the file-reading logic and ended up creating a variant of SYSCS_IMPORT_TABLE with minimal changes. Yes, there are a lot of ways to get around this problem; I found this approach quicker and cleaner in my current environment.

Specifically, in ImportReadData.java, I checked the parsed column value in the readNextDelimitedRow procedure and set it to null if it matched the string "NULL".

I agree that there are known workarounds for this problem and that it is not the focus of our project. Let our documentation help someone in the future who is trying to set up the environment for analyzing Derby's optimizer.
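For readers following along, the idea described in the comment above (treat a parsed column value that equals the string "NULL" as a SQL NULL) might look roughly like the sketch below. This is not the actual change to ImportReadData.java; the class name `NullMarkerSketch` and the method `normalizeRow` are hypothetical, and the row is assumed to have already been split into one string per column.

```java
import java.util.Arrays;

public class NullMarkerSketch {
    // The literal placeholder mentioned in the comment; the comparison is case-sensitive here.
    static final String NULL_MARKER = "NULL";

    /** Returns a copy of the row in which every column equal to "NULL" is replaced by null. */
    public static String[] normalizeRow(String[] parsedColumns) {
        String[] result = Arrays.copyOf(parsedColumns, parsedColumns.length);
        for (int i = 0; i < result.length; i++) {
            if (NULL_MARKER.equals(result[i])) {
                result[i] = null; // hand a real null to the rest of the import path
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] row = normalizeRow(new String[] {"42", "NULL", "Some Title"});
        System.out.println(Arrays.toString(row)); // prints [42, null, Some Title]
    }
}
```

Doing the check after parsing, rather than on the raw line, means a value that merely contains the text NULL (for example "NULLABLE") is left untouched.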
[jira] [Commented] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
[ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036038#comment-16036038 ]

Bryan Pendleton commented on DERBY-6937:

Great progress!

What was your "quick hack" for the NULL problem in the data? Was it:

1) You imported the data literally as-is, then ran a SQL UPDATE statement to change "NULL" to a proper NULL value afterwards?
2) Or, you ran a Perl command or other tool over the data ahead-of-time to replace "NULL" with an empty string prior to the import?

Either way seems fine to me, I was just wondering since I will try to reproduce your findings.

I agree with you, the var-args problem that we encountered in DERBY-4555 was rather fundamental and hard to avoid.

If it's easy enough to work around this NULL placeholder in the input dataset, perhaps we can just disregard this minor inconvenience once we've documented our approach, since our primary goal is to study the query optimizer, not to work on the import/export tools?
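As an illustration of option 2 above (rewriting the data before the import), here is a minimal Java sketch in place of the Perl command. It assumes comma-delimited lines and only blanks out fields that are exactly the unquoted text NULL. It does not understand quoting, so a quoted field that itself contains ",NULL," would be altered too. The class name `NullFieldPreprocessSketch` and the method `blankOutNullFields` are hypothetical.

```java
import java.util.regex.Pattern;

public class NullFieldPreprocessSketch {
    // Matches a field consisting solely of the unquoted text NULL.
    // Group 1 captures the preceding delimiter (or start of line) so it can be kept.
    private static final Pattern BARE_NULL = Pattern.compile("(^|,)NULL(?=,|$)");

    /** Replaces each bare NULL field in one comma-delimited line with an empty field. */
    public static String blankOutNullFields(String line) {
        return BARE_NULL.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        // The quoted "NULL" and the longer word NULLABLE are deliberately left alone.
        System.out.println(blankOutNullFields("7,NULL,\"NULL\",NULLABLE,NULL"));
        // prints 7,,"NULL",NULLABLE,
    }
}
```

The method works on one line at a time, so it could be applied while streaming a data file to a new copy before the import is run.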