[
https://issues.apache.org/jira/browse/HIVE-18051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333965#comment-16333965
]
Zoltan Haindrich commented on HIVE-18051:
-----------------------------------------
I've gone through the changes; here are my comments :)
* {{test.src.tables}} has to be set to the joined version of {{srcTables}}
to enable the table protector; if the property is not set, the table
"protector" will not prevent changes to the dataset tables
(EnforceReadOnlyTables.java); please add 2 negative tests to check that
dropping a dataset fails
* I now think that it would be better to enable it for all the q tests; at
least parse the files all the time and look for the dataset pattern; enabling
it only selectively is too defensive - there won't be any problems :)
* why is the sample dataset setting hive.stats.dbclass to fs? was it like
that before? we should probably set it in hiveconf.java / hive-site.xml; it
doesn't seem to be something which belongs to the dataset itself...
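
A minimal sketch of the first point, assuming the property holds a comma-separated list of table names (the separator is my assumption here; please check what EnforceReadOnlyTables.java actually splits on). The class and method names are hypothetical and only illustrate joining {{srcTables}} into {{test.src.tables}}:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; not part of the patch.
final class SrcTablesProperty {
    // Property name taken from the review comment.
    static final String PROPERTY = "test.src.tables";

    // Joins the table names with commas (assumed separator).
    static String join(Set<String> srcTables) {
        return String.join(",", srcTables);
    }

    // Publishes the joined list as a system property so a hook can read it.
    static void publish(Set<String> srcTables) {
        System.setProperty(PROPERTY, join(srcTables));
    }

    public static void main(String[] args) {
        Set<String> srcTables = new LinkedHashSet<>(Arrays.asList("src", "bucket_small"));
        publish(srcTables);
        System.out.println(System.getProperty(PROPERTY)); // prints "src,bucket_small"
    }
}
```

A negative test could then try to drop one of these tables and expect the statement to fail.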
other minor notes:
* data/files/kv1.txt has the same content as data/files/testdataset.txt
* I feel that we shouldn't be loading all datasets upfront; just load what's
needed to run the actual test... that would allow removing the dataset from
the cliconfig interface and moving it to the point where the test is being
executed; in the current design it would only work for the {{CoreCliDriver}}
family of tests - this will probably come in handy later; last week I created
a junit rule which is able to run driver tests from the ide; incorporating
this into that rule later might make it more convenient
* 1 dataset = 1 table; I think the Dataset interface should make this
contract visible; for now I don't think an interface is necessary, because
there's only 1 implementation
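
To illustrate the "parse the files all the time" and "load only what's needed" ideas, here is a rough sketch that scans a q file for the {{--! qt:dataset:...}} annotation from the issue description and works out which datasets still have to be loaded. The class name, method names and the exact regex are assumptions of mine, not what the patch does:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; names and pattern are illustrative only.
final class DatasetAnnotationParser {
    // Matches lines such as: --! qt:dataset:src,bucket_small
    private static final Pattern DATASET_LINE =
        Pattern.compile("^\\s*--!\\s*qt:dataset:(\\S+)\\s*$");

    // Collects every dataset name mentioned in the annotations of a q file.
    static Set<String> parse(List<String> qFileLines) {
        Set<String> datasets = new LinkedHashSet<>();
        for (String line : qFileLines) {
            Matcher m = DATASET_LINE.matcher(line);
            if (m.matches()) {
                for (String name : m.group(1).split(",")) {
                    if (!name.isEmpty()) {
                        datasets.add(name);
                    }
                }
            }
        }
        return datasets;
    }

    // Returns the required datasets that have not been loaded yet.
    static Set<String> missing(Set<String> required, Set<String> alreadyLoaded) {
        Set<String> result = new LinkedHashSet<>(required);
        result.removeAll(alreadyLoaded);
        return result;
    }

    public static void main(String[] args) {
        List<String> qFile = Arrays.asList(
            "--! qt:dataset:src,bucket_small",
            "select count(*) from src;");
        Set<String> loaded = new LinkedHashSet<>(Arrays.asList("src"));
        System.out.println(missing(parse(qFile), loaded)); // prints "[bucket_small]"
    }
}
```

A q file without any annotation simply yields an empty set, so running the parser on every q file should be harmless.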
it's great that you are working on this; I'm looking forward to using it!
> qfiles: dataset support
> -----------------------
>
> Key: HIVE-18051
> URL: https://issues.apache.org/jira/browse/HIVE-18051
> Project: Hive
> Issue Type: Improvement
> Components: Testing Infrastructure
> Reporter: Zoltan Haindrich
> Assignee: Laszlo Bodor
> Priority: Major
> Attachments: HIVE-18051.01.patch, HIVE-18051.02.patch,
> HIVE-18051.03.patch, HIVE-18051.04.patch, HIVE-18051.05.patch,
> HIVE-18051.06.patch
>
>
> it would be great to have some kind of test dataset support; currently there
> is the {{q_test_init.sql}} which is quite large; and I often override it
> with an invalid string, because I write independent qtests most of the time -
> and loading {{src}} and the other tables is just a waste of time for me;
> not to mention that the loading of those tables may also trigger breakpoints
> - which is a bit annoying.
> Most of the tests are "only" using the {{src}} table and possibly 2 others;
> however the main init script contains a bunch of tables - meanwhile there are
> quite a few other tests which could possibly also benefit from a more general
> feature; for example the creation of {{bucket_small}} is present in 20 q
> files.
> the proposal would be to enable the qfiles to be annotated with metadata like
> datasets:
> {code}
> --! qt:dataset:src,bucket_small
> {code}
> proposal for storing a dataset:
> * the loader script would be at: {{data/datasets/__NAME__/load.hive.sql}}
> * the table data could be stored under that location
> a draft about this; and other qfiles related ideas:
> https://docs.google.com/document/d/1KtcIx8ggL9LxDintFuJo8NQuvNWkmtvv_ekbWrTLNGc/edit?usp=sharing