[ https://issues.apache.org/jira/browse/DRILL-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers resolved DRILL-5204.
--------------------------------
Resolution: Fixed
Not sure why this was not closed earlier. The feature has been checked into master.
Set up the mock data source. Then:
{code}
SELECT id_i, name_s50 FROM `mock`.`customers_1M`
{code}
The column and table names are arbitrary; only the suffix matters. For
columns, "_i" means integer, "_sx" means a string of length x, and so on. For
tables, "_x" means x rows, "_xK" means x thousand rows, and "_xM" means x
million rows.
See the {{ExampleTest}} class for details.
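To make the naming convention concrete, here is a minimal Python sketch that decodes the suffixes described above. It is not Drill's parser: the function names are hypothetical, it assumes the unit letters are upper case as in the example, and it handles only the two column codes named here ("_i" and "_sx"), not the others implied by "and so on".

```python
import re

# Row-count multipliers for the table-name suffix: none, K (thousand), M (million).
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_table_name(name):
    """Return the row count encoded in a mock table name such as 'customers_1M'."""
    match = re.fullmatch(r".+_(\d+)([KM]?)", name)
    if match is None:
        raise ValueError(f"no row-count suffix in table name: {name!r}")
    count, unit = match.groups()
    return int(count) * _MULTIPLIERS[unit]


def parse_column_name(name):
    """Return (type, width) for a mock column name such as 'id_i' or 'name_s50'.

    Only the integer and string codes are handled in this sketch.
    """
    match = re.fullmatch(r".+_(?:(i)|s(\d+))", name)
    if match is None:
        raise ValueError(f"unrecognized type suffix in column name: {name!r}")
    if match.group(1) is not None:
        return ("integer", None)
    return ("string", int(match.group(2)))
```

For the query above, parse_table_name("customers_1M") gives 1000000, parse_column_name("id_i") gives ("integer", None), and parse_column_name("name_s50") gives ("string", 50).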
> Extend mock data source to use table specs from SQL
> ---------------------------------------------------
>
> Key: DRILL-5204
> URL: https://issues.apache.org/jira/browse/DRILL-5204
> Project: Apache Drill
> Issue Type: Improvement
> Components: Tools, Build & Test
> Affects Versions: 1.9.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
>
> DRILL-5152 provided a simple way to generate mock data from SQL:
> {code}
> SELECT colName_type FROM `mock`.`tableName_size` ...
> {code}
> That change encodes types and record counts directly in the SQL,
> which is very handy for many simple cases.
> The original mock data source has another feature: it lets you create
> multiple mock blocks of data that can be read in multiple threads. Later
> additions made it easy to repeat a column definition (to generate, say, a
> table with 1000 columns), to choose the data generator class, etc. All of
> this was available only when writing physical plans by hand and encoding the
> definition in the sub scan for the mock data source.
> This enhancement extends the SQL feature to allow the definitions to appear
> in a JSON file easily referenced from SQL. The JSON file must be somewhere on
> the class path (typically in a resources directory). Then:
> {code}
> SELECT red, blue, green FROM `mock`.`foo/colors.json` ...
> {code}
> This is interpreted to mean, "the file colors.json defines a mock data source,
> perhaps with repeated columns, perhaps with multiple fragments. From that
> mock data source, select the three columns red, blue and green."
> With this change, tests can include quite sophisticated mock data sources,
> simplifying debugging of plans with multiple fragments and/or more complex
> table structures.