Paul Rogers created DRILL-6667:
----------------------------------
Summary: Include internal data sets in Documentation Sample Datasets
Key: DRILL-6667
URL: https://issues.apache.org/jira/browse/DRILL-6667
Project: Apache Drill
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.13.0
Reporter: Paul Rogers
Assignee: Bridget Bevens
The Drill documentation provides a "Sample Datasets" section, which is very
handy. However, this section does not discuss the two data sets that ship with
Drill itself:
* Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path.
* The TPC-H data set.
The FoodMart data set is available directly under {{cp}}. In fact, the Drill
sample query (see below) references a FoodMart table. To see the list of tables
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the
Maven dependencies for {{drill-java-exec}}. The table names there are simplified
relative to those in the ER diagram at the above link. Perhaps include a simple
table listing the table names and their mapping to the original names, along
with a link to (or an embedded copy of) the FoodMart ER diagram. The data is
available in JSON format.
TPC-H data is available in {{cp.`tpch/*.parquet`}}, in Parquet format. The schema
is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].
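For illustration, a docs page might show queries like the following against these class-path data sets. The FoodMart query is the one the Web Console already suggests; the TPC-H file name {{nation.parquet}} is an assumption based on each TPC-H table being stored as its own Parquet file under {{tpch/}}.

{code:sql}
-- FoodMart (JSON): the sample query the Web Console suggests
SELECT * FROM cp.`employee.json` LIMIT 20;

-- TPC-H (Parquet): assumes the NATION table is stored as tpch/nation.parquet
SELECT * FROM cp.`tpch/nation.parquet` LIMIT 5;
{code}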
Further, the "Analyzing the Yelp Academic Dataset" tutorial in the "Tutorials"
section mentions the Yelp data set, but the "Sample Datasets" section does not.
It should, both for consistency and to save the reader time when going back and
thinking, "Hey, didn't Drill provide some kind of Yelp data? Let me look in
Sample Datasets. Wait... no Yelp?"
These data sets are very handy, but hard to find: I keep having to search the
source code to remember file names and directory paths. End users won't have
this luxury.
Suggestion: Describe the files available in the class path data source.
Along these same lines, "Connect a Data Source" makes no mention of the class
path data source. Yet we reference that data source in the Web Console, where
we suggest a sample query to run:
{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}