Paul Rogers created DRILL-4709:
----------------------------------
Summary: Document the included Foodmart sample data
Key: DRILL-4709
URL: https://issues.apache.org/jira/browse/DRILL-4709
Project: Apache Drill
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.6.0
Reporter: Paul Rogers
Priority: Minor
Drill includes a JSON version of the Mondrian FoodMart sample data. The data
ships in the $DRILL_HOME/jars/3rdparty/foodmart-data-json-0.4.jar jar file and
is accessible through the class path (cp) storage plugin.
The documentation mentions using the cp plugin to access customers.json.
However, the FoodMart data set is quite rich, with many example files.
As it stands, only a curious developer who is good with Google will be able
to find the other data sets or the source of the FoodMart data.
The data appears to be a JSON version of the SQL sample data for the Mondrian
project. A schema description is here:
https://github.com/pentaho/mondrian/blob/master/demo/FoodMart.xml
The Mondrian data appears to have originated at Microsoft as a sample to
highlight its circa-2000 OLAP projects, but the sample has since been
discontinued. See
* http://sqlmag.com/development/dts-2000-action
* https://technet.microsoft.com/en-us/library/aa217032(v=sql.80).aspx
* http://sqlmag.com/sql-server/desperately-seeking-samples
Or do a Google search for "microsoft foodmart database".
The request is to:
1. Credit Microsoft and Mondrian for the data.
2. Either explain the data (which is quite a bit of work), or
3. Explain how to extract the files from the jar file to explore manually.
4. Provide a pointer to a description of the schema (if one can be found).
For option 3:
    cd "$DRILL_HOME/jars/3rdparty"
    unzip foodmart-data-json-0.4.jar -d ~/foodmart
    cd ~/foodmart
    ls
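Once the files are extracted, a quick per-file summary can help decide which
tables most need a description. The following is a minimal sketch, not part of
Drill: summarize_json_dir is a hypothetical helper name, and the sketch assumes
the extracted .json files sit at the top level of the target directory with one
JSON record per line.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Drill): for each .json file in a
# directory, print its name, its line count, and its first line.
# Assumption: each file holds one JSON record per line.
summarize_json_dir() {
    dir="$1"
    for f in "$dir"/*.json; do
        # Skip the unexpanded glob pattern when no .json files are present.
        [ -e "$f" ] || continue
        # wc -l counts newline-terminated lines; tr strips any padding.
        count=$(wc -l < "$f" | tr -d ' ')
        printf '%s: %s lines\n' "$(basename "$f")" "$count"
        # Show the first record so the field names are visible.
        head -n 1 "$f"
    done
}

# Example call, assuming the jar was unpacked to ~/foodmart as above:
# summarize_json_dir ~/foodmart
```

The first record of each file exposes its field names, which can then be
compared against the Mondrian FoodMart.xml schema linked earlier.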
Looking at the data, it is clear that SOME description is needed to understand
the many tables and how they might work with Drill.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)