Paul Rogers created DRILL-5949:
----------------------------------

             Summary: JSON format options should be part of plugin config; not session options
                 Key: DRILL-5949
                 URL: https://issues.apache.org/jira/browse/DRILL-5949
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.12.0
            Reporter: Paul Rogers

Drill provides a JSON record reader and two ways to configure it:

* Using the JSON plugin configuration.
* Using a set of session options.

The plugin configuration defines the file suffix associated with JSON files. The session options are:

* {{store.json.all_text_mode}}
* {{store.json.read_numbers_as_double}}
* {{store.json.reader.skip_invalid_records}}
* {{store.json.reader.print_skipped_invalid_record_number}}

Suppose I have two JSON files from different sources (and keep them in distinct directories). For the first, I want {{all_text_mode}} off because the data is nicely formatted. Its numbers are also fine, so I want {{read_numbers_as_double}} off. But the other file is a mess and uses a rather ad-hoc format, so I want both options turned on.

As it turns out, I often query both files. Today, I must set the session options one way to query my "clean" file, then reverse them to query the "dirty" file.

Next, I want to join the two files. How do I set the options one way for the "clean" file and the other way for the "dirty" file within the *same query*? I can't.

Now, consider the text format plugin, which can read CSV, TSV, PSV and so on. It has a variety of options. But they are *not* session options; they are instead options in the plugin definition. This allows me to, say, have one plugin config for CSV-with-headers files that I get from source A, and a different plugin config for my CSV-without-headers files from source B.

Suppose we applied the text reader technique to the JSON reader. We'd move the session options listed above into the JSON format plugin. Then I can define one plugin config for my "clean" files and a different plugin config for my "dirty" files.

What's more, I can then use table functions to adjust the format for each file as needed within a single query. Since table functions are part of a query, I can add them to a view that I define for the various JSON files.
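
To make the contrast concrete, here is a rough sketch of the two approaches. The first half uses the existing session options named above. The second half is hypothetical: it assumes the JSON format plugin gains properties called {{allTextMode}} and {{readNumbersAsDouble}} (names invented here for illustration) that a table function can override, the way text format options can be overridden today. The file paths and column names are also made up.

```sql
-- Today: the JSON options are session-wide, so they must be flipped
-- between queries. (Paths are hypothetical.)
ALTER SESSION SET `store.json.all_text_mode` = false;
ALTER SESSION SET `store.json.read_numbers_as_double` = false;
SELECT * FROM dfs.`/data/clean/orders.json`;

ALTER SESSION SET `store.json.all_text_mode` = true;
ALTER SESSION SET `store.json.read_numbers_as_double` = true;
SELECT * FROM dfs.`/data/dirty/customers.json`;

-- Proposed (NOT supported as of this issue): per-table settings in one query.
-- allTextMode and readNumbersAsDouble are assumed property names.
SELECT o.order_id, c.customer_name
FROM dfs.`/data/clean/orders.json` AS o
JOIN TABLE(dfs.`/data/dirty/customers.json`(
       type => 'json',
       allTextMode => true,
       readNumbersAsDouble => true)) AS c
  ON o.customer_id = c.customer_id;
```

With something like the second form, the "clean" file would pick up its defaults from the plugin config while the "dirty" file gets its overrides inline, and the whole {{TABLE(...)}} expression could be wrapped in a view so that users never see the options at all.
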
The result is a far simpler user experience than the tedium of resetting session options for every query.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)