Paul Rogers created DRILL-5949:
----------------------------------

             Summary: JSON format options should be part of plugin config; not 
session options
                 Key: DRILL-5949
                 URL: https://issues.apache.org/jira/browse/DRILL-5949
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.12.0
            Reporter: Paul Rogers


Drill provides a JSON record reader. Drill provides two ways to configure this 
reader:

* Using the JSON plugin configuration.
* Using a set of session options.

The plugin configuration defines the file suffix associated with JSON files. 
The session options are:

* {{store.json.all_text_mode}}
* {{store.json.read_numbers_as_double}}
* {{store.json.reader.skip_invalid_records}}
* {{store.json.reader.print_skipped_invalid_record_number}}

Suppose I have two JSON files from different sources (and keep them in distinct 
directories). For the first, I want {{all_text_mode}} off, as the data is 
nicely formatted. Also, my numbers are fine, so I want 
{{read_numbers_as_double}} off.

But the other file is a mess and uses a rather ad-hoc format. So, I want these 
two options turned on.

As it turns out I often query both files. Today, I must set the session options 
one way to query my "clean" file, then reverse them to query the "dirty" file.
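
For illustration, the back-and-forth looks roughly like this. The option names 
are the ones listed above; the {{dfs}} paths are made up for the example.

{code:sql}
-- "Clean" file: leave both options off.
ALTER SESSION SET `store.json.all_text_mode` = false;
ALTER SESSION SET `store.json.read_numbers_as_double` = false;
SELECT * FROM dfs.`/data/clean/orders.json`;  -- hypothetical path

-- "Dirty" file: flip both options before querying.
ALTER SESSION SET `store.json.all_text_mode` = true;
ALTER SESSION SET `store.json.read_numbers_as_double` = true;
SELECT * FROM dfs.`/data/dirty/orders.json`;  -- hypothetical path
{code}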

Next, I want to join the two files. How do I set the options one way for the 
"clean" file, and the other for the "dirty" file within the *same query*? I 
can't: session options apply to the query as a whole.
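
A sketch of the problem, again with made-up paths and column names: whatever 
the session option is when the join runs, both scans see the same value.

{code:sql}
-- Needed for the "dirty" file, but it now also applies to the "clean" file.
ALTER SESSION SET `store.json.all_text_mode` = true;

-- Hypothetical paths and columns; both JSON scans read every field as text.
SELECT c.order_id, d.amount
FROM dfs.`/data/clean/orders.json` c
JOIN dfs.`/data/dirty/orders.json` d
  ON c.order_id = d.order_id;
{code}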

Now, consider the text format plugin that can read CSV, TSV, PSV and so on. It 
has a variety of options. But, they are *not* session options; they are instead 
options in the plugin definition. This allows me to, say, have a plugin config 
for CSV-with-headers files that I get from source A, and a different plugin 
config for my CSV-without-headers files from source B.

Suppose we applied the text reader technique to the JSON reader. We'd move the 
session options listed above into the JSON format plugin. Then, I can define 
one plugin for my "clean" files, and a different plugin config for my "dirty" 
files.

What's more, I can then use table functions to adjust the format for each file 
as needed within a single query. Since table functions are part of a query, I 
can add them to a view that I define for the various JSON files.
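
As a rough sketch of what this could look like, modeled on the table function 
syntax the text reader already accepts: the JSON option names {{allTextMode}} 
and {{readNumbersAsDouble}} are assumptions (they are the proposal, not an 
existing feature), and the paths, view name and columns are made up.

{code:sql}
-- Hypothetical: per-file JSON options supplied through a table function.
-- allTextMode and readNumbersAsDouble are assumed names for the proposed
-- plugin options.
CREATE VIEW dfs.tmp.dirty_orders AS
SELECT *
FROM table(dfs.`/data/dirty/orders.json`(
  type => 'json',
  allTextMode => true,
  readNumbersAsDouble => true));

-- The "clean" file keeps the plugin defaults, so one query can read both.
-- The dirty side arrives as text, hence the cast on the join key.
SELECT c.order_id, d.amount
FROM dfs.`/data/clean/orders.json` c
JOIN dfs.tmp.dirty_orders d
  ON c.order_id = CAST(d.order_id AS BIGINT);
{code}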

The result is a far simpler user experience than the tedium of resetting 
session options for every query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
