John Omernik created DRILL-4130:
-----------------------------------

             Summary: Ability to set settings at Table or View level rather than SESSION or SYSTEM
                 Key: DRILL-4130
                 URL: https://issues.apache.org/jira/browse/DRILL-4130
             Project: Apache Drill
          Issue Type: Improvement
          Components: Metadata
    Affects Versions: 1.3.0
         Environment: All
            Reporter: John Omernik
             Fix For: Future


Drill has a number of settings for handling data that can only be set at a 
coarse level of granularity, so changing them may have unintended consequences 
for how other data is read. A few examples include:

store.json.read_numbers_as_double
and
store.json.all_text_mode

(There are likely more; these are the ones I've worked with.)

The documentation at https://drill.apache.org/docs/json-data-model/ explains 
that these settings can help when reading certain types of data, and some 
queries do fail with an error suggesting a change to these settings.

A few points here. First, the documentation suggests ALTER SYSTEM commands. 
This is not ideal: it changes how Drill handles data by default for all users, 
AND not all users will (or should) have the privileges to run this command. At 
a minimum, the documentation should show ALTER SESSION (or explain the 
difference between the two more clearly).

But even ALTER SESSION affects reads of all JSON files for that session, when 
in reality the reason for the setting is to read one specific table that has 
poorly formed JSON. Issuing a command that changes how Drill reads all JSON 
just to read one JSON table could have unintended consequences, especially for 
a user who just wants to read data and issues commands without thinking them 
through.
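Until a per-table option exists, the closest workaround is to enable the option 
with ALTER SESSION only for the query that needs it and then set it back. This 
is a minimal sketch, assuming a hypothetical file path and that the option's 
default is false; it also shows why the approach is fragile, since every query 
run between the two ALTER SESSION statements is read with the permissive 
setting.

```sql
-- Scope the change to this session rather than using ALTER SYSTEM,
-- so other users and connections are unaffected.
ALTER SESSION SET `store.json.read_numbers_as_double` = true;

-- Read the one problem table (hypothetical path, for illustration only).
SELECT * FROM dfs.`/data/poorly_formed.json` LIMIT 10;

-- Restore the default (false) so later JSON reads in this session are unaffected.
ALTER SESSION SET `store.json.read_numbers_as_double` = false;
```

The user has to remember the final statement; forgetting it is exactly the kind 
of downstream consequence that a table- or view-level setting would avoid.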

As an administrator, I see two use cases here. In the first, I have a table of 
poorly formed JSON that requires one of these settings, and I can't change the 
source. Can I create a view so that all reads of this table use the more 
permissive setting? Setting these options in a view would be very helpful to 
administrators for known bad data sources: users would not have to think about 
it and could simply do their exploration.

The second use case is the ability for a user to set a session-level read 
option that applies only to the table being read, for example:

alter session set "%tablename%.store.json.read_numbers_as_double = true"

(with error messages suggesting this form by default). That way, the user can 
issue the command without downstream consequences for other tables read in the 
same session.

Either would be valuable to administrators and could help prevent data-read 
issues.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
