[
https://issues.apache.org/jira/browse/DRILL-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers closed DRILL-6168.
------------------------------
> Table functions do not "inherit" default configuration
> ------------------------------------------------------
>
> Key: DRILL-6168
> URL: https://issues.apache.org/jira/browse/DRILL-6168
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.12.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> See DRILL-6167, which describes an attempt to use a table function with a regex
> format plugin.
> Consider the plugin configuration:
> {code}
> RegexFormatConfig sampleConfig = new RegexFormatConfig();
> sampleConfig.extension = "log1";
> sampleConfig.regex = DATE_ONLY_PATTERN;
> sampleConfig.fields = Lists.newArrayList("year", "month", "day");
> {code}
> (This plugin is defined in code in a test rather than the usual JSON in the
> Web console.)
> Run a test with the above. Things work fine.
> Now, try the plugin config with a table function as described in DRILL-6167:
> {code}
> String sql = "SELECT * FROM table(cp.`regex/simple.log2`\n" +
> "(type => 'regex', regex => '(\\\\d\\\\d\\\\d\\\\d)-(\\\\d\\\\d)-(\\\\d\\\\d) .*'))";
> client.queryBuilder().sql(sql).printCsv();
> {code}
> Because we are using a file with suffix "log2", the query will match the
> format plugin config defined above. A query without the table function does,
> in fact, work using the defined config. But, with a table function, we get
> this warning from our regex code:
> {noformat}
> 13307 WARN [257590e1-e846-9d82-61d4-e246a4925ac3:frag:0:0] [org.apache.drill.exec.store.easy.regex.RegexRecordReader] - Column list has fewer names than the pattern has groups, filling extras with Column$n.
> {noformat}
> (The warning is in the custom plugin, not Drill.) This is the plugin saying,
> "hey! you didn't provide column names!". But, in the format definition, we
> did provide names. If we run the query without a table function, we do see
> those names used.
> Result:
> {noformat}
> 3 row(s):
> Column$0<VARCHAR(OPTIONAL)>,Column$1<VARCHAR(OPTIONAL)>,Column$2<VARCHAR(OPTIONAL)>
> 2017,12,17
> 2017,12,18
> 2017,12,19
> Total rows returned : 3. Returned in 9072ms.
> {noformat}
> Yes, indeed, the table function discarded the defined format config values,
> filling in blanks, including for the column names.
> The expected behavior is that all properties defined in the config should
> remain unchanged _except_ for those in the table function. Why? In order to
> know which format plugin to use, the code has to map from the suffix (".log2"
> here) to a format plugin _config_. (The config is the only thing that
> specifies a suffix.) Since we mapped to a config (not the unconfigured
> plugin), we'd expect the config properties to be used.
> It is highly surprising that only the suffix is used while all other
> attributes are ignored. This seems very much in the "bug" category and not at
> all in the "feature" category.
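
The expected behavior described above amounts to an "overlay" merge: start from the stored format config and replace only the properties that the table function names. The following is a minimal sketch of that idea, not Drill's implementation. The class name, the `merge` method, and the use of plain maps in place of a real format config class are all hypothetical, and the stored regex value is a placeholder because the issue does not show DATE_ONLY_PATTERN.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names are Drill APIs.
final class TableFunctionMergeSketch {

  // Copy the stored config, then overwrite only the properties that the
  // table function supplies. Properties it does not mention keep their
  // stored values.
  static Map<String, Object> merge(Map<String, Object> storedConfig,
                                   Map<String, Object> tableFunctionArgs) {
    Map<String, Object> merged = new LinkedHashMap<>(storedConfig);
    merged.putAll(tableFunctionArgs);
    return merged;
  }

  public static void main(String[] args) {
    // Stand-in for the stored plugin config (placeholder regex value).
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("regex", "placeholder-for-DATE_ONLY_PATTERN");
    stored.put("fields", Arrays.asList("year", "month", "day"));

    // Stand-in for the table function arguments: only the regex is given.
    Map<String, Object> overrides = new LinkedHashMap<>();
    overrides.put("regex", "(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*");

    Map<String, Object> merged = merge(stored, overrides);
    System.out.println("regex  = " + merged.get("regex"));
    System.out.println("fields = " + merged.get("fields"));
  }
}
```

With this approach the column names from the stored config would survive a table function call that overrides only the regex, which is the outcome the report asks for.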