dzamo edited a comment on pull request #2282: URL: https://github.com/apache/drill/pull/2282#issuecomment-963159298
> Let's consider a real world use case: some fixed width log generated by a database. Since the fields may be mashed together, there isn't a delimiter that you can use to divide the fields. You _could_ use however the logRegex reader to do this. That point aside for the moment, the way I imagined someone using this was that different configs could be set up and linked to workspaces such that if a file was in the `mysql_logs` folder, it would use the mysql log config, and if it was in the `postgres` it would use another.

@cgivre This use case would still work after two `CREATE SCHEMA` statements to set the names and data types, wouldn't it? The schemas would be applied to every subsequent query.

> My opinion here is that the goal should be to get the cleanest data to the user as possible without the user having to rely on CASTs and other complicating factors.

Let's drop the CASTs, those aren't fun. So we're left with different ways a user can specify column names and types.

1. With a `CREATE SCHEMA` against a directory.
2. With an inline schema to a table function.
3. With some plugin-specific format config that works for this plugin but generally not for others.

Each one requires some effort, and each one gets you to `select *` returning nice results (disclaimer: is this claim I'm making actually true?), which is super valuable. So shouldn't we avoid the quirky option 3 and commit to options 1 and 2 consistently wherever we can?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
