paul-rogers commented on PR #12770:
URL: https://github.com/apache/druid/pull/12770#issuecomment-1198787027
A bit late to the party on this one. I hit this while merging code, and I'm
not sure the fixes are quite right.
The fix in `ExternalTableMacro` essentially restricts the columns that can
appear in an external file. Yet, users have no control over the existing data.
It is not the job of a random CSV or JSON file to have a `__time` column of the
form needed by Druid. That's the job of the mapping layer.
Then, in `DruidTable`, we check again. But that's also the wrong place: it is
not the job of the table metadata to conform to Druid's requirements either.
The proper place for this kind of check is in the validator. But, since
`INSERT` queries are not yet validated, the next best place is in the
`DruidPlanner` where we're about to hand the `DruidSqlInsert` node off to the
`QueryMaker`. At that point, we can check if the user has done the proper
mapping.
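To illustrate, here is a minimal sketch of what such a planner-time check might look like. This is not Druid's actual API: the class name, the string-keyed row-signature map, and the type names are all hypothetical stand-ins for the real `RowSignature` machinery; the point is only that the check inspects the *output* of the `SELECT`, after the user's mapping has been applied.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: validate, just before handing an INSERT off to the
// QueryMaker, that the SELECT's output row signature maps __time to a
// timestamp type. Column-name-to-type map stands in for RowSignature.
public class InsertTimeCheck {
    // Returns an error message, or null if the signature is acceptable.
    public static String validateTimeColumn(Map<String, String> rowSignature) {
        String type = rowSignature.get("__time");
        if (type == null) {
            return "INSERT queries must include a __time column";
        }
        if (!"TIMESTAMP".equals(type)) {
            return "__time must be a TIMESTAMP, found " + type;
        }
        return null; // valid: the user has done the mapping
    }

    public static void main(String[] args) {
        // A CSV whose raw __time is a string fails until TIME_PARSE maps it.
        Map<String, String> rawCsv = new LinkedHashMap<>();
        rawCsv.put("__time", "VARCHAR");
        System.out.println(validateTimeColumn(rawCsv));

        Map<String, String> mapped = new LinkedHashMap<>();
        mapped.put("__time", "TIMESTAMP");
        System.out.println(validateTimeColumn(mapped));
    }
}
```

With a check placed here, the external table and `DruidTable` need no restrictions at all: any input schema is fine as long as the query maps it correctly.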
The result is that I should be able to have a CSV file with a `__time`
column as a string and do:
```sql
INSERT INTO foo
SELECT TIME_PARSE("__time") AS __time, ...
FROM TABLE(...)
```
To be clear about how SQL works here: the first `__time` is in the name space
of the input table's row signature. The second one, introduced by the `AS`
clause, is in the name space of the row signature of the `SELECT` statement.
Since we match by name for `INSERT`, this is also the column we'd insert into
the target data source, `foo`.
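The match-by-name semantics can be sketched as follows. This is an illustrative toy, not Druid code: `toTargetOrder` and its list-based signature are invented for the example; it just shows that `SELECT` output columns are matched to target columns by alias, not by position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of match-by-name INSERT semantics: each SELECT
// output column is matched to the target column with the same name,
// so TIME_PARSE("__time") AS __time lands in the target's __time column
// regardless of where it appears in the SELECT list.
public class MatchByName {
    // Reorder one SELECT output row into the target's column order by name.
    public static List<Object> toTargetOrder(
            List<String> targetColumns,
            List<String> selectAliases,
            List<Object> selectRow) {
        Map<String, Object> byName = new HashMap<>();
        for (int i = 0; i < selectAliases.size(); i++) {
            byName.put(selectAliases.get(i), selectRow.get(i));
        }
        List<Object> out = new ArrayList<>();
        for (String col : targetColumns) {
            out.add(byName.get(col)); // null if the SELECT omitted this column
        }
        return out;
    }

    public static void main(String[] args) {
        // Target "foo" expects [__time, page]; the SELECT emits [page, __time].
        List<Object> row = toTargetOrder(
                List.of("__time", "page"),
                List.of("page", "__time"),
                List.of("Main_Page", 1690000000000L));
        System.out.println(row); // [1690000000000, Main_Page]
    }
}
```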
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]