gianm commented on PR #13360:
URL: https://github.com/apache/druid/pull/13360#issuecomment-1343606869

   A great improvement in usability! Specifying JSON-inside-SQL does get old. 
Thanks for improving this.
   
   After noticing this land in master, I realized I have a few comments arising 
from reading the new docs, so I'll write them here.
   
   - The new extension-column syntax is cool, and more SQLy. Would it be 
possible to use this with `EXTERN` too? Like, make the third argument (row 
signature) optional, and allow people to provide the column list in SQL?
   - Comma-separated files and URIs are a drag, I think. Local files and URIs 
can both legitimately have commas in them. Given that SQL supports arrays, 
could these be SQL arrays instead?
   - The bits about "this selects CsvInputFormat" and "this selects 
JsonInputFormat" should be reworded, because we shouldn't use Java class names 
in user-facing documentation. Instead we should use the type codes these 
classes have in the API, which would be "`csv` input format" and "`json` input 
format" respectively. These could link to the docs for those formats, i.e. 
`../ingestion/data-formats.html#csv` and `../ingestion-data-formats.html#json`.
   - Some table-function-format parameter names are different from the 
corresponding input-format parameters in ways that don't seem necessary. Like, 
`skipRows` vs `skipHeaderRows`. I think it'd be best if these were the same, so 
people can have an easier time translating between table-function-formats and 
input-formats.
   - For JSON, `keepNullColumns` isn't documented for the `json` input format, 
but `keepNulls` is documented for the table function. I'm not sure why it isn't 
documented for the input format. But I do note that the description here in the 
table function docs is confusing: what does it mean to "keep" or "not keep" 
null values? Given the confusingness of the behavior, and the fact that it 
isn't documented for input formats, I suggest we should remove it from the 
table-function-format too.
   - The "due to #13359 this functionality is broken" piece bugged me. It's not 
broken for the intended use case of a local input source, where you have a 
single-server setup, or a shared volume across all servers in a multi-server 
setup. In those cases, nothing bad happens by resolving relative paths upon 
first serialization. I mean, a user won't notice a difference, because all 
servers would resolve the path the same way anyway. So I was going to suggest 
removing this wording, or at least adjusting it such that it it's clear that 
the functionality works fine in the scenario it was designed for. However, I 
decided to simply fix it instead: https://github.com/apache/druid/pull/13534. 
That PR also removes the doc wording, since it's no longer needed.
   
   Btw, I also added the "Release Notes" label, since there is a "Release 
Notes" section in the PR description. The label helps the release manager find 
all the PRs that are associated with interesting release notes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to