suneet-s commented on a change in pull request #9704: Refresh query docs.
URL: https://github.com/apache/druid/pull/9704#discussion_r408994814
##########
File path: docs/querying/datasource.md
##########
@@ -22,43 +22,317 @@ title: "Datasources"
~ under the License.
-->
+Datasources in Apache Druid are things that you can query. The most common
kind of datasource is a table datasource,
+and in many contexts the word "datasource" implicitly refers to table
datasources. This is especially true
+[during data ingestion](../ingestion/index.html), where ingestion is always
creating or writing into a table
+datasource. But at query time, there are many other types of datasources
available.
-A data source is the Apache Druid equivalent of a database table. However, a
query can also masquerade as a data source, providing subquery-like
functionality. Query data sources are currently supported only by
[GroupBy](../querying/groupbyquery.md) queries.
+In the [Druid SQL](sql.html) language, datasources are provided in the [`FROM`
clause](sql.html#from).
-### Table datasource
-The table data source is the most common type. It's represented by a string,
or by the full structure:
+The word "datasource" is generally spelled `dataSource` (with a capital S)
when it appears in API requests and
+responses.
+## Datasource type
+
+### `table`
+
+<!--DOCUSAURUS_CODE_TABS-->
+<!--SQL-->
+```sql
+SELECT column1, column2 FROM "druid"."dataSourceName"
+```
+<!--Native-->
+```json
+{
+ "queryType": "scan",
+ "dataSource": "dataSourceName",
+ "columns": ["column1", "column2"],
+ "intervals": ["0000/3000"]
+}
+```
+<!--END_DOCUSAURUS_CODE_TABS-->
+
+The table datasource is the most common type. This is the kind of datasource
you get when you perform
+[data ingestion](../ingestion/index.html). They are split up into segments,
distributed around the cluster,
+and queried in parallel.
+
+In [Druid SQL](sql.html#from), table datasources reside in the `druid`
schema. This is the default schema, so table
+datasources can be referenced as either `druid.dataSourceName` or simply
`dataSourceName`.
+
+In native queries, table datasources can be referenced using their names as
strings (as in the example above), or by
+using JSON objects of the form:
+
+```json
+"dataSource": {
+ "type": "table",
+ "name": "dataSourceName"
+}
+```
+
+To see a list of all table datasources, use the SQL query
+`SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'druid'`.
+
+### `lookup`
+
+<!--DOCUSAURUS_CODE_TABS-->
+<!--SQL-->
+```sql
+SELECT k, v FROM lookup.countries
+```
+<!--Native-->
+```json
+{
+ "queryType": "scan",
+ "dataSource": {
+ "type": "lookup",
+ "lookup": "countries"
+ },
+ "columns": ["k", "v"],
+ "intervals": ["0000/3000"]
+}
+```
+<!--END_DOCUSAURUS_CODE_TABS-->
+
+Lookup datasources correspond to Druid's key-value [lookup](lookups.html)
objects. In [Druid SQL](sql.html#from),
+they reside in the `lookup` schema. They are preloaded in memory on all
servers, so they can be accessed rapidly.
Review comment:
nit: "all servers" is vague. Are lookups preloaded on the master nodes too? I
think this can be configured so that only the nodes that need lookups have them
preloaded. My understanding is: brokers (always), historicals (always?),
overlord, MM/Indexer (only if needed by ingestion).