jihoonson commented on a change in pull request #9965:
URL: https://github.com/apache/druid/pull/9965#discussion_r441125726
##########
File path: docs/ingestion/faq.md
##########
@@ -66,6 +66,18 @@ Other common reasons that hand-off fails are as follows:
Make sure to include the `druid-hdfs-storage` and all the hadoop configuration, dependencies (that can be obtained by running command `hadoop classpath` on a machine where hadoop has been setup) in the classpath. And, provide necessary HDFS settings as described in [deep storage](../dependencies/deep-storage.md).
+## How do I know when I can make query to Druid after submitting ingestion task?
+
+You can verify if segments created by a recent ingestion task are loaded onto historicals and available for querying using the following workflow.
+1. Submit your ingestion task.
+2. Repeatedly poll the [Overlord's tasks API](../operations/api-reference.md#tasks) (`/druid/indexer/v1/task/{taskId}/status`) until your task is shown to be successfully completed.
+3. Poll the [Segment Loading by Datasource API](../operations/api-reference.md#segment-loading-by-datasource) (`/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus`) with
+`forceMetadataRefresh=true` and `interval=<INTERVAL_OF_INGESTED_DATA>` once.
Review comment:
I think it would be nice to warn one more time here about what will happen with `forceMetadataRefresh=true`. It could also be mentioned that this API refreshes not only the specified datasource but all datasources.
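The three steps in this FAQ entry could be sketched roughly as below. This is a hypothetical illustration, not part of the PR: `base`, `task_id`, `datasource`, and the response field names are assumptions, while the endpoint paths are the ones the diff references.

```python
import json
import time
import urllib.parse
import urllib.request

def task_status_url(base, task_id):
    # Step 2 endpoint: Overlord task status (path taken from the diff).
    return f"{base}/druid/indexer/v1/task/{task_id}/status"

def load_status_url(base, datasource, interval):
    # Step 3 endpoint: Coordinator segment load status for one interval.
    # urlencode percent-escapes the "/" inside an ISO-8601 interval.
    query = urllib.parse.urlencode(
        {"forceMetadataRefresh": "true", "interval": interval}
    )
    return (f"{base}/druid/coordinator/v1/datasources/"
            f"{datasource}/loadstatus?{query}")

def wait_until_queryable(base, task_id, datasource, interval, poll_secs=5):
    # Step 2: poll until the task finishes. The nested field names here are
    # an assumption about the shape of the status response.
    while True:
        with urllib.request.urlopen(task_status_url(base, task_id)) as resp:
            status = json.load(resp)["status"]["status"]
        if status == "SUCCESS":
            break
        if status == "FAILED":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_secs)
    # Step 3: a single loadstatus call for the ingested interval. Per the
    # review comment, forceMetadataRefresh=true refreshes metadata for all
    # datasources, not only this one.
    with urllib.request.urlopen(load_status_url(base, datasource, interval)) as resp:
        return json.load(resp)
```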
##########
File path: docs/ingestion/faq.md
##########
@@ -66,6 +66,18 @@ Other common reasons that hand-off fails are as follows:
Make sure to include the `druid-hdfs-storage` and all the hadoop configuration, dependencies (that can be obtained by running command `hadoop classpath` on a machine where hadoop has been setup) in the classpath. And, provide necessary HDFS settings as described in [deep storage](../dependencies/deep-storage.md).
+## How do I know when I can make query to Druid after submitting ingestion task?
Review comment:
I think this applies only to batch ingestion. In streaming ingestion, each row becomes queryable once it's consumed by a realtime task.
##########
File path: docs/operations/api-reference.md
##########
@@ -114,6 +114,35 @@ Returns the number of segments to load and drop, as well as the total segment lo
Returns the serialized JSON of segments to load and drop for each Historical process.
+
+#### Segment Loading by Datasource
+
+##### GET
+
+* `/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}`
+
+Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster for the given
+datasource over the given interval (or last 2 weeks if interval is not given). `forceMetadataRefresh` is required to be set.
+Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store.
Review comment:
Same for other APIs.
##########
File path: docs/operations/api-reference.md
##########
@@ -114,6 +114,35 @@ Returns the number of segments to load and drop, as well as the total segment lo
Returns the serialized JSON of segments to load and drop for each Historical process.
+
+#### Segment Loading by Datasource
+
+##### GET
+
+* `/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}`
+
+Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster for the given
+datasource over the given interval (or last 2 weeks if interval is not given). `forceMetadataRefresh` is required to be set.
+Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store.
Review comment:
It would be nice to mention here too that it will refresh all datasources.
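A small illustrative helper, hypothetical and not part of the PR, for reading the response of the endpoint above; it assumes the loadstatus body is a JSON object mapping each datasource name to the percent loaded, e.g. `{"wikipedia": 100.0}`.

```python
def is_fully_loaded(load_status: dict, datasource: str) -> bool:
    # Assumed response shape: {"<datasource>": <percent loaded>}.
    # 100.0 would mean every segment that should be loaded for the
    # requested interval is available on historicals.
    return load_status.get(datasource, 0.0) >= 100.0
```

A caller could repeat the loadstatus request (with `forceMetadataRefresh=false` on later calls, as the review thread suggests limiting refreshes) until this returns true.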
##########
File path: server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java
##########
@@ -403,11 +427,17 @@ private void awaitOrPerformDatabasePoll()
}
/**
- * If the latest {@link DatabasePoll} is a {@link PeriodicDatabasePoll}, or an {@link OnDemandDatabasePoll} that is
- * made not longer than {@link #periodicPollDelay} from now, awaits for it and returns true; returns false otherwise,
- * meaning that a new on-demand database poll should be initiated.
+ * This method returns true without waiting for database poll if the latest {@link DatabasePoll} is a
+ * {@link PeriodicDatabasePoll} that has completed it's first poll, or an {@link OnDemandDatabasePoll} that is
+ * made not longer than {@link #periodicPollDelay} from current time.
+ * This method does wait untill completion for if the latest {@link DatabasePoll} is a
+ * {@link PeriodicDatabasePoll} that has not completed it's first poll, or an {@link OnDemandDatabasePoll} that is
+ * alrady in the process of polling the database.
Review comment:
typo: alrady -> already
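The decision the javadoc describes, reuse a sufficiently fresh poll (waiting for it if it is still running) and otherwise initiate a new on-demand poll, could be sketched like this. This is a hypothetical Python analogy, not the Druid code; the synchronous `Future` use stands in for the real asynchronous database poll.

```python
import time
from concurrent.futures import Future

class PollCache:
    """Sketch of 'await a fresh poll, or perform a new one'."""

    def __init__(self, max_age_secs):
        self.max_age_secs = max_age_secs
        self._latest = None  # (started_at, Future holding the poll result)

    def await_or_perform(self, do_poll):
        now = time.monotonic()
        if self._latest is not None:
            started_at, fut = self._latest
            # A poll started recently enough: wait for it (if it is still
            # running) and reuse its result instead of polling again.
            if now - started_at <= self.max_age_secs:
                return fut.result()
        # Otherwise initiate a new on-demand poll.
        fut = Future()
        self._latest = (now, fut)
        fut.set_result(do_poll())  # synchronous stand-in for the async poll
        return fut.result()
```

Callers arriving while a fresh poll exists block on its result rather than issuing redundant database queries, which is the behavior the revised javadoc tries to spell out.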
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]