jihoonson commented on a change in pull request #9965:
URL: https://github.com/apache/druid/pull/9965#discussion_r441125726
##########
File path: docs/ingestion/faq.md
##########
@@ -66,6 +66,18 @@ Other common reasons that hand-off fails are as follows:
Make sure to include the `druid-hdfs-storage` and all the hadoop configuration, dependencies (that can be obtained by running command `hadoop classpath` on a machine where hadoop has been setup) in the classpath. And, provide necessary HDFS settings as described in [deep storage](../dependencies/deep-storage.md).
+## How do I know when I can make query to Druid after submitting ingestion task?
+
+You can verify if segments created by a recent ingestion task are loaded onto historicals and available for querying using the following workflow.
+1. Submit your ingestion task.
+2. Repeatedly poll the [Overlord's tasks API](../operations/api-reference.md#tasks) (`/druid/indexer/v1/task/{taskId}/status`) until your task is shown to be successfully completed.
+3. Poll the [Segment Loading by Datasource API](../operations/api-reference.md#segment-loading-by-datasource) (`/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus`) with
+`forceMetadataRefresh=true` and `interval=<INTERVAL_OF_INGESTED_DATA>` once.
Review comment:
I think it would be nice to warn one more time here about what will happen with `forceMetadataRefresh=true`. It could also be mentioned that this API refreshes not only the specified datasource but all datasources.
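The three steps in this FAQ entry could be sketched roughly as below. This is a hypothetical illustration, not part of the PR: `base`, `task_id`, `datasource`, and the response field names are assumptions, while the endpoint paths are the ones the diff references.

```python
import json
import time
import urllib.parse
import urllib.request

def task_status_url(base, task_id):
    # Step 2 endpoint: Overlord task status (path taken from the diff).
    return f"{base}/druid/indexer/v1/task/{task_id}/status"

def load_status_url(base, datasource, interval):
    # Step 3 endpoint: Coordinator segment load status for one interval.
    # urlencode percent-escapes the "/" inside an ISO-8601 interval.
    query = urllib.parse.urlencode(
        {"forceMetadataRefresh": "true", "interval": interval}
    )
    return (f"{base}/druid/coordinator/v1/datasources/"
            f"{datasource}/loadstatus?{query}")

def wait_until_queryable(base, task_id, datasource, interval, poll_secs=5):
    # Step 2: poll until the task finishes. The nested field names here are
    # an assumption about the shape of the status response.
    while True:
        with urllib.request.urlopen(task_status_url(base, task_id)) as resp:
            status = json.load(resp)["status"]["status"]
        if status == "SUCCESS":
            break
        if status == "FAILED":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_secs)
    # Step 3: a single loadstatus call for the ingested interval. Per the
    # review comment, forceMetadataRefresh=true refreshes metadata for all
    # datasources, not only this one.
    with urllib.request.urlopen(load_status_url(base, datasource, interval)) as resp:
        return json.load(resp)
```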
##########
File path: docs/ingestion/faq.md
##########
@@ -66,6 +66,18 @@ Other common reasons that hand-off fails are as follows:
Make sure to include the `druid-hdfs-storage` and all the hadoop configuration, dependencies (that can be obtained by running command `hadoop classpath` on a machine where hadoop has been setup) in the classpath. And, provide necessary HDFS settings as described in [deep storage](../dependencies/deep-storage.md).
+## How do I know when I can make query to Druid after submitting ingestion task?
Review comment:
I think this applies only to batch ingestion. In streaming ingestion, each row becomes queryable once it's consumed by a realtime task.
##########
File path: docs/operations/api-reference.md
##########
@@ -114,6 +114,35 @@ Returns the number of segments to load and drop, as well as the total segment lo
Returns the serialized JSON of segments to load and drop for each Historical process.
+
+#### Segment Loading by Datasource
+
+##### GET
+
+* `/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}`
+
+Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster for the given
+datasource over the given interval (or last 2 weeks if interval is not given). `forceMetadataRefresh` is required to be set.
+Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store.
Review comment:
Same for other APIs.
##########
File path: docs/operations/api-reference.md
##########
@@ -114,6 +114,35 @@ Returns the number of segments to load and drop, as well as the total segment lo
Returns the serialized JSON of segments to load and drop for each Historical process.
+
+#### Segment Loading by Datasource
+
+##### GET
+
+* `/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}`
+
+Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster for the given
+datasource over the given interval (or last 2 weeks if interval is not given). `forceMetadataRefresh` is required to be set.
+Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store.
Review comment:
It would be nice to mention here too that it will refresh all datasources.
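A small illustrative helper, hypothetical and not part of the PR, for reading the response of the endpoint above; it assumes the loadstatus body is a JSON object mapping each datasource name to the percent loaded, e.g. `{"wikipedia": 100.0}`.

```python
def is_fully_loaded(load_status: dict, datasource: str) -> bool:
    # Assumed response shape: {"<datasource>": <percent loaded>}.
    # 100.0 would mean every segment that should be loaded for the
    # requested interval is available on historicals.
    return load_status.get(datasource, 0.0) >= 100.0
```

A caller could repeat the loadstatus request (with `forceMetadataRefresh=false` on later calls, as the review thread suggests limiting refreshes) until this returns true.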
##########
File path: server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java
##########
@@ -403,11 +427,17 @@ private void awaitOrPerformDatabasePoll()
}
/**
- * If the latest {@link DatabasePoll} is a {@link PeriodicDatabasePoll}, or an {@link OnDemandDatabasePoll} that is
- * made not longer than {@link #periodicPollDelay} from now, awaits for it and returns true; returns false otherwise,
- * meaning that a new on-demand database poll should be initiated.
+ * This method returns true without waiting for database poll if the latest {@link DatabasePoll} is a
+ * {@link PeriodicDatabasePoll} that has completed it's first poll, or an {@link OnDemandDatabasePoll} that is
+ * made not longer than {@link #periodicPollDelay} from current time.
+ * This method does wait untill completion for if the latest {@link DatabasePoll} is a
+ * {@link PeriodicDatabasePoll} that has not completed it's first poll, or an {@link OnDemandDatabasePoll} that is
+ * alrady in the process of polling the database.
Review comment:
typo: alrady -> already
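The decision the javadoc describes, reuse a sufficiently fresh poll (waiting for it if it is still running) and otherwise initiate a new on-demand poll, could be sketched like this. This is a hypothetical Python analogy, not the Druid code; the synchronous `Future` use stands in for the real asynchronous database poll.

```python
import time
from concurrent.futures import Future

class PollCache:
    """Sketch of 'await a fresh poll, or perform a new one'."""

    def __init__(self, max_age_secs):
        self.max_age_secs = max_age_secs
        self._latest = None  # (started_at, Future holding the poll result)

    def await_or_perform(self, do_poll):
        now = time.monotonic()
        if self._latest is not None:
            started_at, fut = self._latest
            # A poll started recently enough: wait for it (if it is still
            # running) and reuse its result instead of polling again.
            if now - started_at <= self.max_age_secs:
                return fut.result()
        # Otherwise initiate a new on-demand poll.
        fut = Future()
        self._latest = (now, fut)
        fut.set_result(do_poll())  # synchronous stand-in for the async poll
        return fut.result()
```

Callers arriving while a fresh poll exists block on its result rather than issuing redundant database queries, which is the behavior the revised javadoc tries to spell out.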
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]