(druid) branch master updated: docs: Correct some outdated module names (#16584)

abhishekrb Tue, 11 Jun 2024 14:18:04 -0700

This is an automated email from the ASF dual-hosted git repository.

abhishekrb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git



The following commit(s) were added to refs/heads/master by this push:
     new fec48432d4a docs: Correct some outdated module names (#16584)
fec48432d4a is described below

commit fec48432d4af9d801db0c5b10669ccb19beb20d6
Author: Andreas Maechler <[email protected]>
AuthorDate: Tue Jun 11 15:17:40 2024 -0600

    docs: Correct some outdated module names (#16584)
    
    * Fix module names
    
    * Better spacing
    
    * Some spacing
    
    * Suggestions from code review
    
    Thanks Abhishek.
    
    * More links
    
    * Roll-up time
    
    * Remove logs
    
    * More spelling
---
 docs/development/modules.md | 108 +++++++++++++++++++-------------------------
 website/.spelling           |   2 +-
 2 files changed, 48 insertions(+), 62 deletions(-)

diff --git a/docs/development/modules.md b/docs/development/modules.md
index 5b31d2d0aaa..c62a6d4a086 100644
--- a/docs/development/modules.md
+++ b/docs/development/modules.md
@@ -22,12 +22,11 @@ title: "Creating extensions"
   ~ under the License.
   -->
 
-
 Druid uses a module system that allows for the addition of extensions at 
runtime.
 
 ## Writing your own extensions
 
-Druid's extensions leverage Guice in order to add things at runtime.  
Basically, Guice is a framework for Dependency Injection, but we use it to hold 
the expected object graph of the Druid process.  Extensions can make any 
changes they want/need to the object graph via adding Guice bindings.  While 
the extensions actually give you the capability to change almost anything 
however you want, in general, we expect people to want to extend one of the 
things listed below.  This means that we h [...]
+Druid's extensions leverage Guice in order to add things at runtime. 
Basically, Guice is a framework for Dependency Injection, but we use it to hold 
the expected object graph of the Druid process. Extensions can make any changes 
they want/need to the object graph via adding Guice bindings. While the 
extensions actually give you the capability to change almost anything however 
you want, in general, we expect people to want to extend one of the things 
listed below. This means that we honor [...]
 
 1. Add a new deep storage implementation by extending the 
`org.apache.druid.segment.loading.DataSegment*` and
    `org.apache.druid.tasklogs.TaskLog*` classes.
@@ -61,11 +60,11 @@ The DruidModule class is has two methods
 
 The `configure(Binder)` method is the same method that a normal Guice module 
would have.
 
-The `getJacksonModules()` method provides a list of Jackson modules that are 
used to help initialize the Jackson ObjectMapper instances used by Druid.  This 
is how you add extensions that are instantiated via Jackson (like 
AggregatorFactory and InputSource objects) to Druid.
+The `getJacksonModules()` method provides a list of Jackson modules that are 
used to help initialize the Jackson ObjectMapper instances used by Druid. This 
is how you add extensions that are instantiated via Jackson (like 
AggregatorFactory and InputSource objects) to Druid.
 
 ### Registering your Druid Module
 
-Once you have your DruidModule created, you will need to package an extra file 
in the `META-INF/services` directory of your jar.  This is easiest to 
accomplish with a maven project by creating files in the `src/main/resources` 
directory.  There are examples of this in the Druid code under the 
`cassandra-storage`, `hdfs-storage` and `s3-extensions` modules, for examples.
+Once you have your DruidModule created, you will need to package an extra file 
in the `META-INF/services` directory of your jar. This is easiest to accomplish 
with a maven project by creating files in the `src/main/resources` directory. 
There are examples of this in the Druid code under the `cassandra-storage`, 
`hdfs-storage` and `s3-extensions` modules, for examples.
 
 The file that should exist in your jar is
 
@@ -73,19 +72,19 @@ The file that should exist in your jar is
 
 It should be a text file with a new-line delimited list of package-qualified 
classes that implement DruidModule like
 
-```
+```txt
 org.apache.druid.storage.cassandra.CassandraDruidModule
 ```
 
-If your jar has this file, then when it is added to the classpath or as an 
extension, Druid will notice the file and will instantiate instances of the 
Module.  Your Module should have a default constructor, but if you need access 
to runtime configuration properties, it can have a method with @Inject on it to 
get a Properties object injected into it from Guice.
+If your jar has this file, then when it is added to the classpath or as an 
extension, Druid will notice the file and will instantiate instances of the 
Module. Your Module should have a default constructor, but if you need access 
to runtime configuration properties, it can have a method with @Inject on it to 
get a Properties object injected into it from Guice.
 
 ### Adding a new deep storage implementation
 
-Check the `azure-storage`, `google-storage`, `cassandra-storage`, 
`hdfs-storage` and `s3-extensions` modules for examples of how to do this.
+Check the `druid-azure-extensions`, `druid-google-extensions`, 
`druid-cassandra-storage`, `druid-hdfs-storage` and `druid-s3-extensions` 
modules for examples of how to do this.
 
-The basic idea behind the extension is that you need to add bindings for your DataSegmentPusher and DataSegmentPuller objects.  The way to add them is something like (taken from HdfsStorageDruidModule)
+The basic idea behind the extension is that you need to add bindings for your [`DataSegmentPusher`](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentPusher.java) and [`URIDataPuller`](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/loading/URIDataPuller.java) objects. The way to add them is something like (taken from HdfsStorageDruidModule)
 
-``` java
+```java
 Binders.dataSegmentPullerBinder(binder)
        .addBinding("hdfs")
        .to(HdfsDataSegmentPuller.class).in(LazySingleton.class);
@@ -95,54 +94,35 @@ Binders.dataSegmentPusherBinder(binder)
        .to(HdfsDataSegmentPusher.class).in(LazySingleton.class);
 ```
 
-`Binders.dataSegment*Binder()` is a call provided by the druid-core jar which 
sets up a Guice multibind "MapBinder".  If that doesn't make sense, don't worry 
about it, just think of it as a magical incantation.
+`Binders.dataSegment*Binder()` is a call provided by the druid-core jar which 
sets up a Guice [multibind](https://github.com/google/guice/wiki/Multibindings) 
"MapBinder". If that doesn't make sense, don't worry about it; just think of it 
as a magical incantation.
 
-`addBinding("hdfs")` for the Puller binder creates a new handler for loadSpec 
objects of type "hdfs".  For the Pusher binder it creates a new type value that 
you can specify for the `druid.storage.type` parameter.
+`addBinding("hdfs")` for the Puller binder creates a new handler for loadSpec 
objects of type "hdfs". For the Pusher binder it creates a new type value that 
you can specify for the `druid.storage.type` parameter.
 
 `to(...).in(...);` is normal Guice stuff.
 
-In addition to DataSegmentPusher and DataSegmentPuller, you can also bind:
+In addition to `DataSegmentPusher` and `URIDataPuller`, you can also bind:
 
-* DataSegmentKiller: Removes segments, used as part of the Kill Task to delete unused segments, i.e. perform garbage collection of segments that are either superseded by newer versions or that have been dropped from the cluster.
-* DataSegmentMover: Allow migrating segments from one place to another, currently this is only used as part of the MoveTask to move unused segments to a different S3 bucket or prefix, typically to reduce storage costs of unused data (e.g. move to glacier or cheaper storage)
-* DataSegmentArchiver: Just a wrapper around Mover, but comes with a preconfigured target bucket/path, so it doesn't have to be specified at runtime as part of the ArchiveTask.
+* [`DataSegmentKiller`](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentKiller.java): Removes segments, used as part of the Kill Task to delete unused segments, i.e. perform garbage collection of segments that are either superseded by newer versions or that have been dropped from the cluster.
+* [`DataSegmentMover`](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentMover.java): Allow migrating segments from one place to another, currently this is only used as part of the MoveTask to move unused segments to a different S3 bucket or prefix, typically to reduce storage costs of unused data (e.g. move to glacier or cheaper storage)
+* [`DataSegmentArchiver`](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentArchiver.java): Just a wrapper around Mover, but comes with a preconfigured target bucket/path, so it doesn't have to be specified at runtime as part of the ArchiveTask.
 
 ### Validating your deep storage implementation
 
 **WARNING!** This is not a formal procedure, but a collection of hints to 
validate if your new deep storage implementation is able do push, pull and kill 
segments.
 
 It's recommended to use batch ingestion tasks to validate your implementation.
-The segment will be automatically rolled up to Historical note after ~20 
seconds.
+The segment will be automatically rolled up to a Historical node after ~1 
minute.
 In this way, you can validate both push (at realtime process) and pull (at 
Historical process) segments.
 
-* DataSegmentPusher
+#### DataSegmentPusher
 
 Wherever your data storage (cloud storage service, distributed file system, 
etc.) is, you should be able to see one new file: `index.zip` 
(`partitionNum_index.zip` for HDFS data storage) after your ingestion task ends.
 
-* DataSegmentPuller
+#### URIDataPuller
 
-After ~20 secs your ingestion task ends, you should be able to see your 
Historical process trying to load the new segment.
-
-The following example was retrieved from a Historical process configured to 
use Azure for deep storage:
-
-```
-2015-04-14T02:42:33,450 INFO [ZkCoordinator-0] 
org.apache.druid.server.coordination.ZkCoordinator - New request[LOAD: 
dde_2015-01-02T00:00:00.000Z_2015-01-03T00:00:00
-.000Z_2015-04-14T02:41:09.484Z] with 
zNode[/druid/dev/loadQueue/192.168.33.104:8081/dde_2015-01-02T00:00:00.000Z_2015-01-03T00:00:00.000Z_2015-04-14T02:41:09.
-484Z].
-2015-04-14T02:42:33,451 INFO [ZkCoordinator-0] 
org.apache.druid.server.coordination.ZkCoordinator - Loading segment 
dde_2015-01-02T00:00:00.000Z_2015-01-03T00:00:00.0
-00Z_2015-04-14T02:41:09.484Z
-2015-04-14T02:42:33,463 INFO [ZkCoordinator-0] 
org.apache.druid.guice.JsonConfigurator - Loaded class[class 
org.apache.druid.storage.azure.AzureAccountConfig] from props[drui
-d.azure.] as [org.apache.druid.storage.azure.AzureAccountConfig@759c9ad9]
-2015-04-14T02:49:08,275 INFO [ZkCoordinator-0] 
org.apache.druid.utils.CompressionUtils - Unzipping 
file[/opt/druid/tmp/compressionUtilZipCache1263964429587449785.z
-ip] to 
[/opt/druid/zk_druid/dde/2015-01-02T00:00:00.000Z_2015-01-03T00:00:00.000Z/2015-04-14T02:41:09.484Z/0]
-2015-04-14T02:49:08,276 INFO [ZkCoordinator-0] 
org.apache.druid.storage.azure.AzureDataSegmentPuller - Loaded 1196 bytes from 
[dde/2015-01-02T00:00:00.000Z_2015-01-03
-T00:00:00.000Z/2015-04-14T02:41:09.484Z/0/index.zip] to 
[/opt/druid/zk_druid/dde/2015-01-02T00:00:00.000Z_2015-01-03T00:00:00.000Z/2015-04-14T02:41:09.484Z/0]
-2015-04-14T02:49:08,277 WARN [ZkCoordinator-0] 
org.apache.druid.segment.loading.SegmentLocalCacheManager - Segment 
[dde_2015-01-02T00:00:00.000Z_2015-01-03T00:00:00.000Z_2015-04-14T02:41:09.484Z]
 is different than expected size. Expected [0] found [1196]
-2015-04-14T02:49:08,282 INFO [ZkCoordinator-0] 
org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing 
segment[dde_2015-01-02T00:00:00.000Z_2015-01-03T00:00:00.000Z_2015-04-14T02:41:09.484Z]
 at 
path[/druid/dev/segments/192.168.33.104:8081/192.168.33.104:8081_historical__default_tier_2015-04-14T02:49:08.282Z_7bb87230ebf940188511dd4a53ffd7351]
-2015-04-14T02:49:08,292 INFO [ZkCoordinator-0] 
org.apache.druid.server.coordination.ZkCoordinator - Completed request [LOAD: 
dde_2015-01-02T00:00:00.000Z_2015-01-03T00:00:00.000Z_2015-04-14T02:41:09.484Z]
-```
+After ~1 minute your ingestion task ends, you should be able to see your 
Historical process trying to load the new segment.
 
-* DataSegmentKiller
+#### DataSegmentKiller
 
 The easiest way of testing the segment killing is marking a segment as not 
used and then starting a killing task in the [web 
console](../operations/web-console.md).
 
@@ -163,19 +143,19 @@ There is an example of this in the `druid-s3-extensions` module with the `S3Inpu
 
 Adding an InputSource is done almost entirely through the Jackson Modules 
instead of Guice. Specifically, note the implementation
 
-``` java
+```java
 @Override
 public List<? extends Module> getJacksonModules()
 {
   return ImmutableList.of(
-          new SimpleModule().registerSubtypes(new NamedType(S3InputSource.class, "s3"))
+      new SimpleModule().registerSubtypes(new NamedType(S3InputSource.class, "s3"))
   );
 }
 ```
 
-This is registering the InputSource with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"inputSource": { "type": "s3", ... }` in your IO config, 
then the system will load this InputSource for your `InputSource` 
implementation.
+This is registering the InputSource with Jackson's polymorphic 
serialization/deserialization layer. More concretely, having this will mean 
that if you specify a `"inputSource": { "type": "s3", ... }` in your IO config, 
then the system will load this InputSource for your `InputSource` 
implementation.
 
-Note that inside of Druid, we have made the `@JacksonInject` annotation for 
Jackson deserialized objects actually use the base Guice injector to resolve 
the object to be injected.  So, if your InputSource needs access to some 
object, you can add a `@JacksonInject` annotation on a setter and it will get 
set on instantiation.
+Note that inside of Druid, we have made the `@JacksonInject` annotation for 
Jackson deserialized objects actually use the base Guice injector to resolve 
the object to be injected. So, if your InputSource needs access to some object, 
you can add a `@JacksonInject` annotation on a setter and it will get set on 
instantiation.
 
 ### Adding support for a new data format
 
@@ -183,16 +163,16 @@ Adding support for a new data format requires implementing two interfaces, i.e.,
 `InputFormat` is to define how your data is formatted. `InputEntityReader` is 
to define how to parse your data and convert into Druid `InputRow`.
 
 There is an example in the `druid-orc-extensions` module with the 
`OrcInputFormat` and `OrcReader`.
- 
+
 Adding an InputFormat is very similar to adding an InputSource. They operate 
purely through Jackson and thus should just be additions to the Jackson modules 
returned by your DruidModule.
 
 ### Adding Aggregators
 
-Adding AggregatorFactory objects is very similar to InputSource objects.  They 
operate purely through Jackson and thus should just be additions to the Jackson 
modules returned by your DruidModule.
+Adding AggregatorFactory objects is very similar to InputSource objects. They 
operate purely through Jackson and thus should just be additions to the Jackson 
modules returned by your DruidModule.
 
 ### Adding Complex Metrics
 
-Adding ComplexMetrics is a little ugly in the current version.  The method of 
getting at complex metrics is through registration with the 
`ComplexMetrics.registerSerde()` method.  There is no special Guice stuff to 
get this working, just in your `configure(Binder)` method register the 
serialization/deserialization.
+Adding ComplexMetrics is a little ugly in the current version. The method of 
getting at complex metrics is through registration with the 
`ComplexMetrics.registerSerde()` method. There is no special Guice stuff to get 
this working, just in your `configure(Binder)` method register the 
serialization/deserialization.
 
 ### Adding new Query types
 
@@ -202,9 +182,9 @@ Adding a new Query type requires the implementation of three interfaces.
 1. `org.apache.druid.query.QueryToolChest`
 1. `org.apache.druid.query.QueryRunnerFactory`
 
-Registering these uses the same general strategy as a deep storage mechanism 
does.  You do something like
+Registering these uses the same general strategy as a deep storage mechanism 
does. You do something like
 
-``` java
+```java
 DruidBinders.queryToolChestBinder(binder)
             .addBinding(SegmentMetadataQuery.class)
             .to(SegmentMetadataQueryQueryToolChest.class);
@@ -214,7 +194,7 @@ DruidBinders.queryRunnerFactoryBinder(binder)
             .to(SegmentMetadataQueryRunnerFactory.class);
 ```
 
-The first one binds the SegmentMetadataQueryQueryToolChest for usage when a 
SegmentMetadataQuery is used.  The second one does the same thing but for the 
QueryRunnerFactory instead.
+The first one binds the SegmentMetadataQueryQueryToolChest for usage when a 
SegmentMetadataQuery is used. The second one does the same thing but for the 
QueryRunnerFactory instead.
 
 ### Adding new Jersey resources
 
@@ -230,7 +210,7 @@ You will need to implement `org.apache.druid.metadata.PasswordProvider` interfac
 thus make sure all the necessary information required for fetching each 
password is supplied during object instantiation.
 In your implementation of `org.apache.druid.initialization.DruidModule`, 
`getJacksonModules` should look something like this -
 
-``` java
+```java
     return ImmutableList.of(
         new SimpleModule("SomePasswordProviderModule")
             .registerSubtypes(
@@ -247,7 +227,7 @@ You will need to implement `org.apache.druid.metadata.DynamicConfigProvider` int
 thus make sure all the necessary information required for fetching all 
information is supplied during object instantiation.
 In your implementation of `org.apache.druid.initialization.DruidModule`, 
`getJacksonModules` should look something like this -
 
-``` java
+```java
     return ImmutableList.of(
         new SimpleModule("SomeDynamicConfigProviderModule")
             .registerSubtypes(
@@ -326,12 +306,13 @@ public class MyTransformModule implements DruidModule {
 
 ### Adding your own custom pluggable Coordinator Duty
 
-The coordinator periodically runs jobs, so-called `CoordinatorDuty` which 
include loading new segments, segment balancing, etc. 
+The coordinator periodically runs jobs, so-called `CoordinatorDuty` which 
include loading new segments, segment balancing, etc.
 Druid users can add custom pluggable coordinator duties, which are not part of 
Core Druid, without modifying any Core Druid classes.
 Users can do this by writing their own custom coordinator duty implementing 
the interface `CoordinatorCustomDuty` and setting the `JsonTypeName`.
 Next, users will need to register their custom coordinator as subtypes in 
their Module's `DruidModule#getJacksonModules()`.
 Once these steps are done, user will be able to load their custom coordinator 
duty using the following properties:
-```
+
+```properties
 druid.coordinator.dutyGroups=[<GROUP_NAME_1>, <GROUP_NAME_2>, ...]
 druid.coordinator.<GROUP_NAME_1>.duties=[<DUTY_NAME_MATCHING_JSON_TYPE_NAME_1>, <DUTY_NAME_MATCHING_JSON_TYPE_NAME_2>, ...]
 druid.coordinator.<GROUP_NAME_1>.period=<GROUP_NAME_1_RUN_PERIOD>
@@ -339,16 +320,17 @@ druid.coordinator.<GROUP_NAME_1>.period=<GROUP_NAME_1_RUN_PERIOD>
 druid.coordinator.<GROUP_NAME_1>.duty.<DUTY_NAME_MATCHING_JSON_TYPE_NAME_1>.<SOME_CONFIG_1_KEY>=<SOME_CONFIG_1_VALUE>
 druid.coordinator.<GROUP_NAME_1>.duty.<DUTY_NAME_MATCHING_JSON_TYPE_NAME_1>.<SOME_CONFIG_2_KEY>=<SOME_CONFIG_2_VALUE>
 ```
+
 In the new system for pluggable Coordinator duties, similar to what 
coordinator already does today, the duties can be grouped together.
-The duties will be grouped into multiple groups as per the elements in list 
`druid.coordinator.dutyGroups`. 
+The duties will be grouped into multiple groups as per the elements in list 
`druid.coordinator.dutyGroups`.
 All duties in the same group will have the same run period configured by 
`druid.coordinator.<GROUP_NAME>.period`.
-Currently, there is a single thread running the duties sequentially for each 
group. 
+Currently, there is a single thread running the duties sequentially for each 
group.
 
 For example, see `KillSupervisorsCustomDuty` for a custom coordinator duty 
implementation and the `custom-coordinator-duties`
 integration test group which loads `KillSupervisorsCustomDuty` using the 
configs set in 
`integration-tests/docker/environment-configs/test-groups/custom-coordinator-duties`.
 This config file adds the configs below to enable a custom coordinator duty.
 
-```
+```properties
 druid.coordinator.dutyGroups=["cleanupMetadata"]
 druid.coordinator.cleanupMetadata.duties=["killSupervisors"]
 druid.coordinator.cleanupMetadata.duty.killSupervisors.durationToRetain=PT0M
@@ -360,13 +342,15 @@ The custom coordinator duty `killSupervisors` also has a config called `duration
 
 ### Routing data through a HTTP proxy for your extension
 
-You can add the ability for the `HttpClient` of your extension to connect 
through an HTTP proxy. 
+You can add the ability for the `HttpClient` of your extension to connect 
through an HTTP proxy.
 
 To support proxy connection for your extension's HTTP client:
-1. Add `HttpClientProxyConfig` as a `@JsonProperty` to the HTTP config class 
of your extension. 
-2. In the extension's module class, add `HttpProxyConfig` config to 
`HttpClientConfig`. 
+
+1. Add `HttpClientProxyConfig` as a `@JsonProperty` to the HTTP config class 
of your extension.
+2. In the extension's module class, add `HttpProxyConfig` config to 
`HttpClientConfig`.
 For example, where `config` variable is the extension's HTTP config from step 
1:
-```
+
+```java
 final HttpClientConfig.Builder builder = HttpClientConfig
     .builder()
     .withNumConnections(1)
@@ -387,7 +371,8 @@ there. In the end, you should see your extension underneath `distribution/target
 ### Managing dependencies
 
 Managing library collisions can be daunting for extensions which draw in 
commonly used libraries. Here is a list of group IDs for libraries that are 
suggested to be specified with a `provided` scope to prevent collision with 
versions used in druid:
-```
+
+```txt
 "org.apache.druid",
 "com.metamx.druid",
 "asm",
@@ -420,4 +405,5 @@ Managing library collisions can be daunting for extensions which draw in commonl
 "org.roaringbitmap",
 "net.java.dev.jets3t"
 ```
+
 See the documentation in `org.apache.druid.cli.PullDependencies` for more 
information.
diff --git a/website/.spelling b/website/.spelling
index bce41f73a87..6bda08d608d 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -1279,7 +1279,7 @@ ComplexMetrics
 DataSegmentArchiver
 DataSegmentKiller
 DataSegmentMover
-DataSegmentPuller
+URIDataPuller
 DataSegmentPusher
 DruidModule
 ExtractionFns

