jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367707404
##########
File path: docs/development/modules.md
##########
@@ -148,29 +150,43 @@ To start a segment killing task, you need to access the old Coordinator console
After the killing task ends, `index.zip` (`partitionNum_index.zip` for HDFS data storage) file should be deleted from the data storage.
-### Adding a new Firehose
+### Adding support for a new input source
-There is an example of this in the `s3-extensions` module with the StaticS3FirehoseFactory.
+Adding support for a new input source requires implementing three interfaces: `InputSource`, `InputEntity`, and `InputSourceReader`.
+`InputSource` defines where the input data is stored. `InputEntity` defines how data can be read in parallel
+in [native parallel indexing](../ingestion/native-batch.md).
+`InputSourceReader` defines how to read your new input source; in most cases, you can simply use the provided `InputEntityIteratingReader`.
-Adding a Firehose is done almost entirely through the Jackson Modules instead of Guice. Specifically, note the implementation
+There is an example of this in the `druid-s3-extensions` module, in `S3InputSource` and `S3Entity`.
+
+Adding an `InputSource` is done almost entirely through Jackson modules instead of Guice. Specifically, note the following implementation:
``` java
@Override
public List<? extends Module> getJacksonModules()
{
return ImmutableList.of(
- new SimpleModule().registerSubtypes(new NamedType(StaticS3FirehoseFactory.class, "static-s3"))
+ new SimpleModule().registerSubtypes(new NamedType(S3InputSource.class, "s3"))
);
}
```
-This is registering the FirehoseFactory with Jackson's polymorphic serialization/deserialization layer. More concretely, having this will mean that if you specify a `"firehose": { "type": "static-s3", ... }` in your realtime config, then the system will load this FirehoseFactory for your firehose.
+This registers the `InputSource` with Jackson's polymorphic serialization/deserialization layer. More concretely, if you specify an `"inputSource": { "type": "s3", ... }` in your IO config, the system will load this `InputSource` implementation.
+
+Note that inside Druid, we have made the `@JacksonInject` annotation for Jackson-deserialized objects use the base Guice injector to resolve the object to be injected. So, if your `InputSource` needs access to some object, you can add a `@JacksonInject` annotation on a setter and the object will be set on instantiation.
Review comment:
Added.
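
The division of labor in the diff above (a source that says where data lives, entities that can be read independently, a reader that iterates over them, and a `"type"` name that selects the implementation) can be illustrated with a minimal, stdlib-only Java sketch. Every name in it (`Source`, `Entity`, `Reader`, `EntityIteratingReader`, `InMemorySource`, `fromSpec`, and the `"in-memory"` type name) is hypothetical and simplified: it does not reproduce Druid's actual interface signatures, and the map-based registry only mimics the effect of `registerSubtypes(new NamedType(...))` rather than using Jackson itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, stdlib-only sketch; these are NOT Druid's real interfaces.
final class InputSourceSketch {

    // Stand-in for InputEntity: one independently readable unit (e.g. one file
    // or object), so separate entities could be handed to separate workers.
    interface Entity {
        String uri();
        List<String> readRows();
    }

    // Stand-in for InputSourceReader: turns a source into a sequence of rows.
    interface Reader {
        Iterator<String> read();
    }

    // Stand-in for InputSource: describes where the input data is stored.
    interface Source {
        List<Entity> entities();

        // By default, read the entities one after another, mirroring the role
        // the doc assigns to InputEntityIteratingReader.
        default Reader reader() {
            return new EntityIteratingReader(entities());
        }
    }

    // Simplified entity-iterating reader: concatenates the rows of each entity.
    static final class EntityIteratingReader implements Reader {
        private final List<Entity> entities;

        EntityIteratingReader(List<Entity> entities) {
            this.entities = entities;
        }

        @Override
        public Iterator<String> read() {
            List<String> rows = new ArrayList<>();
            for (Entity entity : entities) {
                rows.addAll(entity.readRows());
            }
            return rows.iterator();
        }
    }

    // Hypothetical source backed by an in-memory map of object name -> rows;
    // each map entry becomes one Entity.
    static final class InMemorySource implements Source {
        private final Map<String, List<String>> objects;

        InMemorySource(Map<String, List<String>> objects) {
            this.objects = objects;
        }

        @Override
        public List<Entity> entities() {
            List<Entity> result = new ArrayList<>();
            for (Map.Entry<String, List<String>> object : objects.entrySet()) {
                result.add(new Entity() {
                    @Override
                    public String uri() {
                        return "mem://" + object.getKey();
                    }

                    @Override
                    public List<String> readRows() {
                        return object.getValue();
                    }
                });
            }
            return result;
        }
    }

    // Simplified analogue of type-name registration: the "type" string from
    // the spec selects which implementation gets constructed.
    private static final Map<String, Function<Map<String, List<String>>, Source>> REGISTRY =
        new HashMap<>();

    static {
        REGISTRY.put("in-memory", InMemorySource::new);
    }

    static Source fromSpec(String type, Map<String, List<String>> properties) {
        Function<Map<String, List<String>>, Source> factory = REGISTRY.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown inputSource type: " + type);
        }
        return factory.apply(properties);
    }
}
```

For example, `fromSpec("in-memory", objects)` returns a source whose `reader()` yields the rows of every object in turn, while an unregistered type name throws, roughly analogous to an unregistered `"type"` failing to deserialize. Keeping the per-entity split separate from the reader is what lets each entity be processed independently.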