dlg99 opened a new pull request #9825:
URL: https://github.com/apache/pulsar/pull/9825


   ### Motivation
   
   Provide a way to use Kafka-connect Sink as a pulsar sink, in cases like:
   - company has custom kafka sink and want to try the pulsar out
   - no corresponding pulsar sink exists
   etc.
   
   ### Modifications
   
   Added "kafka producer" that uses kafka sink to dump data to the 3rd system 
bypassing kafka.
   Added configuration options (kafkaConnectorSinkClass, 
kafkaConnectorConfigProperties) for the pulsar-kafka sink
   Split `pulsar-io/kafka` module into `pulsar-io/kafka` (builds jar) and 
`pulsar-io/kafka-nar` (builds nar)
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
   
   Added unit test.
   Tested locally as
   
   Ran pulsar standalone
   Built pulsario/kafka-nar as `mvn clean package -DskipTests -P 
packageKafkaConnect` to include kafka's connect-file sink into the nar.
   Ran test nar as
   ```
   bin/pulsar-admin sinks localrun -a ~/pulsar-io-kafka-nar-2.8.0-SNAPSHOT.nar 
--name kwrap --namespace public/default/ktest --parallelism 1 -i my-topic 
--sink-config-file ~/sink.yaml
   ```
   with
   ```
   $ cat ~/sink.yaml
   configs:
     "topic": "test"
     "kafkaConnectorSinkClass": 
"org.apache.kafka.connect.file.FileStreamSinkConnector"
     "kafkaConnectorConfigProperties":
       "file": "/tmp/sink_test.out"
   ```
   message produced as
   ```
   bin/pulsar-client produce my-topic --messages "hello-pulsar"
   ```
   and got
   ```
   $ cat /tmp/sink_test.out
   [B@242e2d8b
   ```
   which is perfect (`[B@...` is from FileSink writing byte array as a string, 
currently expected)
   Schema support is needed in the pulsar's kafka sink to support non-byte[] 
data, tbd.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API: (yes / no)
     - The schema: (yes / no / don't know)
     - The default values of configurations: (yes / no)
     - The wire protocol: (yes / no)
     - The rest endpoints: (yes / no)
     - The admin cli options: (yes / no)
     - Anything that affects deployment: (yes / no / don't know)
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes)
     - If yes, how is the feature documented? (not documented yet)
     - If a feature is not documented yet in this PR, please create a followup 
issue for adding the documentation
   TBD following review
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to