[ 
https://issues.apache.org/jira/browse/KAFKA-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039452#comment-17039452
 ] 

ASF GitHub Bot commented on KAFKA-9546:
---------------------------------------

gcsaba2 commented on pull request #8134: KAFKA-9546 Allow custom tasks through 
configuration
URL: https://github.com/apache/kafka/pull/8134
 
 
   Currently, FileStreamSourceConnector can only return a task of type 
FileStreamSourceTask. With this change, users can override that default and 
supply a custom task class via configuration.
   
   Testing was done via unit tests: one positive case (a custom Task class 
provided through config) and one negative case (an invalid class, java.io.File, 
provided through config). The existing unit tests already cover the default 
behavior, where FileStreamSourceTask is used.
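
A minimal sketch of how the class lookup and validation described above might work. It is not the PR's actual code. The `Task` interface and the two task classes are local stand-ins so the example compiles without Kafka on the classpath, and the `task.class` key is a hypothetical name, since the real key is not stated in this message.

```java
import java.util.Map;

public class TaskClassResolver {
    /** Stand-in for Kafka Connect's Task type (assumption, keeps the sketch self-contained). */
    public interface Task {}

    /** Stand-in for the default FileStreamSourceTask. */
    public static class DefaultFileTask implements Task {}

    /** Hypothetical custom task a user might supply. */
    public static class CustomTask extends DefaultFileTask {}

    // Hypothetical config key; the name used by the PR is not given in this message.
    public static final String TASK_CLASS_CONFIG = "task.class";

    /** Returns the configured task class, or the default when none is configured. */
    public static Class<? extends Task> resolve(Map<String, String> props) {
        String name = props.get(TASK_CLASS_CONFIG);
        if (name == null || name.isEmpty()) {
            return DefaultFileTask.class;
        }
        Class<?> candidate;
        try {
            candidate = Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Task class not found: " + name, e);
        }
        // Reject classes that are not tasks, e.g. java.io.File.
        if (!Task.class.isAssignableFrom(candidate)) {
            throw new IllegalArgumentException(name + " does not implement Task");
        }
        return candidate.asSubclass(Task.class);
    }
}
```

With an empty map this returns the default class; with `task.class` set to `java.io.File` it throws `IllegalArgumentException`, which mirrors the negative test case.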
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make FileStreamSourceTask extendable with generic streams
> ---------------------------------------------------------
>
>                 Key: KAFKA-9546
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9546
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Csaba Galyo
>            Assignee: Csaba Galyo
>            Priority: Major
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Use case: I want to read a ZIP-compressed text file with a file connector and 
> send it to Kafka.
> Currently, we have FileStreamSourceConnector which reads a \n delimited text 
> file. This connector always returns a task of type FileStreamSourceTask.
> The FileStreamSourceTask reads from stdin or opens a file InputStream. The 
> issue with this approach is that the input must be a plain text file; 
> otherwise it won't work. 
> The code should be modified so that users can replace the default 
> InputStream with, e.g., a ZipInputStream or a stream for any other format. 
> The code is currently written in such a way that it cannot be extended, so 
> we cannot use a different input stream. 
> See example here where the code got copy-pasted just so it could read from a 
> ZstdInputStream (which reads ZSTD compressed files): 
> [https://github.com/gcsaba2/kafka-zstd/tree/master/src/main/java/org/apache/kafka/connect/file]
>  
> I suggest 2 changes:
>  # FileStreamSourceConnector should be extendable to return tasks of 
> different types. These types would be supplied by the user through the 
> configuration map.
>  # FileStreamSourceTask should be modified so that it can be extended and 
> child classes can define different input streams.
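
The second suggestion could be sketched roughly as below, assuming a protected hook that subclasses override to wrap the raw stream. `LineReaderSketch`, `wrap`, and `ZipLineReader` are hypothetical names used only for illustration; this is not the real FileStreamSourceTask, and it reads only the first entry of a ZIP archive.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class LineReaderSketch {
    /** Hook for subclasses: wrap the raw stream. The default passes it through. */
    protected InputStream wrap(InputStream raw) throws IOException {
        return raw;
    }

    /** Reads newline-delimited UTF-8 text from whatever stream wrap() returns. */
    public final List<String> readLines(InputStream raw) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(wrap(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Hypothetical subclass that reads the first entry of a ZIP archive. */
    public static class ZipLineReader extends LineReaderSketch {
        @Override
        protected InputStream wrap(InputStream raw) throws IOException {
            ZipInputStream zip = new ZipInputStream(raw);
            if (zip.getNextEntry() == null) {
                throw new IOException("ZIP archive has no entries");
            }
            return zip; // reads return -1 at the end of the current entry
        }
    }

    /** Demo helper: builds a single-entry ZIP archive in memory. */
    public static byte[] zip(String entryName, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The line-reading logic stays in the base class, so a subclass for another format (such as the ZstdInputStream mentioned above) would only need to override `wrap`.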



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
