Hey Florian,

It seems reasonable to me to let the connector track task progress through offsets. I recall there have been other use cases for communication between tasks and connectors (perhaps Ewen or someone else will jump in here and mention them), so I'm not sure whether this could fall under a more general solution. Using the offsets topic has the advantage of simplicity since it doesn't add anything new. Perhaps start with a JIRA describing the use case and see if anyone has additional feedback?

-Jason

On Mon, Feb 20, 2017 at 3:24 AM, Florian Hussonnois <fhussonn...@gmail.com> wrote:

> Hi Jason,
>
> Yes, this is the idea. The connector assigns a subset of files to each task.
>
> A task stores the size of the file, the byte offset, and the byte size of the
> last sent record as its source offsets.
> A file is finished when recordBytesOffsets + recordBytesSize = fileBytesSize.
>
> The connector should be able to start a background thread to track
> offsets for each assigned file.
> When all tasks have finished, the connector can stop tasks or assign new
> files by requesting a task reconfiguration.
>
> Another advantage of monitoring source offsets from the connector is detecting
> slow or failed tasks and, if necessary, being able to restart all tasks.
>
> Thanks,
>
> 2017-02-18 6:47 GMT+01:00 Jason Gustafson <ja...@confluent.io>:
>
> > Hey Florian,
> >
> > Can you explain a bit more how having access to the offset storage from the
> > connector helps in your use case? I guess you are planning to use offsets
> > to be able to tell when a task has finished a file?
> >
> > Thanks,
> > Jason
> >
> > On Fri, Feb 17, 2017 at 4:45 AM, Florian Hussonnois <fhussonn...@gmail.com>
> > wrote:
> >
> > > Hi Kafka Team,
> > >
> > > I'm developing a connector which needs to monitor the progress of its tasks
> > > in order to be able to request a task reconfiguration in some situations.
> > >
> > > Our connector is pretty simple. It's used to stream thousands of files
> > > into Kafka. The connector scans directories, then schedules each task with a
> > > set of assigned files.
> > > When tasks are no longer required or new files are detected, the connector
> > > requests a reconfiguration.
> > >
> > > In addition, files are stored in a shared storage which is accessible from
> > > each Connect worker. That way, we can distribute file streaming.
> > >
> > > For that purpose, it would be very convenient to have access to an
> > > offsetStorageReader instance from either the Connector class or the
> > > ConnectorContext class.
> > >
> > > I found a similar question:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg50579.html
> > >
> > > Do you think this improvement could be considered? I can contribute to it.
> > >
> > > Thanks,
> > >
> > > --
> > > Florian HUSSONNOIS
> >
>
> --
> Florian HUSSONNOIS
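
For illustration, the completion rule described in the thread (a file is finished when recordBytesOffsets + recordBytesSize = fileBytesSize) could be sketched roughly as below. Connect source offsets are plain `Map<String, ?>` values, so the sketch needs no Kafka dependency. The class `FileProgress` and its methods are hypothetical names introduced here, and the map keys are simply taken from the wording of the message; none of this is an existing Connect API or the actual connector's code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not part of Kafka Connect or of the connector discussed above. */
final class FileProgress {
    // Key names are assumptions borrowed from the wording of the thread.
    static final String FILE_BYTES_SIZE = "fileBytesSize";
    static final String RECORD_BYTES_OFFSET = "recordBytesOffsets";
    static final String RECORD_BYTES_SIZE = "recordBytesSize";

    private FileProgress() {}

    /** Builds the offset map a task could attach to each record it sends. */
    static Map<String, Object> sourceOffset(long fileBytesSize, long recordBytesOffset, long recordBytesSize) {
        Map<String, Object> offset = new HashMap<>();
        offset.put(FILE_BYTES_SIZE, fileBytesSize);
        offset.put(RECORD_BYTES_OFFSET, recordBytesOffset);
        offset.put(RECORD_BYTES_SIZE, recordBytesSize);
        return offset;
    }

    /** Returns true when the last sent record ends exactly at the end of the file. */
    static boolean isFinished(Map<String, ?> offset) {
        if (offset == null) {
            return false; // no offset stored yet, so the file has not been started
        }
        long fileSize = asLong(offset, FILE_BYTES_SIZE);
        long recordOffset = asLong(offset, RECORD_BYTES_OFFSET);
        long recordSize = asLong(offset, RECORD_BYTES_SIZE);
        return recordOffset + recordSize == fileSize;
    }

    // Accept any Number: a value read back from storage may not keep its original numeric type.
    private static long asLong(Map<String, ?> offset, String key) {
        Object value = offset.get(key);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException("missing or non-numeric offset field: " + key);
        }
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> midFile = sourceOffset(1000L, 400L, 100L);
        Map<String, Object> lastRecord = sourceOffset(1000L, 900L, 100L);
        System.out.println("mid-file finished: " + isFinished(midFile));
        System.out.println("last record finished: " + isFinished(lastRecord));
    }
}
```

With access to the stored offsets, a connector-side monitoring thread could apply a check like `isFinished` to each assigned file and request a task reconfiguration once every file reports true. How the connector would obtain those offsets is exactly the open question of this thread.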