Hi Florian,

Just curious: what 'shared storage' do you guys use to keep the files before they are ingested into Kafka?
In our case, we could not find a good distributed, shared file system that is NOT HDFS-like and that runs upstream of Kafka. So we use individual hard disks on the connector machines and keep offsets, etc. locally on those disks. That is, before Kafka Connect everything is disconnected; from Kafka Connect onward, everything is on a distributed system.

Thanks
Tianji

On Mon, Feb 20, 2017 at 6:24 AM, Florian Hussonnois <fhussonn...@gmail.com> wrote:

> Hi Jason,
>
> Yes, this is the idea. The connector assigns a subset of files to each
> task.
>
> A task stores the file size, the byte offset and the byte size of the
> last sent record as source offsets.
> A file is finished when recordBytesOffsets + recordBytesSize =
> fileBytesSize.
>
> The connector should be able to start a background thread to track
> offsets for each assigned file.
> When all tasks have finished, the connector can stop the tasks or assign new
> files by requesting a task reconfiguration.
>
> Another advantage of monitoring source offsets from the connector is the
> ability to detect slow or failed tasks and, if necessary, restart all tasks.
>
> Thanks,
>
> 2017-02-18 6:47 GMT+01:00 Jason Gustafson <ja...@confluent.io>:
>
> > Hey Florian,
> >
> > Can you explain a bit more how having access to the offset storage from the
> > connector helps in your use case? I guess you are planning to use offsets
> > to be able to tell when a task has finished a file?
> >
> > Thanks,
> > Jason
> >
> > On Fri, Feb 17, 2017 at 4:45 AM, Florian Hussonnois <fhussonn...@gmail.com>
> > wrote:
> >
> > > Hi Kafka Team,
> > >
> > > I'm developing a connector which needs to monitor the progress of its tasks
> > > in order to be able to request a task reconfiguration in some situations.
> > >
> > > Our connector is pretty simple. It's used to stream thousands of files
> > > into Kafka. The connector scans directories, then schedules each task with a
> > > set of assigned files.
> > > When tasks are no longer required or new files are detected, the connector
> > > requests a reconfiguration.
> > >
> > > In addition, files are stored in a shared storage which is accessible from
> > > each Connect worker. In that way, we can distribute file streaming.
> > >
> > > For that purpose, it would be very convenient to have access to an
> > > offsetStorageReader instance from either the Connector class or the
> > > ConnectorContext class.
> > >
> > > I found a similar question:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg50579.html
> > >
> > > Do you think this improvement could be considered? I can contribute to it.
> > >
> > > Thanks,
> > >
> > > --
> > > Florian HUSSONNOIS
>
> --
> Florian HUSSONNOIS
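Florian's Feb 20 description of the per-file source offsets and the completion rule can be sketched as below. This is a minimal, hypothetical illustration in plain Java, not code from the thread or from Kafka. The partition key `file`, the offset keys `fileBytesSize`, `recordBytesOffset` and `recordBytesSize`, and the `OffsetLookup` interface are all assumptions. `OffsetLookup` stands in for the `OffsetStorageReader` the connector would need access to. An in-memory map plays the role of the offset storage so that the example runs without a Kafka dependency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FileOffsetTracker {

    // Hypothetical stand-in for the offset lookup the connector needs
    // (in Kafka Connect this role would be played by an OffsetStorageReader).
    public interface OffsetLookup {
        Map<String, Object> offset(Map<String, Object> partition);
    }

    // Source partition identifying one file; the key name "file" is an assumption.
    public static Map<String, Object> partitionFor(String path) {
        Map<String, Object> partition = new HashMap<>();
        partition.put("file", path);
        return partition;
    }

    // Source offset a task would attach to each record it sends (assumed key names).
    public static Map<String, Object> offsetFor(long fileBytesSize, long recordBytesOffset, long recordBytesSize) {
        Map<String, Object> offset = new HashMap<>();
        offset.put("fileBytesSize", fileBytesSize);
        offset.put("recordBytesOffset", recordBytesOffset);
        offset.put("recordBytesSize", recordBytesSize);
        return offset;
    }

    // A file is finished when the last sent record ends exactly at the end of the file.
    public static boolean isFinished(Map<String, Object> offset) {
        if (offset == null) {
            return false; // no offset committed yet: the file has not been started
        }
        long fileBytesSize = ((Number) offset.get("fileBytesSize")).longValue();
        long recordBytesOffset = ((Number) offset.get("recordBytesOffset")).longValue();
        long recordBytesSize = ((Number) offset.get("recordBytesSize")).longValue();
        return recordBytesOffset + recordBytesSize == fileBytesSize;
    }

    // Returns the assigned files that have not been fully streamed yet.
    public static List<String> unfinishedFiles(List<String> assignedFiles, OffsetLookup lookup) {
        List<String> pending = new ArrayList<>();
        for (String path : assignedFiles) {
            if (!isFinished(lookup.offset(partitionFor(path)))) {
                pending.add(path);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        // In-memory offsets standing in for the connector's offset storage.
        Map<Map<String, Object>, Map<String, Object>> store = new HashMap<>();
        store.put(partitionFor("/data/a.csv"), offsetFor(100, 90, 10)); // 90 + 10 == 100: done
        store.put(partitionFor("/data/b.csv"), offsetFor(100, 40, 10)); // 40 + 10 < 100: in progress
        OffsetLookup lookup = store::get;

        List<String> assigned = List.of("/data/a.csv", "/data/b.csv", "/data/c.csv");
        List<String> pending = unfinishedFiles(assigned, lookup);
        System.out.println(pending); // [/data/b.csv, /data/c.csv]
        if (pending.isEmpty()) {
            // A real connector would call context.requestTaskReconfiguration() here.
            System.out.println("all assigned files finished");
        }
    }
}
```

The check is the rule quoted above: the last record's byte offset plus its byte size equals the file size. A file with no committed offset is treated as unfinished. In a real connector, this loop would run on the background thread Florian mentions. An empty result would be the natural point to request a task reconfiguration and hand out new files.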