Hi Florian,

Just curious, what 'shared storage' do you use to keep the files before
they are ingested into Kafka?

In our case, we could not find a nice distributed, shared file system
that is NOT HDFS-like to run in front of Kafka. So we use individual
hard disks on the connector machines and keep offsets etc. local on those
disks. That is, before Kafka Connect everything is disconnected, and from
Kafka Connect onwards everything is on a distributed system.

Thanks
Tianji


On Mon, Feb 20, 2017 at 6:24 AM, Florian Hussonnois <fhussonn...@gmail.com>
wrote:

> Hi Jason,
>
> Yes, this is the idea. The connector assigns a subset of files to each
> task.
>
> A task stores the file size, the byte offset and the byte size of the
> last sent record as its source offsets.
> A file is finished when recordBytesOffset + recordBytesSize =
> fileBytesSize.
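
For concreteness, here is a minimal sketch of how a task could encode those
three values as source offsets on every record and derive "finished" from
them. Class, field and topic names are illustrative, not taken from the
actual connector:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;

// Sketch only: attach file progress to every record as source offsets and
// derive "file is finished" from them.
public class FileProgress {

    public static Map<String, Object> sourceOffset(long recordBytesOffset,
                                                   long recordBytesSize,
                                                   long fileBytesSize) {
        Map<String, Object> offset = new HashMap<>();
        offset.put("recordBytesOffset", recordBytesOffset); // start of the last sent record
        offset.put("recordBytesSize", recordBytesSize);     // length of the last sent record
        offset.put("fileBytesSize", fileBytesSize);         // total size of the file
        return offset;
    }

    public static SourceRecord recordFor(String path, String line, long recordBytesOffset,
                                         long recordBytesSize, long fileBytesSize) {
        Map<String, Object> partition = new HashMap<>();
        partition.put("path", path); // one source partition per file
        return new SourceRecord(partition,
                sourceOffset(recordBytesOffset, recordBytesSize, fileBytesSize),
                "files-topic", Schema.STRING_SCHEMA, line);
    }

    // A file is finished when the last sent record ends exactly at the file size.
    public static boolean isFinished(Map<String, Object> offset) {
        long recordBytesOffset = (Long) offset.get("recordBytesOffset");
        long recordBytesSize = (Long) offset.get("recordBytesSize");
        long fileBytesSize = (Long) offset.get("fileBytesSize");
        return recordBytesOffset + recordBytesSize == fileBytesSize;
    }
}
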
>
> The connector should be able to start a background thread to track the
> offsets for each assigned file.
> When all tasks have finished, the connector can stop tasks or assign new
> files by requesting a task reconfiguration.
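
As a rough sketch of that monitoring loop (the offset lookup is left abstract
here, because it needs exactly the offsetStorageReader access from the
connector that is being discussed in this thread):

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.connect.source.SourceConnector;

// Sketch only: a background monitor running inside the connector.
public abstract class MonitoringFileConnector extends SourceConnector {

    private ScheduledExecutorService monitor;

    @Override
    public void start(Map<String, String> props) {
        monitor = Executors.newSingleThreadScheduledExecutor();
        monitor.scheduleAtFixedRate(() -> {
            if (allAssignedFilesFinished()) {
                // Ask the framework to call taskConfigs() again so idle tasks
                // can be stopped or new files assigned.
                context.requestTaskReconfiguration();
            }
        }, 10, 10, TimeUnit.SECONDS);
    }

    @Override
    public void stop() {
        if (monitor != null) {
            monitor.shutdownNow();
        }
    }

    // Hypothetical: would read the committed source offsets of every assigned
    // file, which is not possible from the connector today.
    protected abstract boolean allAssignedFilesFinished();
}
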
>
> Another advantage of monitoring source offsets from the connector is to
> detect slow or failed tasks and, if necessary, to be able to restart all
> tasks.
>
> Thanks,
>
> 2017-02-18 6:47 GMT+01:00 Jason Gustafson <ja...@confluent.io>:
>
> > Hey Florian,
> >
> > Can you explain a bit more how having access to the offset storage from the
> > connector helps in your use case? I guess you are planning to use offsets
> > to be able to tell when a task has finished a file?
> >
> > Thanks,
> > Jason
> >
> > On Fri, Feb 17, 2017 at 4:45 AM, Florian Hussonnois <fhussonn...@gmail.com>
> > wrote:
> >
> > > Hi Kafka Team,
> > >
> > > I'm developing a connector which needs to monitor the progress of its
> > > tasks in order to be able to request a task reconfiguration in some
> > > situations.
> > >
> > > Our connector is pretty simple. It's used to stream thousands of files
> > > into Kafka. The connector scans directories and then schedules each task
> > > with a set of assigned files.
> > > When tasks are no longer required or new files are detected, the
> > > connector requests a reconfiguration.
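
A minimal sketch of how the scanned files could be spread across task
configurations, assuming the standard ConnectorUtils helper; the "files" key
and its comma-separated encoding are invented for this example:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.util.ConnectorUtils;

// Sketch only: split the scanned file paths into at most maxTasks groups,
// one task configuration per group.
public class FileAssignment {

    public static List<Map<String, String>> taskConfigs(List<String> files, int maxTasks) {
        List<Map<String, String>> configs = new ArrayList<>();
        if (files.isEmpty()) {
            return configs; // nothing to assign yet
        }
        int groups = Math.min(files.size(), maxTasks);
        for (List<String> group : ConnectorUtils.groupPartitions(files, groups)) {
            Map<String, String> config = new HashMap<>();
            config.put("files", String.join(",", group));
            configs.add(config);
        }
        return configs;
    }
}
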
> > >
> > > In addition, files are stored in a shared storage that is accessible from
> > > each Connect worker. In that way, we can distribute the file streaming.
> > >
> > > For that purpose, it would be very convenient to have access to an
> > > offsetStorageReader instance from either the Connector class or the
> > > ConnectorContext class.
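
For comparison, this is what is already possible from a task today through
SourceTaskContext; the request above is essentially to expose the same
OffsetStorageReader on the connector side. The partition and offset key names
are illustrative:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.connect.source.SourceTask;
import org.apache.kafka.connect.storage.OffsetStorageReader;

// Sketch only: a task reading back its committed offsets so it can resume a
// file. There is no equivalent hook on ConnectorContext today.
public abstract class ResumableFileTask extends SourceTask {

    @Override
    public void start(Map<String, String> props) {
        OffsetStorageReader reader = context.offsetStorageReader();

        Map<String, Object> partition = new HashMap<>();
        partition.put("path", props.get("files")); // hypothetical config key

        // Null if the file has never been read; otherwise resume right after
        // the last committed record.
        Map<String, Object> lastOffset = reader.offset(partition);
        if (lastOffset != null) {
            long resumeFrom = (Long) lastOffset.get("recordBytesOffset")
                    + (Long) lastOffset.get("recordBytesSize");
            // A real task would seek its file channel to resumeFrom here.
        }
    }
}
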
> > >
> > > I found a similar question:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg50579.html
> > >
> > > Do you think this improvement could be considered? I can contribute to
> > > it.
> > >
> > > Thanks,
> > >
> > > --
> > > Florian HUSSONNOIS
> > >
> >
>
>
>
> --
> Florian HUSSONNOIS
>
