Re: Kafka Connect sink

Bryan Keller Tue, 03 Oct 2023 08:30:40 -0700

Thanks for the feedback. Many people use Kafka Connect today, for both loading 
data into Kafka and writing data from Kafka, so having an Iceberg sink allows 
someone to make use of their existing infrastructure to write to Iceberg. There 
are sink connectors for Delta/Databricks, Snowflake, Hudi, etc., so it brings 
Iceberg up to par in that regard and allows someone to more easily switch to 
using Iceberg as well.


I'll definitely be interested in reading any proposals on data ingestion 
solutions.

-Bryan

> On Oct 2, 2023, at 11:03 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> 
> From my standpoint, Kafka Connect is also interesting as a way to address
> processing logic without a Spark or Flink runtime. It is definitely
> interesting to have Kafka integration/processing (even though, for me, Kafka
> and Kafka Connect are two different things ;)).
> 
> For the pure data ingestion part, I think it would make sense to have an
> "ingestion layer" in Iceberg where we can have pluggable IO and where
> we can both implement our own IOs (specifically for Iceberg, as with Apache
> Beam IOs for instance) and leverage existing integration
> frameworks (like Apache Camel).
> Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
> integration? I think having such a layer would be very interesting for
> the community and could bring us more users. That is what happened at Apache
> Beam: the first IOs were only Google-"centric" (bigtable, bigquery,
> gfs, ...), then we added new IOs (JMS, Kafka, JDBC, ...) and we saw a great
> benefit for adoption :).
> DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)
> 
> I will investigate this and draft a proposal.
> 
> Regards
> JB
> 
> On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat <[email protected]> wrote:
>> 
>> Hi Bryan,
>> 
>> I am very happy to see this contribution.
>> I have recently tested this project with Nessie catalog and very much liked 
>> it.
>> 
>> However, I still don't know the benefits of using Kafka Connect instead of 
>> consuming directly from Kafka, as Delta Lake's implementation does.
>> https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
>> 
>> I am not an expert in the ingestion domain and only recently got started.
>> I hope someone will chime in and we will have a detailed analysis of the 
>> design.
>> 
>> Looking forward to this feature.
>> 
>> Thanks,
>> Ajantha
>> 
>> On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré <[email protected]> 
>> wrote:
>>> 
>>> Hi Bryan
>>> 
>>> That’s great news! Thanks a lot for the proposal.
>>> 
>>> I will take a look at the PR and the existing connector.
>>> I’m sure the Iceberg community will be very happy to see this, and we will 
>>> be able to add new features and improvements thanks to community feedback.
>>> I would be more than happy to help with the donation (I know that the 
>>> connector is already under the Apache license, but we have to double-check 
>>> the ICLA for the initial contributors, etc., just to be sure we are good there).
>>> 
>>> Thanks again!
>>> 
>>> Let’s see what the others think.
>>> 
>>> Regards
>>> JB
>>> 
>>> On Mon, Oct 2, 2023 at 19:39, Bryan Keller <[email protected]> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> We at Tabular would like to contribute our Kafka Connect Iceberg sink to 
>>>> the Iceberg project. It would be great to give Iceberg users another 
>>>> option for landing data from Kafka into Iceberg tables that is supported 
>>>> by the Iceberg community. Kafka Connect is a part of systems from AWS, 
>>>> Confluent, Redpanda, and so on, so it can make landing data from Kafka 
>>>> into Iceberg much easier for those without a Flink or Spark infrastructure.
>>>> 
>>>> There are a few Iceberg sink implementations out there for Kafka Connect, 
>>>> but we feel this one covers most of the features users have requested, 
>>>> such as exactly-once processing, schema evolution, and multi-table fanout. 
>>>> And having the sink backed by the Iceberg community will help it to evolve 
>>>> and improve over time.
>>>> 
>>>> If this sounds like something everyone would like to see added to Iceberg, 
>>>> I've opened a PR that includes some initial pieces of the sink. The 
>>>> thought was to break up the submission into parts so each could be 
>>>> reviewed more easily. Some design docs and notes can be found in the 
>>>> original repo here: 
>>>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
>>>> 
>>>> We'd like feedback on whether others approve of moving forward with 
>>>> this.
>>>> 
>>>> Thanks,
>>>> Bryan
>>>>
