Just a little comment that I imagine there would be massive value from an ingestion layer.
Making it easier to add more integrations would be a great benefit for the ecosystem and for adoption.

Concretely, FWIW, I'm evaluating Iceberg [and alternatives] for an enterprise adoption, and existing integrations [both for reading from and ingesting into Iceberg] and the ease of contributing missing integrations are TOP of mind.

On Mon, Oct 2, 2023 at 11:03 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> From my standpoint, Kafka Connect is interesting to also address
> processing logic without Spark or Flink runtime. Definitely
> interesting to have Kafka integration/processing (even for me Kafka
> and Kafka Connect are two different things ;)).
>
> For the pure data ingestion part, I think it would make sense to have an
> "ingestion layer" in Iceberg where we can have pluggable IO and where
> we can both implement our own IO (specifically for Iceberg as Apache
> Beam IOs for instance) and where we can leverage existing integration
> frameworks (like Apache Camel).
> Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
> integration? I think having such a layer would be very interesting for
> the community and we can have more users (it's what happened at Apache
> Beam, the first IOs were only Google "centric" (bigtable, bigquery,
> gfs, ...), we added new IOs (JMS, Kafka, JDBC, ...) and we saw a great
> benefit for adoption :)).
> DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)
>
> I will do some investigation about that. I will draft a proposal.
>
> Regards
> JB
>
> On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
> >
> > Hi Bryan,
> >
> > I am very happy to see this contribution.
> > I have recently tested this project with Nessie catalog and very much liked it.
> >
> > However, I still don't know the benefits of using kafka-connect instead of directly consuming
> > from Kafka like Delta-lake's implementation.
> > https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
> >
> > I am not an expert in this ingestion domain and recently got started.
> > I hope someone will chime in and we will have a detailed analysis of the design.
> >
> > Looking forward to this feature.
> >
> > Thanks,
> > Ajantha
> >
> > On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>
> >> Hi Bryan
> >>
> >> That’s great news! Thanks a lot for the proposal.
> >>
> >> I will take a look at the PR and existing connector.
> >> I’m sure the Iceberg community will be very happy to see this and we will be able to add new features and improvements thanks to the community feedback.
> >> I would be more than happy to help with the donation (I know that the connector is already under Apache license but we have to double check the ICLA for the initial contributors etc., just to be sure we are good there).
> >>
> >> Thanks again!
> >>
> >> Let’s see what the others are thinking.
> >>
> >> Regards
> >> JB
> >>
> >> On Mon, Oct 2, 2023 at 19:39, Bryan Keller <brya...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We at Tabular would like to contribute our Kafka Connect Iceberg sink to the Iceberg project. It would be great to give Iceberg users another option for landing data from Kafka into Iceberg tables that is supported by the Iceberg community. Kafka Connect is a part of systems from AWS, Confluent, Redpanda, and so on, so it can make landing data from Kafka into Iceberg much easier for those without a Flink or Spark infrastructure.
> >>>
> >>> There are a few Iceberg sink implementations out there for Kafka Connect, but we feel this one covers most of the features users have requested, such as exactly-once processing, schema evolution, and multi-table fanout. And having the sink backed by the Iceberg community will help it to evolve and improve over time.
> >>>
> >>> If this sounds like something everyone would like to see added to Iceberg, I've opened a PR that includes some initial pieces of the sink. The thought was to break up the submission into parts so each could be reviewed more easily. Some design docs and notes can be found in the original repo here:
> >>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
> >>>
> >>> We'd like to get feedback if others approve of moving forward with this or not.
> >>>
> >>> Thanks,
> >>> Bryan
> >>>