What would be the name of the image that contains the functions runtime?

Best,
Dave

> On Mar 5, 2024, at 6:37 PM, Lari Hotari <lhot...@apache.org> wrote:
>
> These are very welcome changes! Let's go ahead asap.
>
> -Lari
>
> On Wed, 6 Mar 2024 at 01:04, Matteo Merli <matteo.me...@gmail.com> wrote:
>>
>> The docker image `pulsar-all` is a convenience image that is created on top
>> of the base `pulsar` image, including all the Pulsar IO connectors as well
>> as the tiered storage offloaders.
>>
>> The Dockerfile for `pulsar-all` can be found here:
>> https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
>>
>> The resulting image is very big:
>>
>> ```
>> apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
>> ```
>>
>> This poses a challenge in many ways:
>> 1. Our CI pipeline needs to build these images and cache them across
>> different stages of the pipeline
>> 2. It takes a lot of time for release managers to build and push these
>> images to Docker Hub
>> 3. Users using this image in production see very long download times,
>> something that can affect the availability of the system (eg: more chances
>> of a 2nd broker to crash if a restart takes a very long time).
>> 4. It's very unlikely that one user will require all the connectors, most
>> likely, it would use just 2-3 of them.
>>
>> The problem is that `pulsar-all` was introduced at a time when there were
>> ~3 Pulsar IO connectors. Right now we do have 35 connectors, with a 1.9 GB
>> total size.
>>
>> The proposal here is to drop this image altogether. Users will be able to
>> construct their own targeted images in a very simple way:
>>
>> ```
>> FROM apachepulsar/pulsar:latest
>> RUN mkdir -p connectors && \
>>     cd connectors && \
>>     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
>> ```
>>
>>
>>
>> ### Pulsar Functions Python Runtime
>>
>> In order to support Python functions runtime, we have been including the
>> Pulsar base image with quite a bit of dependencies, from `pulsar-client`
>> Python SDK, to gRPC which is quite a heavy package with many transitive
>> dependencies.
>>
>> Given that the vast majority would be using the `pulsar` base image to run
>> brokers and not python functions, it would make sense to split the Python
>> support into a different image, like `pulsar-functions-python`, which
>> extends from the base image and adds all the needed Python dependencies.
>>
>> This way it will be very easy for users to select the appropriate image and
>> we wouldn't be carrying a big amount of useless Python dependencies to
>> users who don't need them.
>>
>>
>> What are people's opinions with respect to this?
>>
>> Matteo
>>
>> --
>> Matteo Merli
>> <matteo.me...@gmail.com>