What would be the name of the image that contains the functions runtime?

Best,
Dave

> On Mar 5, 2024, at 6:37 PM, Lari Hotari <lhot...@apache.org> wrote:
> 
> These are very welcome changes! Let's go ahead asap.
> 
> -Lari
> 
> On Wed, 6 Mar 2024 at 01:04, Matteo Merli <matteo.me...@gmail.com> wrote:
>> 
>> The docker image `pulsar-all` is a convenience image that is created on top
>> of the base `pulsar` image, including all the Pulsar IO connectors as well
>> as the tiered storage offloaders.
>> 
>> The Dockerfile for `pulsar-all` can be found here:
>> https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
>> 
>> The resulting image is very big:
>> 
>> ```
>> apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
>> ```
>> 
>> This poses a challenge in many ways:
>> 1. Our CI pipeline needs to build these images and cache them across
>> different stages of the pipeline.
>> 2. It takes a lot of time for release managers to build and push these
>> images to Docker Hub.
>> 3. Users running this image in production see very long download times,
>> which can affect the availability of the system (e.g., a second broker is
>> more likely to crash if a restart takes a very long time).
>> 4. It's very unlikely that any one user will require all the connectors;
>> most likely, they would use just 2-3 of them.
>> 
>> The problem is that `pulsar-all` was introduced at a time when there were
>> ~3 Pulsar IO connectors. Right now we have 35 connectors, with a total
>> size of 1.9 GB.
>> 
>> The proposal here is to drop this image altogether. Users will be able to
>> construct their own targeted images in a very simple way:
>> 
>> ```
>> FROM apachepulsar/pulsar:latest
>> RUN mkdir -p connectors && \
>>    cd connectors && \
>>    wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
>> ```
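>> 
>> When more than one connector is needed, the same approach extends to a
>> small loop. This is only a minimal sketch, assuming the download URL
>> pattern shown above holds for every connector in the release. The
>> `PULSAR_VERSION` and `CONNECTORS` build arguments and the `kafka`
>> connector name are illustrative choices, not part of the proposal:
>> 
>> ```
>> FROM apachepulsar/pulsar:3.2.0
>> # Hypothetical build arguments: set them to the release and the
>> # connectors you actually need.
>> ARG PULSAR_VERSION=3.2.0
>> ARG CONNECTORS="elastic-search kafka"
>> # Download each selected connector; stop the build if any download fails.
>> RUN mkdir -p connectors && \
>>    cd connectors && \
>>    for c in $CONNECTORS; do \
>>      wget "https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-${c}-${PULSAR_VERSION}.nar" || exit 1; \
>>    done
>> ```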
>> 
>> 
>> 
>> ### Pulsar Functions Python Runtime
>> 
>> In order to support the Python functions runtime, we have been shipping
>> the Pulsar base image with quite a few dependencies, from the
>> `pulsar-client` Python SDK to gRPC, which is quite a heavy package with
>> many transitive dependencies.
>> 
>> Given that the vast majority of users would be using the `pulsar` base
>> image to run brokers and not Python functions, it would make sense to
>> split the Python support into a different image, like
>> `pulsar-functions-python`, which extends the base image and adds all the
>> needed Python dependencies.
>> 
>> This way it will be very easy for users to select the appropriate image,
>> and we wouldn't be shipping a large number of unneeded Python dependencies
>> to users who don't use them.
>> 
>> 
>> What are people's opinions with respect to this?
>> 
>> Matteo
>> 
>> --
>> Matteo Merli
>> <matteo.me...@gmail.com>
