Hi Evans!

On Tue, Nov 2, 2021 at 5:35 PM Evans Ye <[email protected]> wrote:
>
> Hi folks,
>
> With Bigtop 3.0 been released, I think it's time to discuss what's new as
> our next steps. Of course the open source ver. of unified compatible Hadoop
> Distro. is still our core product going forward. But the surrounding value
> added features might be something that can take us further beyond where we
> were at. Now, let me post some ideas to start the brainstorming.
>
> 1. Deployment on K8S: Ambari or Bigtop Puppet as K8S operators.

I am wondering how complex it is to write a Kubernetes Operator (which
I assume would be a Go-based application that talks to the
Kubernetes API) vs. writing Helm charts (or similar). We use the latter
extensively at Wikimedia (but not for any Hadoop-related configs) and
it works really well.
Tools like Helmfile (https://github.com/roboll/helmfile) are also very
nice for bootstrapping and managing different
environments/clusters/configurations. The Helm+Helmfile combination
seems closer to what Bigtop currently does with Puppet, so it may
be an alternative (before writing an Operator) for figuring out how to
handle configs.
For example, how is the Operator going to create/apply/etc.
configurations? I worked with Istio recently (https://istio.io/), and
they offer tools (a client-side binary or a K8s Operator) that
basically wrap Helm configurations under the hood. I've never written a
K8s operator, so my understanding could be completely wrong!

> 2. MLOps integrations: MLFlow, Submarine.

At Wikimedia we are using KServe/Kubeflow; it may be a good addition
to the list. We are using OpenStack's Swift as object storage for
models since it offers an S3 API. Apache Ozone could be a very
nice alternative (I saw some traction in Jira, I'll try to
help/review if needed!).

> 3. Data Lake integrations: Hudi, Iceberg, Delta.
+1, our plan is to experiment with Apache Iceberg very soon :)

> And for some software engineering stuffs, I think we can do a clean up on
> out-dated features such as:
> 1. vagrant provisioner
> 2. docker sandbox
> 3. bigtop-ci
> 4. bigtop-data-generators
> 5. bigtop-bigpetstore

Something else that would be nice:
1) Upgrade the Puppet version where needed (I know that Bigtop needs
to keep compatibility with OS distros that ship older versions of
Puppet, etc.)
2) Migrate init.d scripts to systemd units where possible (for
example, on distros like Debian where systemd is fully supported).

I understand that the above tasks are very complex and require a
lot of work :) They may not be super important given the Kubernetes
work above to focus on, but I thought it was worth mentioning
them!

Thanks a lot for all the work!

Luca
