Hi Evans!

On Tue, Nov 2, 2021 at 5:35 PM Evans Ye <[email protected]> wrote:
>
> Hi folks,
>
> With Bigtop 3.0 been released, I think it's time to discuss what's new as
> our next steps. Of course the open source ver. of unified compatible Hadoop
> Distro. is still our core product going forward. But the surrounding value
> added features might be something that can take us further beyond where we
> were at. Now, let me post some ideas to start the brainstorming.
>
> 1. Deployment on K8S: Ambari or Bigtop Puppet as K8S operators.

I am wondering how complex it is to write a Kubernetes Operator (which I
assume would be a Go-based application that talks to the Kubernetes API)
compared with writing Helm charts (or similar). We use the latter
extensively at Wikimedia (but not for any Hadoop-related configs) and it
works really well. Tools like Helmfile (https://github.com/roboll/helmfile)
are also very nice for bootstrapping and managing different
environments/clusters/configurations. The Helm+Helmfile combination seems
closer to what Bigtop currently does with Puppet, so it may be an
alternative (before writing an Operator) for figuring out how to handle
configs. For example, how is the Operator going to apply/create/etc.
configurations? I worked with Istio recently (https://istio.io/), and they
offer tools that basically wrap Helm configurations (via a client-side
binary or a K8s Operator) under the hood. I've never written a K8s
Operator, so my understanding could be completely wrong!

> 2. MLOps integrations: MLFlow, Submarine.

At Wikimedia we are using KServe/Kubeflow; it may be a good addition to
the list. We are using OpenStack's Swift as object storage for models
since it offers an S3 API; Apache Ozone could be a very nice alternative
(I saw some traction in Jira, I'll try to help/review if needed!).

> 3. Data Lake integrations: Hudi, Iceberg, Delta.

+1, our plan is to experiment with Apache Iceberg very soon :)

> And for some software engineering stuffs, I think we can do a clean up on
> out-dated features such as:
> 1. vagrant provisioner
> 2. docker sandbox
> 3. bigtop-ci
> 4. bigtop-data-generators
> 5. bigtop-bigpetstore

Something else that would be nice:

1) Upgrade the Puppet version where needed (I know that Bigtop needs to
   keep compatibility with OS distros that offer older versions of
   Puppet, etc.)
2) Migrate init.d scripts to systemd units where possible (for example,
   on distros like Debian where systemd is fully supported).

I understand that the above tasks are very complex and require a lot of
work :) They may not be super important given the Kubernetes work to
focus on, but I thought it was good to mention them!

Thanks a lot for all the work!

Luca
