This is an automated email from the ASF dual-hosted git repository.
pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 98aa5ff20f0 Add blog post for Google Summer of Code 25 on improving
Apache Beam's infrastructure (#36161)
98aa5ff20f0 is described below
commit 98aa5ff20f09448551d777b78dff94e86c3b15fa
Author: Enrique Calderon <[email protected]>
AuthorDate: Tue Sep 16 12:24:08 2025 -0600
Add blog post for Google Summer of Code 25 on improving Apache Beam's
infrastructure (#36161)
* Add blog post for Google Summer of Code 25 on improving Apache Beam's
infrastructure
* Add a special thank you to my mentor and link to my report
* Add ksobrenat32 as an author
---
website/www/site/content/en/blog/gsoc-25-infra.md | 78 +++++++++++++++++++++++
website/www/site/data/authors.yml | 4 ++
2 files changed, 82 insertions(+)
diff --git a/website/www/site/content/en/blog/gsoc-25-infra.md
b/website/www/site/content/en/blog/gsoc-25-infra.md
new file mode 100644
index 00000000000..3170062fae5
--- /dev/null
+++ b/website/www/site/content/en/blog/gsoc-25-infra.md
@@ -0,0 +1,78 @@
+---
+title: "Google Summer of Code 25 - Improving Apache Beam's Infrastructure"
+date: 2025-09-15 00:00:00 -0600
+categories:
+ - blog
+ - gsoc
+aliases:
+ - /blog/2025/09/15/gsoc-25-infra.html
+authors:
+ - ksobrenat32
+
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+I loved contributing to Apache Beam during Google Summer of Code 2025. I
worked on improving the infrastructure of Apache Beam, which included enhancing
the CI/CD pipelines, automating various tasks, and improving the overall
developer experience.
+
+## Motivation
+
+Since I was in high school, I have been fascinated by computers, but when I
discovered Open Source, I was amazed by the idea of people from all around the
world collaborating to build software that anyone can use, just for the love of
it. I started participating in open source communities, and I found it to be a
great way to learn and grow as a developer.
+
+When I heard about Google Summer of Code, I saw it as an opportunity to take
my open source contributions to the next level. The idea of working on a
real-world project while being mentored by experienced developers sounded like
an amazing opportunity. I heard about Apache Beam from another contributor and
ex-GSoC participant, and I was immediately drawn to the project, specifically
on the infrastructure side of things, as I have a strong interest in DevOps and
automation.
+
+## The Challenge
+
+When searching for a project, I was told that Apache Beam's infrastructure had
several areas that could be improved. I was excited because the ideas were
focused on improving the developer experience, and creating tools that could
benefit not only Beam's developers but also the wider open source community.
+
+There were four main challenges:
+
+1. Automating the cleanup of unused cloud resources to reduce costs and
improve resource management.
+2. Implementing a system for managing permissions through Git, allowing for
better tracking and auditing of changes.
+3. Creating a tool for rotating service account keys to enhance security.
+4. Developing a security monitoring system to detect and respond to potential
threats.
+
+## The Solution
+
+I worked closely with my mentor to break down and define each challenge into
manageable tasks, creating a plan for the summer. I started by taking a look at
the current state of the infrastructure, after which I began working on each
challenge one by one.
+
+1. **Automating the cleanup of unused cloud resources:** We noticed that some
resources in the GCP project, especially Pub/Sub topics created for testing,
were often forgotten, leading to unnecessary costs. Since the infrastructure is
primarily for testing and development, there's no need to keep unused
resources. I developed a Python script that identifies and removes stale
Pub/Sub topics that have existed for too long. This tool is now scheduled to
run periodically via a GitHub Actions [...]
+
+2. **Implementing a system for managing permissions through Git:** This was
more challenging, as it required a good understanding of both GCP IAM and the
existing workflow. After some investigation, I learned that the current process
was mostly manual and error-prone. The task involved creating a more automated
and reliable system. This was achieved by using Terraform to define the desired
state of IAM roles and permissions in code, which allows for better tracking
and auditing of change [...]
+
+3. **Creating a tool for rotating service account keys:** Key rotation is a
security practice that we don't always follow, but it is essential to ensure
that service account keys are not compromised. I noticed that GCP had some APIs
that could help with this, but the rotation process itself was not automated.
So I wrote a Python script that automates the rotation of GCP service account
keys, enhancing the security of service account credentials.
+
+4. **Developing a security monitoring system:** To keep track of incorrect
usage and potential threats, I built a log analysis tool that monitors GCP
audit logs for suspicious activity, collecting and parsing logs to identify
potential security threats, delivering email alerts when something unusual is
detected.
+
+As an extra, and after noticing that some of these tools and policies could be
ignored by developers, we also came up with the idea of an enforcement module
to ensure the usage of these new tools and policies. This module would be
integrated into the CI/CD pipeline, checking for compliance with the new
infrastructure policies and notifying developers of any violations.
+
+## The Impact
+
+The tools developed during this project will have an impact on the Apache Beam
community and the wider open source community. The automation of resource
cleanup will help reduce costs and improve resource management, while the
permission management system will provide better tracking and auditing of
changes. The service account key rotation tool will enhance security, and the
security monitoring system will help detect and respond to potential threats.
+
+## Wrap Up
+
+This project has been an incredible learning experience for me. I have gained
a better understanding of how GCP works, as well as how to use Terraform and
GitHub Actions. I have also learned a lot about security best practices and how
to implement them in a real-world project.
+
+I also learned a lot about working in an open source community, having direct
communication with such experienced developers, and the importance of
collaboration and communication in a distributed team. I am grateful for the
opportunity to work on such an important project and to contribute to the
Apache Beam community.
+
+Finally, a special thanks to my mentor, Pablo Estrada, for his guidance and
support throughout the summer. I am grateful not only for his amazing technical
skills but especially for his patience and encouragement on my journey
contributing to open source.
+
+You can find my final report
[here](https://gist.github.com/ksobrenat32/b028b8303393afbe73a8fc5e17daff90) if
you want to take a look at the details of my work.
+
+## Advice for Future Participants
+
+If you are considering participating in Google Summer of Code, my advice would
be to choose an area you are passionate about; this will make any coding
challenge easier to overcome. Also, don't be afraid to ask questions and seek
help from your mentors and the community. At the start, I made that mistake,
and I learned that asking for help is a sign of strength, not weakness.
+
+Finally, make sure to manage your time effectively and stay organized (keeping
a progress journal is a great idea). GSoC is a great opportunity to learn and
grow as a developer, but it can also be time-consuming, so it's important to
stay focused and on track.
diff --git a/website/www/site/data/authors.yml
b/website/www/site/data/authors.yml
index 543c70974b4..c5b0e37050c 100644
--- a/website/www/site/data/authors.yml
+++ b/website/www/site/data/authors.yml
@@ -103,6 +103,10 @@ klk:
name: Kenneth Knowles
email: [email protected]
twitter: KennKnowles
+ksobrenat32:
+ name: Enrique Calderon
+ email: [email protected]
+ twitter:
lkuligin:
name: Leonid Kuligin
email: [email protected]