Copilot commented on code in PR #9542:
URL: https://github.com/apache/gravitino/pull/9542#discussion_r2652296456


##########
maintenance/jobs/src/main/java/org/apache/gravitino/maintenance/jobs/spark/SparkPiJob.java:
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.maintenance.jobs.spark;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import org.apache.gravitino.job.JobTemplateProvider;
+import org.apache.gravitino.job.SparkJobTemplate;
+import org.apache.gravitino.maintenance.jobs.BuiltInJob;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+
+/**
+ * Self-contained Spark Pi program for the built-in SparkPi job template.
+ *
+ * <p>This avoids depending on the Spark examples jar so the built-in template can run with only the
+ * Gravitino-provided jobs artifact on the classpath.
+ */
+public class SparkPiJob implements BuiltInJob {
+
+  private static final String NAME = JobTemplateProvider.BUILTIN_NAME_PREFIX + "sparkpi";
+  // Bump VERSION whenever SparkPi template behavior changes (name/executable/class/args/configs).
+  private static final String VERSION = "v1";
+
+  @Override
+  public SparkJobTemplate jobTemplate() {
+    return SparkJobTemplate.builder()
+        .withName(NAME)
+        .withComment("Built-in SparkPi job template")
+        .withExecutable(resolveExecutable(SparkPiJob.class))
+        .withClassName(SparkPiJob.class.getName())
+        .withArguments(Collections.singletonList("{{slices}}"))
+        .withConfigs(buildSparkConfigs())
+        .withCustomFields(
+            Collections.singletonMap(JobTemplateProvider.PROPERTY_VERSION_KEY, VERSION))
+        .build();
+  }
+
+  public static void main(String[] args) {
+    int slices = 2;
+    if (args.length > 0) {
+      try {
+        slices = Integer.parseInt(args[0]);
+      } catch (NumberFormatException e) {
+        System.err.println("Invalid number of slices provided. Using default value of 2.");

Review Comment:
   The SparkPiJob main method uses System.err.println and System.out.printf for output. According to the Apache Gravitino coding guidelines, System.out.println() and System.err.println() should not be used; proper logging should be used instead. However, since this is a Spark job's main method that will be executed by Spark and slf4j is only a compileOnly dependency, console output may be acceptable here. Consider adding a comment explaining why console output is used instead of logging for this specific case, or ensure that slf4j-api is available at runtime for this job.
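   Sketched in Python for brevity, the parse-with-fallback behavior this hunk implements (bad or missing argument reports a warning and uses the default rather than failing the job) looks like this; the function name `parse_slices` is illustrative, not part of the PR:

   ```python
   def parse_slices(args, default=2):
       """Parse the first CLI argument as the slice count, falling back to a default.

       Mirrors the try/catch around Integer.parseInt in SparkPiJob.main: an
       unparseable argument is reported and the default is used instead.
       """
       if not args:
           return default
       try:
           return int(args[0])
       except ValueError:
           # This is the point where the Java job emits its console warning.
           print(f"Invalid number of slices provided. Using default value of {default}.")
           return default


   print(parse_slices(["5"]))    # a valid argument is used as-is
   print(parse_slices(["abc"]))  # falls back to the default
   print(parse_slices([]))       # no argument also falls back
   ```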



##########
clients/client-python/tests/integration/test_supports_jobs.py:
##########
@@ -163,10 +160,6 @@ def test_register_and_delete_job_template(self):
         with self.assertRaises(NoSuchJobTemplateException):
             self._metalake.get_job_template(template2.name)
 

Review Comment:
   The assertion checking that the template list is empty after deleting both templates was removed. This test no longer verifies the final cleanup state. Consider keeping this assertion but filtering to count only user templates (excluding built-in templates), or add a comment explaining why this verification was removed.
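   A minimal sketch of the suggested filtering, assuming built-in template names share a common prefix (the `"builtin-"` value below is hypothetical; the server derives names from `JobTemplateProvider.BUILTIN_NAME_PREFIX`, whose actual value may differ):

   ```python
   # Hypothetical prefix; the real value comes from
   # JobTemplateProvider.BUILTIN_NAME_PREFIX on the server side.
   BUILTIN_PREFIX = "builtin-"

   def user_templates(template_names):
       """Filter out built-in templates so the test can assert on user templates only."""
       return [n for n in template_names if not n.startswith(BUILTIN_PREFIX)]


   # After deleting both user templates, only built-ins remain,
   # so the filtered list should be empty.
   listed = ["builtin-sparkpi"]
   print(user_templates(listed))
   ```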



##########
clients/client-python/tests/integration/test_supports_jobs.py:
##########
@@ -127,7 +126,6 @@ def test_register_and_delete_job_template(self):
 
         # List registered job templates
         registered_templates = self._metalake.list_job_templates()
-        self.assertEqual(len(registered_templates), 2)
         self.assertIn(template1, registered_templates)
         self.assertIn(template2, registered_templates)

Review Comment:
   The test assertions were changed from checking exact counts to just checking containment. This makes the tests less precise: they no longer verify that exactly 2 templates exist, only that the expected templates are present. Consider verifying that only the expected user templates exist (filtering out built-in templates if necessary), or add a comment explaining that built-in templates may also be present.



##########
clients/client-python/tests/integration/test_supports_jobs.py:
##########
@@ -95,7 +95,6 @@ def test_register_and_list_job_templates(self):
 
         # List registered job templates
         registered_templates = self._metalake.list_job_templates()
-        self.assertEqual(len(registered_templates), 2)
         self.assertIn(template_1, registered_templates)
         self.assertIn(template_2, registered_templates)

Review Comment:
   The test assertions were changed from checking exact counts to just checking containment. While this accommodates the presence of built-in templates, the tests no longer verify that only the expected templates exist. Consider adding assertions that verify the expected templates are present AND that no unexpected user templates exist, or at minimum add a comment explaining that built-in templates may also be present in the list.
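   One way to express "expected present AND no unexpected user templates" is a set comparison after filtering out built-ins; the `"builtin-"` prefix and the helper name below are illustrative assumptions, not APIs from this PR:

   ```python
   BUILTIN_PREFIX = "builtin-"  # hypothetical; the actual prefix is server-defined

   def assert_exact_user_templates(listed_names, expected_names):
       """Assert the expected user templates are present and nothing else is."""
       user = {n for n in listed_names if not n.startswith(BUILTIN_PREFIX)}
       expected = set(expected_names)
       assert user == expected, f"unexpected user templates: {user ^ expected}"


   # Built-in templates in the listing do not affect the check.
   listed = ["builtin-sparkpi", "template_1", "template_2"]
   assert_exact_user_templates(listed, ["template_1", "template_2"])
   print("ok")
   ```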



##########
maintenance/jobs/src/main/java/org/apache/gravitino/maintenance/jobs/spark/SparkPiJob.java:
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.maintenance.jobs.spark;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import org.apache.gravitino.job.JobTemplateProvider;
+import org.apache.gravitino.job.SparkJobTemplate;
+import org.apache.gravitino.maintenance.jobs.BuiltInJob;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+
+/**
+ * Self-contained Spark Pi program for the built-in SparkPi job template.
+ *
+ * <p>This avoids depending on the Spark examples jar so the built-in template can run with only the
+ * Gravitino-provided jobs artifact on the classpath.
+ */
+public class SparkPiJob implements BuiltInJob {
+
+  private static final String NAME = JobTemplateProvider.BUILTIN_NAME_PREFIX + "sparkpi";
+  // Bump VERSION whenever SparkPi template behavior changes (name/executable/class/args/configs).
+  private static final String VERSION = "v1";
+
+  @Override
+  public SparkJobTemplate jobTemplate() {
+    return SparkJobTemplate.builder()
+        .withName(NAME)
+        .withComment("Built-in SparkPi job template")
+        .withExecutable(resolveExecutable(SparkPiJob.class))
+        .withClassName(SparkPiJob.class.getName())
+        .withArguments(Collections.singletonList("{{slices}}"))
+        .withConfigs(buildSparkConfigs())
+        .withCustomFields(
+            Collections.singletonMap(JobTemplateProvider.PROPERTY_VERSION_KEY, VERSION))
+        .build();
+  }
+
+  public static void main(String[] args) {
+    int slices = 2;
+    if (args.length > 0) {
+      try {
+        slices = Integer.parseInt(args[0]);
+      } catch (NumberFormatException e) {
+        System.err.println("Invalid number of slices provided. Using default value of 2.");
+      }
+    }
+
+    int samples = Math.max(slices, 1) * 100000;
+
+    SparkSession spark =
+        SparkSession.builder()
+            .appName("Gravitino Built-in SparkPi")
+            // Rely on external cluster/master configuration
+            .getOrCreate();
+
+    JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
+    JavaRDD<Integer> rdd =
+        jsc.parallelize(IntStream.range(0, samples).boxed().collect(Collectors.toList()), slices);
+    long count =
+        rdd.filter(
+                i -> {
+                  double x = Math.random() * 2 - 1;
+                  double y = Math.random() * 2 - 1;
+                  return x * x + y * y <= 1;
+                })
+            .count();
+
+    double pi = 4.0 * count / samples;

Review Comment:
   The main method uses System.out.printf to print the Pi calculation result. According to the Apache Gravitino coding guidelines, System.out.println() should not be used; proper logging is recommended. However, since this is a Spark application's main method that will be executed in a Spark cluster environment where SLF4J is only a compileOnly dependency, console output may be intentional. Consider adding a comment explaining that this is console output for Spark's execution environment rather than application logging.
   ```suggestion
       double pi = 4.0 * count / samples;
       // Intentionally use stdout so the Pi result appears in Spark driver logs; this is
       // console output for the Spark execution environment, not application logging.
   ```



##########
clients/client-java/src/test/java/org/apache/gravitino/client/integration/test/JobIT.java:
##########
@@ -172,10 +169,6 @@ public void testRegisterAndDeleteJobTemplate() {
     // Verify the second job template is deleted
     Assertions.assertThrows(
        NoSuchJobTemplateException.class, () -> metalake.getJobTemplate(template2.name()));

Review Comment:
   The test assertion checking that the list is empty after deleting both templates was removed. This weakens the test: it no longer verifies the final cleanup state. Consider keeping this assertion but filtering to count only user templates (excluding built-in templates), or add a comment explaining why this verification was removed.



##########
clients/client-java/src/test/java/org/apache/gravitino/client/integration/test/authorization/JobAuthorizationIT.java:
##########
@@ -117,11 +117,14 @@ public void testListJobTemplates() {
     // Normal user can see job templates they own (test_1, test_2)
     List<JobTemplate> normalUserTemplates =
         normalUserClient.loadMetalake(METALAKE).listJobTemplates();
-    Assertions.assertEquals(2, normalUserTemplates.size());
+    Assertions.assertTrue(normalUserTemplates.stream().anyMatch(s -> s.name().equals("test_1")));
+    Assertions.assertTrue(normalUserTemplates.stream().anyMatch(s -> s.name().equals("test_2")));
 
     // Admin can see all job templates (test_1, test_2, test_3)
     List<JobTemplate> adminTemplates = client.loadMetalake(METALAKE).listJobTemplates();
-    Assertions.assertEquals(3, adminTemplates.size());
+    Assertions.assertTrue(adminTemplates.stream().anyMatch(s -> s.name().equals("test_1")));
+    Assertions.assertTrue(adminTemplates.stream().anyMatch(s -> s.name().equals("test_2")));
+    Assertions.assertTrue(adminTemplates.stream().anyMatch(s -> s.name().equals("test_3")));

Review Comment:
   The test assertions were changed from exact count checks to containment checks. This reduces test precision: the tests no longer verify that exactly the expected number of templates exist. Consider using assertions that count only user templates (filtering out built-in templates) to maintain the same level of verification, or add comments explaining that built-in templates may also be present.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
