Xander-run commented on issue #6697:
URL: https://github.com/apache/gravitino/issues/6697#issuecomment-2746218039

   Thanks for the explanation, @FANNG1!
   
   I tested the Spark PG catalog following these steps:
   
   1. Built the spark-connector for Spark 3.4 from the main branch with `./gradlew :spark-connector:spark-runtime-3.4:build -x test`.
   2. Rebuilt a local `gravitino` server image with a mocked [version value](https://github.com/apache/gravitino/blob/6b7cb02de8f4bc84528570a5e515a6790388e23c/gradle.properties#L26) on the main branch to bypass the client version check.
   3. Copied the PG driver and spark-connector jar files into the Spark container.
   4. Adjusted some `gravitino-playground` configs; the changes I made are on [this branch](https://github.com/Xander-run/gravitino-playground/tree/test-spark-pg) and consolidated in [this commit](https://github.com/Xander-run/gravitino-playground/commit/b8506d79c7450c93d00fabe0a5bc1d94e8311939).
   
   This time the previous warning about unsupported catalogs is gone. However, only `spark_catalog` is still accessible:
   
   ```
   $ cd /opt/spark && /bin/bash bin/spark-sql
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   25/03/23 13:24:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   25/03/23 13:24:50 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
   25/03/23 13:24:50 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
   25/03/23 13:24:51 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
   25/03/23 13:24:51 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
   25/03/23 13:24:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
   Spark master: local[*], Application Id: local-1742736289692
   25/03/23 13:24:52 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
   spark-sql (default)> SHOW CATALOGS;
   spark_catalog
   Time taken: 0.955 seconds, Fetched 1 row(s)
   spark-sql (default)> 
   ```
   
   I checked the Gravitino catalog API, and the response looks fine:
   
   ```
   $ curl http://gravitino:8090/api/metalakes/metalake_demo/catalogs?details=true
   {"code":0,"catalogs":[
   {"name":"catalog_hive","type":"relational","provider":"hive","comment":"comment","properties":{"gravitino.bypass.hive.metastore.client.capability.check":"false","metastore.uris":"thrift://hive:9083","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:54.904815883Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:54.904815883Z"}},
   {"name":"catalog_iceberg","type":"relational","provider":"lakehouse-iceberg","comment":"comment","properties":{"catalog-backend":"jdbc","jdbc-user":"mysql","jdbc-password":"mysql","jdbc-driver":"com.mysql.cj.jdbc.Driver","warehouse":"hdfs://hive:9000/user/iceberg/warehouse/","uri":"jdbc:mysql://mysql:3306/db","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:55.087065467Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:55.087065467Z"}},
   {"name":"catalog_mysql","type":"relational","provider":"jdbc-mysql","comment":"comment","properties":{"jdbc-url":"jdbc:mysql://mysql:3306","jdbc-user":"mysql","jdbc-password":"mysql","jdbc-driver":"com.mysql.cj.jdbc.Driver","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:55.038994592Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:55.038994592Z"}},
   {"name":"catalog_postgres","type":"relational","provider":"jdbc-postgresql","comment":"comment","properties":{"jdbc-url":"jdbc:postgresql://postgresql/db","jdbc-user":"postgres","jdbc-password":"postgres","jdbc-database":"db","jdbc-driver":"org.postgresql.Driver","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:54.998116675Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:54.998116675Z"}}
   ]}
   ```
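As a sanity check on that payload, here is a throwaway sketch of the mapping I would expect the plugin to apply, i.e. one `spark.sql.catalog.<name>` entry per catalog; note the provider-to-connector-class table is my assumption, inferred from the class names visible in the runtime config dump below:

```python
import json

# Sketch: derive the spark.sql.catalog.* entries the Gravitino Spark plugin
# should register for a catalogs payload. The provider -> connector class
# mapping is an assumption, based on classes seen in the runtime config.
PROVIDER_TO_CLASS = {
    "hive": "org.apache.gravitino.spark.connector.hive.GravitinoHiveCatalogSpark34",
    "lakehouse-iceberg": "org.apache.gravitino.spark.connector.iceberg.GravitinoIcebergCatalogSpark34",
    "jdbc-mysql": "org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34",
    "jdbc-postgresql": "org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34",
}

def expected_catalog_confs(payload: str) -> dict:
    """Map each catalog in the API response to its expected Spark conf entry."""
    catalogs = json.loads(payload)["catalogs"]
    return {
        f"spark.sql.catalog.{c['name']}": PROVIDER_TO_CLASS[c["provider"]]
        for c in catalogs
        if c["provider"] in PROVIDER_TO_CLASS
    }

# Trimmed-down payload with just the PG catalog from the response above:
payload = '{"code":0,"catalogs":[{"name":"catalog_postgres","provider":"jdbc-postgresql"}]}'
print(expected_catalog_confs(payload))
# {'spark.sql.catalog.catalog_postgres': 'org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34'}
```

By this expectation, `catalog_postgres` should end up registered exactly like `catalog_mysql`, which matches what the conf dump shows.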
   
   The runtime Spark config looks good to me:
   ```
   spark.conf.getAll.foreach(println)
   (spark.sql.warehouse.dir,hdfs://hive:9000/user/hive/warehouse)
   (spark.executor.extraJavaOptions,-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false)
   (spark.driver.host,ec64131c3b34)
   (spark.sql.catalog.catalog_hive.spark.sql.hive.metastore.jars.path,file:///opt/spark/jars/*)
   (spark.driver.port,46665)
   (spark.locality.wait.node,0)
   (spark.repl.class.uri,spark://ec64131c3b34:46665/classes)
   (spark.jars,)
   (spark.sql.gravitino.enableIcebergSupport,true)
   (spark.repl.class.outputDir,/tmp/spark-3ce77115-7dc3-4ed0-97a1-0e8bab411d95/repl-3e045793-f9af-4d98-b071-e3c43871cc52)
   (spark.sql.catalog.catalog_hive.spark.sql.hive.metastore.jars,path)
   (spark.app.name,Spark shell)
   (spark.sql.catalog.catalog_mysql,org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34)
   (spark.sql.gravitino.uri,http://gravitino:8090)
   (spark.sql.gravitino.metalake,metalake_demo)
   (spark.submit.pyFiles,)
   (spark.ui.showConsoleProgress,true)
   (spark.sql.catalog.catalog_iceberg,org.apache.gravitino.spark.connector.iceberg.GravitinoIcebergCatalogSpark34)
   (spark.app.submitTime,1742736384837)
   (spark.sql.catalog.catalog_postgres,org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34)
   (spark.app.startTime,1742736386756)
   (spark.executor.id,driver)
   (spark.driver.extraJavaOptions,-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false)
   (spark.submit.deployMode,client)
   (spark.sql.catalog.catalog_rest,org.apache.iceberg.spark.SparkCatalog)
   (spark.master,local[*])
   (spark.sql.catalog.catalog_hive,org.apache.gravitino.spark.connector.hive.GravitinoHiveCatalogSpark34)
   (spark.home,/opt/spark)
   (spark.sql.catalog.catalog_rest.uri,http://gravitino:9001/iceberg/)
   (spark.sql.catalogImplementation,hive)
   (spark.plugins,org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin)
   (spark.sql.catalog.catalog_rest.type,rest)
   (spark.sql.extensions,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions)
   (spark.app.id,local-1742736387491)
   ```
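To rule out a typo on my side, I also cross-checked which catalogs the session was actually told about, with an ad-hoc parse of the pasted `(key,value)` lines (nothing beyond the dump format above is assumed):

```python
# Ad-hoc check over the conf dump: extract the catalog names defined via
# spark.sql.catalog.<name> keys. A subset of the pasted lines for illustration:
conf_lines = [
    "(spark.sql.catalog.catalog_mysql,org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34)",
    "(spark.sql.catalog.catalog_postgres,org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34)",
    "(spark.sql.catalog.catalog_rest.uri,http://gravitino:9001/iceberg/)",
]

def configured_catalogs(lines):
    names = set()
    for line in lines:
        # Each line looks like "(key,value)"; split off the key.
        key = line.strip("()").split(",", 1)[0]
        parts = key.split(".")
        # Exactly spark.sql.catalog.<name> (4 parts) defines a catalog;
        # longer keys like spark.sql.catalog.<name>.uri are per-catalog options.
        if len(parts) == 4 and parts[:3] == ["spark", "sql", "catalog"]:
            names.add(parts[3])
    return sorted(names)

print(configured_catalogs(conf_lines))
# ['catalog_mysql', 'catalog_postgres']
```

All four `catalog_*` entries are present in the real dump. One caveat, if I recall correctly: Spark only lists a catalog in `SHOW CATALOGS` after it has been loaded at least once, so `USE catalog_postgres;` may be a more direct probe than `SHOW CATALOGS` here.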
   
   The bootstrap log of `playground-spark` doesn't include much information, though:
   
   ```
   2025-03-23 14:20:55   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
   2025-03-23 14:20:55                                  Dload  Upload   Total   Spent    Left  Speed
   2025-03-23 14:20:55 
     0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
   0100-01-01 00:00:00   175  100   175    0     0  35000      0 --:--:-- --:--:-- --:--:-- 35000
   2025-03-23 14:20:55   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
   2025-03-23 14:20:55                                  Dload  Upload   Total   Spent    Left  Speed
   2025-03-23 14:20:55 
     0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
   0100-01-01 00:00:00   394  100   394    0     0  98500      0 --:--:-- --:--:-- --:--:-- 98500
   2025-03-23 14:20:55   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
   2025-03-23 14:20:55                                  Dload  Upload   Total   Spent    Left  Speed
   2025-03-23 14:20:55 
     0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
   0100-01-01 00:00:00   459  100   459    0     0   224k      0 --:--:-- --:--:-- --:--:--  224k
   2025-03-23 14:20:55   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
   2025-03-23 14:20:55                                  Dload  Upload   Total   Spent    Left  Speed
   2025-03-23 14:20:55 
     0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
   0100-01-01 00:00:00   419  100   419    0     0   136k      0 --:--:-- --:--:-- --:--:--  136k
   2025-03-23 14:20:55   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
   2025-03-23 14:20:55                                  Dload  Upload   Total   Spent    Left  Speed
   2025-03-23 14:20:55 
     0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
   0100-01-01 00:00:00   506  100   506    0     0   123k      0 --:--:-- --:--:-- --:--:--  123k
   ```
   
   Given that the Spark PG connector is still a WIP feature, the current 0.8.0 documentation doesn't contain many related instructions. Is there any configuration I'm missing to properly register the Spark PG catalog?

