Xander-run commented on issue #6697: URL: https://github.com/apache/gravitino/issues/6697#issuecomment-2746218039
Thanks for the explanation, @FANNG1! I tested the Spark PG catalog following these steps:

1. Built the spark-connector 3.4 runtime from the main branch with `./gradlew :spark-connector:spark-runtime-3.4:build -x test`.
2. Rebuilt a local `gravitino` server image from the main branch with a mocked [version value](https://github.com/apache/gravitino/blob/6b7cb02de8f4bc84528570a5e515a6790388e23c/gradle.properties#L26) to bypass the client version check.
3. Copied the PG driver and spark-connector jar files into the Spark container.
4. Adjusted some `gravitino-playground` configs; the changes I made are on [this branch](https://github.com/Xander-run/gravitino-playground/tree/test-spark-pg) and consolidated in [this commit](https://github.com/Xander-run/gravitino-playground/commit/b8506d79c7450c93d00fabe0a5bc1d94e8311939).

This time the previous warning about unsupported catalogs is gone, but `SHOW CATALOGS` still returns only `spark_catalog`:

```
$ cd /opt/spark && /bin/bash bin/spark-sql
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/03/23 13:24:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/03/23 13:24:50 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
25/03/23 13:24:50 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
25/03/23 13:24:51 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
25/03/23 13:24:51 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
25/03/23 13:24:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Spark master: local[*], Application Id: local-1742736289692
25/03/23 13:24:52 WARN SparkSQLCLIDriver: WARNING: Directory for Hive history file: /home/spark does not exist. History will not be available during this session.
spark-sql (default)> SHOW CATALOGS;
spark_catalog
Time taken: 0.955 seconds, Fetched 1 row(s)
spark-sql (default)>
```

I checked the Gravitino catalog API and the return value looks fine:

```
$ curl http://gravitino:8090/api/metalakes/metalake_demo/catalogs?details=true
{"code":0,"catalogs":[{"name":"catalog_hive","type":"relational","provider":"hive","comment":"comment","properties":{"gravitino.bypass.hive.metastore.client.capability.check":"false","metastore.uris":"thrift://hive:9083","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:54.904815883Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:54.904815883Z"}},{"name":"catalog_iceberg","type":"relational","provider":"lakehouse-iceberg","comment":"comment","properties":{"catalog-backend":"jdbc","jdbc-user":"mysql","jdbc-password":"mysql","jdbc-driver":"com.mysql.cj.jdbc.Driver","warehouse":"hdfs://hive:9000/user/iceberg/warehouse/","uri":"jdbc:mysql://mysql:3306/db","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:55.087065467Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:55.087065467Z"}},{"name":"catalog_mysql","type":"relational","provider":"jdbc-mysql","comment":"comment","properties":{"jdbc-url":"jdbc:mysql://mysql:3306","jdbc-user":"mysql","jdbc-password":"mysql","jdbc-driver":"com.mysql.cj.jdbc.Driver","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:55.038994592Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:55.038994592Z"}},{"name":"catalog_postgres","type":"relational","provider":"jdbc-postgresql","comment":"comment","properties":{"jdbc-url":"jdbc:postgresql://postgresql/db","jdbc-user":"postgres","jdbc-password":"postgres","jdbc-database":"db","jdbc-driver":"org.postgresql.Driver","in-use":"true"},"audit":{"creator":"anonymous","createTime":"2025-03-23T13:20:54.998116675Z","lastModifier":"anonymous","lastModifiedTime":"2025-03-23T13:20:54.998116675Z"}}]}
```

The runtime Spark config also looks good to me:

```
spark.conf.getAll.foreach(println)
(spark.sql.warehouse.dir,hdfs://hive:9000/user/hive/warehouse)
(spark.executor.extraJavaOptions,-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false)
(spark.driver.host,ec64131c3b34)
(spark.sql.catalog.catalog_hive.spark.sql.hive.metastore.jars.path,file:///opt/spark/jars/*)
(spark.driver.port,46665)
(spark.locality.wait.node,0)
(spark.repl.class.uri,spark://ec64131c3b34:46665/classes)
(spark.jars,)
(spark.sql.gravitino.enableIcebergSupport,true)
(spark.repl.class.outputDir,/tmp/spark-3ce77115-7dc3-4ed0-97a1-0e8bab411d95/repl-3e045793-f9af-4d98-b071-e3c43871cc52)
(spark.sql.catalog.catalog_hive.spark.sql.hive.metastore.jars,path)
(spark.app.name,Spark shell)
(spark.sql.catalog.catalog_mysql,org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34)
(spark.sql.gravitino.uri,http://gravitino:8090)
(spark.sql.gravitino.metalake,metalake_demo)
(spark.submit.pyFiles,)
(spark.ui.showConsoleProgress,true)
(spark.sql.catalog.catalog_iceberg,org.apache.gravitino.spark.connector.iceberg.GravitinoIcebergCatalogSpark34)
(spark.app.submitTime,1742736384837)
(spark.sql.catalog.catalog_postgres,org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34)
(spark.app.startTime,1742736386756)
(spark.executor.id,driver)
(spark.driver.extraJavaOptions,-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false)
(spark.submit.deployMode,client)
(spark.sql.catalog.catalog_rest,org.apache.iceberg.spark.SparkCatalog)
(spark.master,local[*])
(spark.sql.catalog.catalog_hive,org.apache.gravitino.spark.connector.hive.GravitinoHiveCatalogSpark34)
(spark.home,/opt/spark)
(spark.sql.catalog.catalog_rest.uri,http://gravitino:9001/iceberg/)
(spark.sql.catalogImplementation,hive)
(spark.plugins,org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin)
(spark.sql.catalog.catalog_rest.type,rest)
(spark.sql.extensions,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions)
(spark.app.id,local-1742736387491)
```

The bootstrap log of `playground-spark` doesn't contain much information, only curl progress meters:

```
2025-03-23 14:20:55 % Total % Received % Xferd Average Speed Time Time Time Current
2025-03-23 14:20:55 Dload Upload Total Spent Left Speed
2025-03-23 14:20:55 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0100-01-01 00:00:00 175 100 175 0 0 35000 0 --:--:-- --:--:-- --:--:-- 35000
2025-03-23 14:20:55 % Total % Received % Xferd Average Speed Time Time Time Current
2025-03-23 14:20:55 Dload Upload Total Spent Left Speed
2025-03-23 14:20:55 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0100-01-01 00:00:00 394 100 394 0 0 98500 0 --:--:-- --:--:-- --:--:-- 98500
2025-03-23 14:20:55 % Total % Received % Xferd Average Speed Time Time Time Current
2025-03-23 14:20:55 Dload Upload Total Spent Left Speed
2025-03-23 14:20:55 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0100-01-01 00:00:00 459 100 459 0 0 224k 0 --:--:-- --:--:-- --:--:-- 224k
2025-03-23 14:20:55 % Total % Received % Xferd Average Speed Time Time Time Current
2025-03-23 14:20:55 Dload Upload Total Spent Left Speed
2025-03-23 14:20:55 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0100-01-01 00:00:00 419 100 419 0 0 136k 0 --:--:-- --:--:-- --:--:-- 136k
2025-03-23 14:20:55 % Total % Received % Xferd Average Speed Time Time Time Current
2025-03-23 14:20:55 Dload Upload Total Spent Left Speed
2025-03-23 14:20:55 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0100-01-01 00:00:00 506 100 506 0 0 123k 0 --:--:-- --:--:-- --:--:-- 123k
```

Given that the Spark PG connector is still a work-in-progress feature, the 0.8.0 documentation doesn't contain many related instructions yet.
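For anyone repeating the API check above, here is a throwaway sketch of how I eyeballed the response; `summarize_catalogs` is just an illustrative helper (not part of any Gravitino client library), and the sample payload is an abbreviated, hand-trimmed version of the real response shown earlier:

```python
import json

def summarize_catalogs(payload: str):
    """Extract (name, provider, in-use) for each catalog in a
    /api/metalakes/{metalake}/catalogs?details=true response body."""
    doc = json.loads(payload)
    return [
        (c["name"], c["provider"], c["properties"].get("in-use"))
        for c in doc.get("catalogs", [])
    ]

# Abbreviated sample shaped like the curl response above (two of the four catalogs):
sample = json.dumps({
    "code": 0,
    "catalogs": [
        {"name": "catalog_postgres", "type": "relational",
         "provider": "jdbc-postgresql",
         "properties": {"jdbc-url": "jdbc:postgresql://postgresql/db", "in-use": "true"}},
        {"name": "catalog_mysql", "type": "relational",
         "provider": "jdbc-mysql",
         "properties": {"jdbc-url": "jdbc:mysql://mysql:3306", "in-use": "true"}},
    ],
})

for name, provider, in_use in summarize_catalogs(sample):
    print(f"{name}: provider={provider} in-use={in_use}")
```

On the real response, all four catalogs show `in-use=true` and the expected providers, which is why I believe the server side is fine.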
Is there any missing configuration I should check to properly register the Spark PG catalog?
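For reference, these are the settings from the runtime config above that I believe are relevant to the PG catalog, in case one of them is obviously wrong or incomplete:

```
spark.plugins                       org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin
spark.sql.gravitino.uri             http://gravitino:8090
spark.sql.gravitino.metalake        metalake_demo
spark.sql.catalog.catalog_postgres  org.apache.gravitino.spark.connector.jdbc.GravitinoJdbcCatalogSpark34
```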
