[GitHub] [iceberg] findinpath opened a new issue #3951: Getting started with Iceberg & Spark

GitBox Fri, 21 Jan 2022 13:13:54 -0800


findinpath opened a new issue #3951:
URL: https://github.com/apache/iceberg/issues/3951



   As a newcomer to the Apache Iceberg universe, I am eager to try out the
functionality offered by the framework.
   
   Setting up an Iceberg environment on Spark is not quite straightforward.
   After downloading the Spark 3.1.2 distribution, I configured
`spark-defaults.conf` as follows:
   
   ```
   spark.jars.packages                    org.apache.iceberg:iceberg-spark3-runtime:0.12.1
   spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   spark.sql.catalog.demo                 org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.demo.catalog-impl    org.apache.iceberg.jdbc.JdbcCatalog
   spark.sql.catalog.demo.uri             jdbc:postgresql://postgres:5432/demo_catalog
   spark.sql.catalog.demo.jdbc.user       admin
   spark.sql.catalog.demo.jdbc.password   password
   spark.sql.catalog.demo.io-impl         org.apache.iceberg.hadoop.HadoopFileIO
   spark.sql.catalog.demo.warehouse       /home/iceberg/warehouse
   spark.sql.defaultCatalog               demo
   ```
   
   Afterwards I set up Postgres to run in a Docker container:
   
   ```
   docker run --name iceberg-spark-postgres \
     -e POSTGRES_USER=admin \
     -e POSTGRES_PASSWORD=password \
     -e POSTGRES_DB=demo_catalog \
     -p 5432:5432 -d postgres
   ```
   
   While trying out the scenarios described on the page
https://iceberg.apache.org/#maintenance/
   I noticed that the code snippets all start from a table handle without
showing how to obtain it:
   
   ```
   Table table = ...
   ```
   
   Getting the Iceberg table from a Spark catalog is not that straightforward.
   After digging through the Iceberg source code, I stitched together this
snippet for obtaining the table:
   
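   ```
   import org.apache.iceberg.spark.SparkCatalog
   import org.apache.spark.sql.connector.catalog.Identifier

   // The current catalog is `demo` (spark.sql.defaultCatalog), backed by Iceberg's SparkCatalog
   val sparkCatalog = spark.sessionState.catalogManager.currentCatalog.asInstanceOf[SparkCatalog]

   // Load the Spark-facing table wrapper for table `test1`
   val sparkTableTest1 = sparkCatalog.loadTable(Identifier.of(Array[String](""), "test1"))

   // Unwrap the underlying org.apache.iceberg.Table
   val icebergTableTest1 = sparkTableTest1.table
   ```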
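   
   For illustration, here is a minimal sketch of how such a table handle could
feed one of the maintenance snippets (expiring old snapshots). It is meant to
be pasted into `spark-shell`, where `spark` is already defined. The helper name
`loadIcebergTable` is hypothetical, the one-day retention is an arbitrary
choice, and the sketch assumes, as the snippet above does, that
`SparkCatalog.loadTable` returns a wrapper exposing the Iceberg table via
`table`:

   ```scala
   import java.util.concurrent.TimeUnit

   import org.apache.iceberg.Table
   import org.apache.iceberg.spark.SparkCatalog
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.connector.catalog.Identifier

   // Hypothetical helper: resolve the Iceberg table behind the session's
   // current catalog. Fails with a ClassCastException if the current catalog
   // is not an Iceberg SparkCatalog.
   def loadIcebergTable(spark: SparkSession, namespace: Array[String], name: String): Table = {
     val catalog = spark.sessionState.catalogManager.currentCatalog.asInstanceOf[SparkCatalog]
     catalog.loadTable(Identifier.of(namespace, name)).table
   }

   // Same namespace and table name as in the snippet above
   val table = loadIcebergTable(spark, Array(""), "test1")

   // Expire snapshots older than one day (retention chosen only for the example)
   val cutoffMillis = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(1)
   table.expireSnapshots().expireOlderThan(cutoffMillis).commit()
   ```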
   
   
   What I'd like to have (as a newcomer to Iceberg) is a Docker image or Docker
Compose setup for getting started with Spark. Having everything packaged
together and ready to use would make it much easier to get started.
   
   For the code samples, I'd also very much appreciate Java/Scala/Python
examples that use `SparkCatalog` for a series of general usage scenarios that
are not covered by the Iceberg SQL commands.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


