UFMurphy opened a new issue, #7570:
URL: https://github.com/apache/iceberg/issues/7570

   StackOverflow Gurus:
   
   So I am going through some of the Iceberg quickstarts using Jupyter notebooks. I have a Postgres-backed JDBC catalog set up and an S3 bucket for the warehouse.
   
   I am able to create the table (I see the folder in the S3 bucket), but not insert into it. I also see the catalog entry in my Postgres db. Code is posted below.
   
   I get the error below, which doesn't make sense to me, since I am setting environment variables for my AWS credentials and region in the Jupyter notebook.
   
   ```
   23/05/09 09:57:34 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 5) (192.168.86.41 executor 0):
   software.amazon.awssdk.core.exception.SdkClientException: Unable to load region from any of the
   providers in the chain software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain@5558d94d:
   [software.amazon.awssdk.regions.providers.SystemSettingsRegionProvider@483d5d05: Unable to load region
   from system settings. Region must be specified either via environment variable (AWS_REGION) or
   system property (aws.region).,
   software.amazon.awssdk.regions.providers.AwsProfileRegionProvider@4542969b: No region provided in profile: default,
   software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider@6070ed64: Unable to contact EC2 metadata service.]
   ```
   
   I know it's something simple I am missing. So any guidance you can offer is 
appreciated.
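   
   One thing I suspect but have not verified: `%env` only sets variables in the local notebook (driver) process, and the stack trace above comes from "executor 0" on the standalone cluster, which would never inherit them. A minimal sketch of extra Spark properties that could forward the region to the executors, assuming Spark's documented `spark.executorEnv.*` mechanism and the `aws.region` system property the SDK error itself mentions:
   
   ```python
   # Hypothetical additions to the SparkConf below (values are placeholders).
   # %env affects only the driver process; these settings target the executors.
   extra_conf = {
       # Documented Spark mechanism for forwarding env vars to executor processes:
       "spark.executorEnv.AWS_REGION": "us-east-1",
       "spark.executorEnv.AWS_ACCESS_KEY_ID": "XXXXX",
       "spark.executorEnv.AWS_SECRET_ACCESS_KEY": "XXXXXX",
       # Alternative: set the JVM system property the SDK also checks:
       "spark.executor.extraJavaOptions": "-Daws.region=us-east-1",
   }
   
   # Applied to the existing conf, e.g.:
   # for k, v in extra_conf.items():
   #     conf = conf.set(k, v)
   ```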
   
   #### CODE ####
   ```python
   from pyspark.sql import SparkSession
   from pyspark.conf import SparkConf
   
   %env AWS_REGION=us-east-1
   %env AWS_ACCESS_KEY_ID=XXXXX
   %env AWS_SECRET_ACCESS_KEY=XXXXXX
   
   conf = (
       SparkConf()
       .setAppName('DealStampede')
       .setMaster('spark://ds2.lan:7077')
       .set('spark.executor.memory', '4g')
       .set('spark.ui.port', '4042')
       # Packages
       .set('spark.jars.packages',
            'org.apache.iceberg:iceberg-spark3-runtime:0.13.2,'
            'software.amazon.awssdk:bundle:2.20.57,'
            'software.amazon.awssdk:url-connection-client:2.20.57,'
            'org.postgresql:postgresql:42.2.23')
       # SQL extensions
       .set('spark.sql.extensions',
            'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
       # Configuring the catalog
       .set('spark.sql.catalog.iceberg', 'org.apache.iceberg.spark.SparkCatalog')
       #.set('spark.sql.catalog.iceberg.type', 'hadoop')
       .set('spark.sql.catalog.iceberg.warehouse', 's3://ds.iceberg/warehouse')
       .set('spark.sql.catalog.iceberg.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
       # Configuring the metastore
       .set('spark.sql.catalog.iceberg.catalog-impl', 'org.apache.iceberg.jdbc.JdbcCatalog')
       .set('spark.sql.catalog.iceberg.uri', 'jdbc:postgresql://ds3.lan:5432/iceberg')
       .set('spark.sql.catalog.iceberg.jdbc.verifyServerCertificate', 'false')
       .set('spark.sql.catalog.iceberg.jdbc.useSSL', 'false')
       .set('spark.sql.catalog.iceberg.jdbc.user', 'xxxxxx')
       .set('spark.sql.catalog.iceberg.jdbc.password', 'xxxxx')
   )
   
   ## Start Spark session
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   
   data = [
       [10, 'Direct Sales', 'Mario'],
       [12, 'Direct Sales', 'Joe'],
       [20, 'Online Sales', 'Sally'],
       [25, 'Online Sales', 'Dawn'],
   ]
   
   df = spark.createDataFrame(data, ['revenue', 'department', 'boss'])
   df.show()
   
   print("### CREATING TABLE")
   ## Write a DataFrame as an Iceberg dataset to the Amazon S3 location.
   spark.sql("""CREATE TABLE IF NOT EXISTS iceberg.table1 (
       revenue int,
       department string,
       boss string)
   USING iceberg
   LOCATION 's3://ds.iceberg/warehouse/table1'""")
   
   print("### INSERTING INTO TABLE")
   df.writeTo("iceberg.table1").append()
   
   print("### SELECTING FROM TABLE")
   df_result = spark.read.format("iceberg").load("iceberg.table1")
   df_result.show()
   ```
   #### CODE ####
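   
   In case it helps diagnose this, here is a small sketch (helper name is hypothetical, not from the code above) that would read `AWS_REGION` from the executors' own environment, to confirm whether the `%env` values ever reached the workers:
   
   ```python
   import os
   
   # Hypothetical diagnostic: reads the environment of whichever process
   # executes this function -- on the cluster, that is an executor process.
   def executor_region(_):
       return os.environ.get("AWS_REGION", "<unset>")
   
   # With the live SparkSession from the code above, something like:
   # print(spark.range(4).rdd.map(executor_region).distinct().collect())
   # A result of ['<unset>'] would be consistent with the SdkClientException.
   
   # Run locally, the helper just reflects the current (driver) process:
   print(executor_region(0))
   ```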


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

