UFMurphy opened a new issue, #7570:
URL: https://github.com/apache/iceberg/issues/7570
StackOverflow Gurus:
I am working through some of the Iceberg quickstarts using Jupyter notebooks. I have a Postgres catalog set up and an S3 bucket for the warehouse. I am able to create the table (I see a folder appear in the S3 bucket), but not insert into it. I also see the catalog entry in my Postgres DB. My code is posted below.
I get the error below, which doesn't make sense to me, as I am setting environment variables for my AWS credentials and region in my Jupyter notebook.
```
23/05/09 09:57:34 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 5) (192.168.86.41 executor 0):
software.amazon.awssdk.core.exception.SdkClientException: Unable to load region from any of the providers in the chain software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain@5558d94d:
[software.amazon.awssdk.regions.providers.SystemSettingsRegionProvider@483d5d05: Unable to load region from system settings. Region must be specified either via environment variable (AWS_REGION) or system property (aws.region).,
software.amazon.awssdk.regions.providers.AwsProfileRegionProvider@4542969b: No region provided in profile: default,
software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider@6070ed64: Unable to contact EC2 metadata service.]
```
I know it's something simple I am missing, so any guidance you can offer is appreciated.
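One thing I suspect but have not confirmed: `%env` only sets the variable in the notebook's driver process, while the failing task runs on an executor (executor 0 on 192.168.86.41). If that is the cause, Spark's standard `spark.executorEnv.*` settings should forward the variables to the executor processes as well; a minimal, untested sketch of what I mean:

```python
from pyspark.conf import SparkConf

# Untested sketch: forward the AWS variables to the executor processes too,
# since %env only affects the Jupyter driver process.
conf = (
    SparkConf()
    .set("spark.executorEnv.AWS_REGION", "us-east-1")
    .set("spark.executorEnv.AWS_ACCESS_KEY_ID", "XXXXX")
    .set("spark.executorEnv.AWS_SECRET_ACCESS_KEY", "XXXXXX")
    # ... plus the rest of the configuration shown below
)
```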
#### CODE ####
```python
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

# Set AWS credentials and region in the notebook (driver) environment
%env AWS_REGION=us-east-1
%env AWS_ACCESS_KEY_ID=XXXXX
%env AWS_SECRET_ACCESS_KEY=XXXXXX

conf = (
    SparkConf()
    .setAppName('DealStampede')
    .setMaster("spark://ds2.lan:7077")
    .set("spark.executor.memory", "4g")
    .set("spark.ui.port", "4042")
    # Packages
    .set('spark.jars.packages',
         'org.apache.iceberg:iceberg-spark3-runtime:0.13.2,'
         'software.amazon.awssdk:bundle:2.20.57,'
         'software.amazon.awssdk:url-connection-client:2.20.57,'
         'org.postgresql:postgresql:42.2.23')
    # SQL extensions
    .set('spark.sql.extensions',
         'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
    # Catalog configuration
    .set('spark.sql.catalog.iceberg', 'org.apache.iceberg.spark.SparkCatalog')
    # .set('spark.sql.catalog.iceberg.type', 'hadoop')
    .set('spark.sql.catalog.iceberg.warehouse', 's3://ds.iceberg/warehouse')
    .set('spark.sql.catalog.iceberg.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
    # Metastore (JDBC catalog) configuration
    .set('spark.sql.catalog.iceberg.catalog-impl', 'org.apache.iceberg.jdbc.JdbcCatalog')
    .set('spark.sql.catalog.iceberg.uri', 'jdbc:postgresql://ds3.lan:5432/iceberg')
    .set('spark.sql.catalog.iceberg.jdbc.verifyServerCertificate', 'false')
    .set('spark.sql.catalog.iceberg.jdbc.useSSL', 'false')
    .set('spark.sql.catalog.iceberg.jdbc.user', 'xxxxxx')
    .set('spark.sql.catalog.iceberg.jdbc.password', 'xxxxx')
)

# Start the Spark session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print("Spark Running")

data = [
    [10, 'Direct Sales', 'Mario'],
    [12, 'Direct Sales', 'Joe'],
    [20, 'Online Sales', 'Sally'],
    [25, 'Online Sales', 'Dawn'],
]
df = spark.createDataFrame(data, ['revenue', 'department', 'boss'])
df.show()

print("### CREATING TABLE")
# Write a DataFrame as an Iceberg table at the Amazon S3 location
spark.sql("""CREATE TABLE IF NOT EXISTS iceberg.table1 (
    revenue int,
    department string,
    boss string)
USING iceberg
LOCATION 's3://ds.iceberg/warehouse/table1'""")

print("### INSERTING INTO TABLE")
df.writeTo("iceberg.table1").append()

print("### SELECTING FROM TABLE")
df_result = spark.read.format("iceberg").load("iceberg.table1")
df_result.show()
```
#### CODE ####
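For reference, here is the quick check I plan to run to see whether the executors actually have the region variable; this is a hypothetical diagnostic of my own, not part of the quickstart:

```python
import os

# Ask each executor what it sees for AWS_REGION; None in the output means
# the variable never reached that executor process.
regions = (
    spark.sparkContext
    .parallelize(range(4), 4)
    .map(lambda _: os.environ.get("AWS_REGION"))
    .collect()
)
print(regions)
```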