cobookman edited a comment on pull request #2963:
URL: https://github.com/apache/iceberg/pull/2963#issuecomment-898187675
> @cobookman the PR referenced is merged, could you update the documentation
with the correct path resolution strategy? Thank you!
Happy to. I just want to understand first what the expected write behaviour is for folder-storage on S3, since the following fails on my end:
```
CREATE TABLE my_catalog.my_ns.my_table (
    id bigint,
    data string,
    category string)
USING iceberg
OPTIONS (
    'write.object-storage.enabled'=true,
    'write.folder-storage.path'='s3://some-bucket/some-random-folder/')
PARTITIONED BY (category);

INSERT INTO my_catalog.my_ns.my_table VALUES (1, "some data", "some category");

java.lang.NullPointerException
	at org.apache.iceberg.LocationProviders.stripTrailingSlash(LocationProviders.java:135)
	at org.apache.iceberg.LocationProviders.access$000(LocationProviders.java:34)
	at org.apache.iceberg.LocationProviders$ObjectStoreLocationProvider.<init>(LocationProviders.java:99)
	at org.apache.iceberg.LocationProviders.locationsFor(LocationProviders.java:65)
	at org.apache.iceberg.BaseMetastoreTableOperations.locationProvider(BaseMetastoreTableOperations.java:200)
	at org.apache.iceberg.BaseTable.locationProvider(BaseTable.java:219)
	at org.apache.iceberg.spark.source.SparkWrite.createWriterFactory(SparkWrite.java:172)
	at org.apache.iceberg.spark.source.SparkWrite.access$600(SparkWrite.java:87)
	at org.apache.iceberg.spark.source.SparkWrite$BaseBatchWrite.createBatchWriterFactory(SparkWrite.java:226)
```
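From the trace, the failure appears to be `ObjectStoreLocationProvider` passing a null location into `stripTrailingSlash` when only `write.folder-storage.path` is set. Here is a minimal, self-contained sketch of that failure mode — the helper below is my own illustration of what the trace suggests, not Iceberg's actual code:

```java
// Hypothetical sketch: a stripTrailingSlash-style helper that dereferences
// its argument without a null check, matching the NPE in the stack trace.
public class StripTrailingSlashDemo {

    // Trims trailing slashes; throws NullPointerException when path is null.
    static String stripTrailingSlash(String path) {
        String result = path;
        while (result.endsWith("/")) {  // NPE here if result is null
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Normal case: trailing slash is removed.
        System.out.println(stripTrailingSlash("s3://some-bucket/some-random-folder/"));
        try {
            // Simulates the configured path resolving to null:
            stripTrailingSlash(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as seen in the report");
        }
    }
}
```

If this is what is happening, either the provider should fall back to another configured path or the error message should name the missing property.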
Omitting `'write.object-storage.enabled'=true` avoids the NullPointerException, but the write then falls back to the Hive driver's default layout, placing data at:
`s3://some-bucket/some-random-folder/category=some+category/00000-2-13441dd2-137a-42d1-9c6b-9ccc29a2ebeb-00001.parquet`
```
spark-sql> CREATE TABLE my_catalog.my_ns.my_table (
         >     id bigint,
         >     data string,
         >     category string)
         > USING iceberg
         > OPTIONS (
         >     'write.folder-storage.path'='s3://some-bucket/some-random-folder/')
         > PARTITIONED BY (category);
Time taken: 2.021 seconds
spark-sql> INSERT INTO my_catalog.my_ns.my_table VALUES (1, "some data", "some category");
Time taken: 3.968 seconds
```
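For completeness, I would guess the object-storage provider wants its own path property set explicitly. I have not verified this, so treat the property name below as my assumption about what `ObjectStoreLocationProvider` reads:

```sql
-- Untested variant: supply an object-storage path directly instead of
-- relying on folder-storage ('write.object-storage.path' is my assumption).
CREATE TABLE my_catalog.my_ns.my_table (
    id bigint,
    data string,
    category string)
USING iceberg
OPTIONS (
    'write.object-storage.enabled'=true,
    'write.object-storage.path'='s3://some-bucket/some-random-folder/')
PARTITIONED BY (category);
```

If that is the intended configuration, the docs update for this PR should call it out, since the NPE gives no hint which property is missing.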
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]