Thanks for clarifying. I will look into your suggested approach for using
S3FileIO later.

On Tue, Aug 17, 2021 at 11:40 AM Jack Ye <yezhao...@gmail.com> wrote:

> Good to hear the issue is fixed!
>
> ACL is optional, as the javadoc says, "If not set, ACL will not be set for
> requests".
>
> But I think that to use MinIO you need a custom client factory that points
> the S3 client at your MinIO endpoint.
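>
> Roughly, such a factory would hand Iceberg an S3 client built with an endpoint
> override. Here is a minimal Scala sketch of the client construction (untested;
> the object and parameter names are illustrative, and wiring it in goes through
> the AWS client customization mechanism described in the docs):
>
> import java.net.URI
> import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}
> import software.amazon.awssdk.regions.Region
> import software.amazon.awssdk.services.s3.{S3Client, S3Configuration}
>
> object MinioS3ClientSketch {
>   // Builds an S3 client that talks to a MinIO endpoint instead of AWS S3.
>   def create(endpoint: String, accessKey: String, secretKey: String): S3Client =
>     S3Client.builder()
>       .endpointOverride(URI.create(endpoint)) // e.g. http://192.168.176.5:9000
>       .region(Region.US_EAST_1)               // MinIO ignores the region, but the SDK requires one
>       .credentialsProvider(StaticCredentialsProvider.create(
>         AwsBasicCredentials.create(accessKey, secretKey)))
>       .serviceConfiguration(S3Configuration.builder()
>         .pathStyleAccessEnabled(true)         // MinIO is usually addressed path-style
>         .build())
>       .build()
> }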
>
> -Jack
>
> On Tue, Aug 17, 2021 at 11:36 AM Lian Jiang <jiangok2...@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> S3FileIO needs a canned ACL according to:
>>
>>   /**
>>    * Used to configure canned access control list (ACL) for S3 client to use during write.
>>    * If not set, ACL will not be set for requests.
>>    * <p>
>>    * The input must be one of {@link software.amazon.awssdk.services.s3.model.ObjectCannedACL},
>>    * such as 'public-read-write'
>>    * For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
>>    */
>>   public static final String S3FILEIO_ACL = "s3.acl";
>>
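>> If I understand the wiring correctly, that property would be passed through the
>> catalog configuration when targeting real S3. An illustrative sketch (the
>> catalog name and ACL value are examples only):
>>
>> import org.apache.spark.sql.SparkSession
>>
>> // Catalog properties under spark.sql.catalog.<name>. are handed to the catalog,
>> // which passes them on to S3FileIO when the FileIO is initialized.
>> val spark = SparkSession.builder()
>>   .config("spark.sql.catalog.hive_test.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
>>   .config("spark.sql.catalog.hive_test.s3.acl", "bucket-owner-full-control")
>>   .getOrCreate()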
>>
>> MinIO does not support canned ACLs according to
>> https://docs.min.io/docs/minio-server-limits-per-tenant.html:
>>
>> List of Amazon S3 Bucket APIs not supported on MinIO
>>
>>    - BucketACL (Use bucket policies
>>    <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
>>    - BucketCORS (CORS enabled by default on all buckets for all HTTP
>>    verbs)
>>    - BucketWebsite (Use caddy <https://github.com/caddyserver/caddy> or
>>    nginx <https://www.nginx.com/resources/wiki/>)
>>    - BucketAnalytics, BucketMetrics, BucketLogging (Use bucket
>>    notification
>>    <https://docs.min.io/docs/minio-client-complete-guide#events> APIs)
>>    - BucketRequestPayment
>>
>> List of Amazon S3 Object APIs not supported on MinIO
>>
>>    - ObjectACL (Use bucket policies
>>    <https://docs.min.io/docs/minio-client-complete-guide#policy> instead)
>>    - ObjectTorrent
>>
>>
>>
>> Hope this makes sense.
>>
>> BTW, Iceberg + Hive + S3A works now that the Hive S3A issue has been
>> fixed. Thanks Jack for helping debug.
>>
>>
>>
>> On Tue, Aug 17, 2021 at 8:38 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> I'm not sure that I'm following why MinIO won't work with S3FileIO.
>>> S3FileIO assumes that the credentials are handled by a credentials provider
>>> outside of S3FileIO. How does MinIO handle credentials?
>>>
>>> Ryan
>>>
>>> On Mon, Aug 16, 2021 at 7:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Talked with Lian on Slack; the user is running a Hadoop 3.2.1 + Hive
>>>> (Postgres metastore) + Spark + MinIO Docker installation. Based on the stack
>>>> trace, some S3A-related dependencies might be missing on the Hive server
>>>> side. Let's see if adding them fixes the issue.
>>>> -Jack
>>>>
>>>> On Mon, Aug 16, 2021 at 7:32 PM Lian Jiang <jiangok2...@gmail.com>
>>>> wrote:
>>>>
>>>>> This is my full script launching spark-shell:
>>>>>
>>>>> # add Iceberg dependency
>>>>> export AWS_REGION=us-east-1
>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>
>>>>> ICEBERG_VERSION=0.11.1
>>>>>
>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>>>
>>>>> MINIOSERVER=192.168.176.5
>>>>>
>>>>>
>>>>> # add AWS dependency
>>>>> AWS_SDK_VERSION=2.15.40
>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>> AWS_PACKAGES=(
>>>>>     "bundle"
>>>>>     "url-connection-client"
>>>>> )
>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>> done
>>>>>
>>>>> # start Spark SQL client shell
>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>
>>>>>
>>>>> Let me know if anything is missing. Thanks.
>>>>>
>>>>> On Mon, Aug 16, 2021 at 7:29 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> Have you included the hadoop-aws jar?
>>>>>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws
>>>>>> -Jack
>>>>>>
>>>>>> On Mon, Aug 16, 2021 at 7:09 PM Lian Jiang <jiangok2...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Jack,
>>>>>>>
>>>>>>> You are right. S3FileIO will not work on MinIO since MinIO does not
>>>>>>> support ACLs:
>>>>>>> https://docs.min.io/docs/minio-server-limits-per-tenant.html
>>>>>>>
>>>>>>> To use Iceberg with MinIO + S3A, I used the script below to launch
>>>>>>> spark-shell:
>>>>>>>
>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>> *    --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \*
>>>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/warehouse \
>>>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *The spark code:*
>>>>>>>
>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>> val values = List(1,2,3,4,5)
>>>>>>>
>>>>>>> val spark = SparkSession.builder().master("local").getOrCreate()
>>>>>>> import spark.implicits._
>>>>>>> val df = values.toDF()
>>>>>>>
>>>>>>> val core = "mytable"
>>>>>>> val table = s"hive_test.mydb.${core}"
>>>>>>> val s3IcePath = s"s3a://east/${core}.ice"
>>>>>>>
>>>>>>> df.writeTo(table)
>>>>>>>     .tableProperty("write.format.default", "parquet")
>>>>>>>     .tableProperty("location", s3IcePath)
>>>>>>>     .createOrReplace()
>>>>>>>
>>>>>>>
>>>>>>> *Still the same error:*
>>>>>>> java.lang.ClassNotFoundException: Class
>>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>
>>>>>>>
>>>>>>> What else could be wrong? Thanks for any clue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 16, 2021 at 9:35 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sorry for the late reply, I thought I replied on Friday but the
>>>>>>>> email did not send successfully.
>>>>>>>>
>>>>>>>> As Daniel said, you don't need to set up S3A if you are using
>>>>>>>> S3FileIO.
>>>>>>>>
>>>>>>>> The S3FileIO by default uses the default credentials chain, checking
>>>>>>>> credential sources one by one:
>>>>>>>> https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
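>>>>>>>>
>>>>>>>> For example, with the AWS_* environment variables from your launch script
>>>>>>>> exported, a quick check like this (illustrative) should resolve them through
>>>>>>>> the default chain without any extra configuration:
>>>>>>>>
>>>>>>>> import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider
>>>>>>>>
>>>>>>>> // The default chain looks at env vars, system properties, profiles, etc.
>>>>>>>> val creds = DefaultCredentialsProvider.create().resolveCredentials()
>>>>>>>> println(creds.accessKeyId())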
>>>>>>>>
>>>>>>>> If you would like to use a specialized credential provider, you can
>>>>>>>> directly customize your S3 client:
>>>>>>>> https://iceberg.apache.org/aws/#aws-client-customization
>>>>>>>>
>>>>>>>> It looks like you are trying to use MinIO to mount an S3A file system?
>>>>>>>> If you have to use MinIO, then there is no way to integrate with S3FileIO
>>>>>>>> right now (maybe I am wrong on this, I don't know much about MinIO).
>>>>>>>>
>>>>>>>> To directly use S3FileIO with HiveCatalog, simply do:
>>>>>>>>
>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3://bucket
>>>>>>>>
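>>>>>>>> And since you asked for a sample using the Spark API rather than SQL, here
>>>>>>>> is an untested sketch of the same setup through the DataFrame writer (bucket
>>>>>>>> and table names are placeholders; credentials come from the default chain,
>>>>>>>> e.g. the AWS_* environment variables):
>>>>>>>>
>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>
>>>>>>>> val spark = SparkSession.builder()
>>>>>>>>   .master("local")
>>>>>>>>   .config("spark.sql.catalog.hive_test", "org.apache.iceberg.spark.SparkCatalog")
>>>>>>>>   .config("spark.sql.catalog.hive_test.type", "hive")
>>>>>>>>   .config("spark.sql.catalog.hive_test.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
>>>>>>>>   .config("spark.sql.catalog.hive_test.warehouse", "s3://bucket/warehouse")
>>>>>>>>   .getOrCreate()
>>>>>>>>
>>>>>>>> import spark.implicits._
>>>>>>>>
>>>>>>>> // Write a small DataFrame as an Iceberg table managed by the Hive catalog.
>>>>>>>> List(1, 2, 3, 4, 5).toDF()
>>>>>>>>   .writeTo("hive_test.mydb.mytable")
>>>>>>>>   .tableProperty("write.format.default", "parquet")
>>>>>>>>   .createOrReplace()
>>>>>>>>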
>>>>>>>> Best,
>>>>>>>> Jack Ye
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 15, 2021 at 2:53 PM Lian Jiang <jiangok2...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks. I prefer S3FileIO as it is recommended by Iceberg. Do you have a
>>>>>>>>> sample using the Hive catalog, S3FileIO, the Spark API (as opposed to SQL),
>>>>>>>>> and S3 access.key and secret.key? It is hard to get all the settings right
>>>>>>>>> for this combination without an example. Appreciate any help.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 13, 2021 at 6:01 PM Daniel Weeks <
>>>>>>>>> daniel.c.we...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> So, if I recall correctly, the hive server does need access to
>>>>>>>>>> check and create paths for table locations.
>>>>>>>>>>
>>>>>>>>>> There may be an option to disable this behavior, but otherwise
>>>>>>>>>> the fs implementation probably needs to be available to the hive 
>>>>>>>>>> metastore.
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 13, 2021, 4:48 PM Lian Jiang <jiangok2...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Daniel.
>>>>>>>>>>>
>>>>>>>>>>> After modifying the script to:
>>>>>>>>>>>
>>>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>>>
>>>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>>>>
>>>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION,org.apache.iceberg:iceberg-hive-runtime:$ICEBERG_VERSION,org.apache.hadoop:hadoop-aws:3.2.0"
>>>>>>>>>>>
>>>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> # add AWS dependency
>>>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>>>     "bundle"
>>>>>>>>>>>     "url-connection-client"
>>>>>>>>>>> )
>>>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>>>> done
>>>>>>>>>>>
>>>>>>>>>>> # start Spark SQL client shell
>>>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>>>
>>>>>>>>>>> I got: MetaException: java.lang.RuntimeException:
>>>>>>>>>>> java.lang.ClassNotFoundException: Class
>>>>>>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found. My Hive server is not
>>>>>>>>>>> using S3 and should not cause this error. Any idea what dependency I
>>>>>>>>>>> could be missing? Thanks.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 13, 2021 at 4:03 PM Daniel Weeks <
>>>>>>>>>>> daniel.c.we...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Lian,
>>>>>>>>>>>>
>>>>>>>>>>>> At a cursory glance, it appears that you might be mixing two
>>>>>>>>>>>> different FileIO implementations, which may be why you are not 
>>>>>>>>>>>> getting the
>>>>>>>>>>>> expected result.
>>>>>>>>>>>>
>>>>>>>>>>>> When you set --conf
>>>>>>>>>>>> spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO,
>>>>>>>>>>>> you're actually switching over to the native S3 implementation within
>>>>>>>>>>>> Iceberg (as opposed to S3AFileSystem via HadoopFileIO). However, all of
>>>>>>>>>>>> the following settings to set up access are then set for the
>>>>>>>>>>>> S3AFileSystem (which would not be used with S3FileIO).
>>>>>>>>>>>>
>>>>>>>>>>>> You might try just removing that line since it should use the
>>>>>>>>>>>> HadoopFileIO at that point and may work.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope that's helpful,
>>>>>>>>>>>> -Dan
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 13, 2021 at 3:50 PM Lian Jiang <
>>>>>>>>>>>> jiangok2...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to create an Iceberg table on MinIO S3 with Hive.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *This is how I launch spark-shell:*
>>>>>>>>>>>>>
>>>>>>>>>>>>> # add Iceberg dependency
>>>>>>>>>>>>> export AWS_REGION=us-east-1
>>>>>>>>>>>>> export AWS_ACCESS_KEY_ID=minio
>>>>>>>>>>>>> export AWS_SECRET_ACCESS_KEY=minio123
>>>>>>>>>>>>>
>>>>>>>>>>>>> ICEBERG_VERSION=0.11.1
>>>>>>>>>>>>>
>>>>>>>>>>>>> DEPENDENCIES="org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION"
>>>>>>>>>>>>>
>>>>>>>>>>>>> MINIOSERVER=192.168.160.5
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> # add AWS dependency
>>>>>>>>>>>>> AWS_SDK_VERSION=2.15.40
>>>>>>>>>>>>> AWS_MAVEN_GROUP=software.amazon.awssdk
>>>>>>>>>>>>> AWS_PACKAGES=(
>>>>>>>>>>>>>     "bundle"
>>>>>>>>>>>>>     "url-connection-client"
>>>>>>>>>>>>> )
>>>>>>>>>>>>> for pkg in "${AWS_PACKAGES[@]}"; do
>>>>>>>>>>>>>     DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
>>>>>>>>>>>>> done
>>>>>>>>>>>>>
>>>>>>>>>>>>> # start Spark SQL client shell
>>>>>>>>>>>>> /spark/bin/spark-shell --packages $DEPENDENCIES \
>>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
>>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test.warehouse=s3a://east/prefix \
>>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test.type=hive \
>>>>>>>>>>>>>     --conf spark.sql.catalog.hive_test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
>>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.endpoint=http://$MINIOSERVER:9000 \
>>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.access.key=minio \
>>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.secret.key=minio123 \
>>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.path.style.access=true \
>>>>>>>>>>>>>     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Here is the Spark code to create the Iceberg table:*
>>>>>>>>>>>>>
>>>>>>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>>>>>> val values = List(1,2,3,4,5)
>>>>>>>>>>>>>
>>>>>>>>>>>>> val spark =
>>>>>>>>>>>>> SparkSession.builder().master("local").getOrCreate()
>>>>>>>>>>>>> import spark.implicits._
>>>>>>>>>>>>> val df = values.toDF()
>>>>>>>>>>>>>
>>>>>>>>>>>>> val core = "mytable8"
>>>>>>>>>>>>> val table = s"hive_test.mydb.${core}"
>>>>>>>>>>>>> val s3IcePath = s"s3a://spark-test/${core}.ice"
>>>>>>>>>>>>>
>>>>>>>>>>>>> df.writeTo(table)
>>>>>>>>>>>>>     .tableProperty("write.format.default", "parquet")
>>>>>>>>>>>>>     .tableProperty("location", s3IcePath)
>>>>>>>>>>>>>     .createOrReplace()
>>>>>>>>>>>>>
>>>>>>>>>>>>> I got an error "The AWS Access Key Id you provided does not
>>>>>>>>>>>>> exist in our records.".
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have verified that I can log in to the MinIO UI using the same
>>>>>>>>>>>>> username and password that I passed to spark-shell via the
>>>>>>>>>>>>> AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env variables.
>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/2168 is related but does not
>>>>>>>>>>>>> help me. Not sure why the credentials do not work for Iceberg + AWS. Any
>>>>>>>>>>>>> idea, or an example of writing an Iceberg table to S3 using the Hive
>>>>>>>>>>>>> catalog, would be highly appreciated! Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>>
>
