I think you need to first call setConf and then initialize, mimicking the
logic in
https://github.com/apache/iceberg/blob/6bcca16c48cd92dc98640130a28f73431e99e336/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L189-L191
which is used by all engines to initialize catalogs. You might be able to
leverage CatalogUtil.buildIcebergCatalog directly instead of writing your
own custom logic.
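
Something like this should work, mimicking that ordering (an untested
sketch, reusing the props map from your example below):

    GlueCatalog catalog = new GlueCatalog();
    // setConf must run before initialize, so the HadoopFileIO created
    // during initialize picks up the Hadoop configuration
    catalog.setConf(spark.sparkContext().hadoopConfiguration());
    catalog.initialize("iceberg", props);

If you go through CatalogUtil.buildIcebergCatalog instead, it applies the
same setConf-then-initialize ordering for you; note that I believe you
would need to set "catalog-impl" to org.apache.iceberg.aws.glue.GlueCatalog
in the props rather than relying on "type":

    Catalog catalog = CatalogUtil.buildIcebergCatalog(
        "iceberg", props, spark.sparkContext().hadoopConfiguration());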

With that being said, I remember we had this conversation in another thread
and did not continue with it: EMRFS consistent view is now unnecessary,
since S3 is strongly consistent. I am not sure there is any additional
benefit you would gain by continuing to use EMRFS.

-Jack Ye

On Thu, Jul 8, 2021 at 8:11 AM Greg Hill <gnh...@paypal.com.invalid> wrote:

> Thanks! Seems I wasn’t too far off then. It’s my understanding that
> because we’re using EMRFS consistent view, we should not use S3FileIO or
> the EMRFS metadata will get out of sync, but this catalog doesn’t seem to
> work with HadoopFileIO in my basic testing so far. I get a
> NullPointerException because the Hadoop configuration isn’t passed along at
> some point.
>
>
>
> I noticed that I needed to call `setConf()` to get the Hadoop configs into
> the catalog object.
>
>
>
>       Map<String, String> props = ImmutableMap.of(
>         "type", "iceberg",
>         "warehouse", config.getOutputDir(),
>         "lock-impl", "org.apache.iceberg.aws.glue.DynamoLockManager",
>         "lock.table", config.getDynamoIcebergLocksTable(),
>         "io-impl", "org.apache.iceberg.hadoop.HadoopFileIO"
>       );
>       this.icebergCatalog.initialize("iceberg", props);
>       this.icebergCatalog.setConf(spark.sparkContext().hadoopConfiguration());
>
>
>
> Then when I call createTable later:
>
>
>
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:481)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
>     at org.apache.iceberg.hadoop.Util.getFs(Util.java:48)
>     at org.apache.iceberg.hadoop.HadoopOutputFile.fromPath(HadoopOutputFile.java:53)
>     at org.apache.iceberg.hadoop.HadoopFileIO.newOutputFile(HadoopFileIO.java:64)
>     at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:137)
>     at org.apache.iceberg.aws.glue.GlueTableOperations.doCommit(GlueTableOperations.java:105)
>     at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:118)
>     at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:215)
>     at org.apache.iceberg.BaseMetastoreCatalog.createTable(BaseMetastoreCatalog.java:48)
>     at org.apache.iceberg.catalog.Catalog.createTable(Catalog.java:105)
>
>
>
> The NPE occurs because `conf` is null in that method, but I verified that
> icebergCatalog.hadoopConf is the expected object.
>
>
>
> Is the GlueCatalog expected to work with HadoopFileIO, or is it only
> compatible with S3FileIO?
>
>
>
> Greg
>
>
>
>
>
> *From: *Jack Ye <yezhao...@gmail.com>
> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
> *Date: *Wednesday, July 7, 2021 at 4:16 PM
> *To: *Iceberg Dev List <dev@iceberg.apache.org>
> *Subject: *Re: GlueCatalog example?
>
>
>
> Yeah, this is actually a good point; the documentation is mostly about
> loading the catalog into different SQL engines and lacks Java API examples.
> The integration tests are a good place to see Java examples:
> https://github.com/apache/iceberg/blob/master/aws/src/integration/java/org/apache/iceberg/aws/glue/GlueTestBase.java
>
>
>
> -Jack Ye
>
>
>
> On Wed, Jul 7, 2021 at 1:27 PM Greg Hill <gnh...@paypal.com.invalid>
> wrote:
>
> Is there a Java example of the proper way to get a GlueCatalog object?
> We are trying to convert from HadoopTables and need access to the
> lower-level APIs to create and update tables with partitions.
>
>
>
> I’m looking for something similar to these examples for HadoopTables and
> HiveCatalog: https://iceberg.apache.org/java-api-quickstart/
>
>
>
> From what I can gather looking at the code, this is what I came up with
> (our catalog name is `iceberg`), but it feels like there’s probably a
> better way that I’m not seeing:
>
>
>
>       this.icebergCatalog = new GlueCatalog();
>       Configuration conf = spark.sparkContext().hadoopConfiguration();
>       Map<String, String> props = ImmutableMap.of(
>         "type", conf.get("spark.sql.catalog.iceberg.type"),
>         "warehouse", conf.get("spark.sql.catalog.iceberg.warehouse"),
>         "lock-impl", conf.get("spark.sql.catalog.iceberg.lock-impl"),
>         "lock.table", conf.get("spark.sql.catalog.iceberg.lock.table"),
>         "io-impl", conf.get("spark.sql.catalog.iceberg.io-impl")
>       );
>       this.icebergCatalog.initialize("iceberg", props);
>
>
>
> Sorry for the potentially n00b question, but I’m a n00b 😃
>
>
>
> Greg
>
>
