I think you need to first call setConf and then initialize, mimicking the logic in https://github.com/apache/iceberg/blob/6bcca16c48cd92dc98640130a28f73431e99e336/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L189-L191, which is what all engines use to initialize catalogs. You might be able to directly leverage CatalogUtil.buildIcebergCatalog instead of writing your own custom logic.
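For reference, a minimal sketch of that route (untested; it assumes the buildIcebergCatalog(String, Map<String, String>, Configuration) signature from the linked revision, a SparkSession named `spark`, and a placeholder warehouse location). Since initialize() builds the catalog's FileIO from whatever Hadoop conf is set at that moment, supplying the conf up front avoids the null-Configuration failure reported below:

import java.util.Map;
import com.google.common.collect.ImmutableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.catalog.Catalog;

// Hadoop configuration from the running Spark context, as elsewhere in this thread.
Configuration conf = spark.sparkContext().hadoopConfiguration();

Map<String, String> props = ImmutableMap.of(
    // "catalog-impl" tells buildIcebergCatalog which catalog class to load;
    // Glue has no shorthand "type" value, so the full class name is used here.
    "catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog",
    "warehouse", "s3://my-bucket/warehouse",  // placeholder location
    "io-impl", "org.apache.iceberg.hadoop.HadoopFileIO");

// buildIcebergCatalog calls setConf on Configurable catalogs before
// initialize, which is exactly the ordering described above.
Catalog catalog = CatalogUtil.buildIcebergCatalog("iceberg", props, conf);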
With that being said, I remember we had this conversation in another thread and did not continue with it. EMRFS consistent view is now unnecessary, as S3 is now strongly consistent. I am not sure there is any additional benefit you would gain by continuing to use EMRFS.

-Jack Ye

On Thu, Jul 8, 2021 at 8:11 AM Greg Hill <gnh...@paypal.com.invalid> wrote:

> Thanks! Seems I wasn't too far off then. It's my understanding that
> because we're using EMRFS consistent view, we should not use S3FileIO or
> the EMRFS metadata will get out of sync, but this catalog doesn't seem to
> work with HadoopFileIO so far in my basic testing. I get a
> NullPointerException because the Hadoop configuration isn't passed along
> at some point.
>
> I noticed that I needed to call `setConf()` to get the Hadoop configs
> into the catalog object:
>
> Map<String, String> props = ImmutableMap.of(
>     "type", "iceberg",
>     "warehouse", config.getOutputDir(),
>     "lock-impl", "org.apache.iceberg.aws.glue.DynamoLockManager",
>     "lock.table", config.getDynamoIcebergLocksTable(),
>     "io-impl", "org.apache.iceberg.hadoop.HadoopFileIO");
> this.icebergCatalog.initialize("iceberg", props);
> this.icebergCatalog.setConf(spark.sparkContext().hadoopConfiguration());
>
> Then when I call createTable later:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:481)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
>     at org.apache.iceberg.hadoop.Util.getFs(Util.java:48)
>     at org.apache.iceberg.hadoop.HadoopOutputFile.fromPath(HadoopOutputFile.java:53)
>     at org.apache.iceberg.hadoop.HadoopFileIO.newOutputFile(HadoopFileIO.java:64)
>     at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:137)
>     at org.apache.iceberg.aws.glue.GlueTableOperations.doCommit(GlueTableOperations.java:105)
>     at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:118)
>     at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:215)
>     at org.apache.iceberg.BaseMetastoreCatalog.createTable(BaseMetastoreCatalog.java:48)
>     at org.apache.iceberg.catalog.Catalog.createTable(Catalog.java:105)
>
> The NPE is because `conf` is null in that method, but I verified that
> icebergCatalog.hadoopConf is the expected object.
>
> Should it be expected that the GlueCatalog can be used with HadoopFileIO,
> or is it only compatible with S3FileIO?
>
> Greg
>
> From: Jack Ye <yezhao...@gmail.com>
> Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>
> Date: Wednesday, July 7, 2021 at 4:16 PM
> To: Iceberg Dev List <dev@iceberg.apache.org>
> Subject: Re: GlueCatalog example?
>
> Yeah, this is actually a good point: the documentation is mostly around
> loading the catalog into different SQL engines and lacks Java API examples.
> The integration tests are good places to see Java examples:
> https://github.com/apache/iceberg/blob/master/aws/src/integration/java/org/apache/iceberg/aws/glue/GlueTestBase.java
>
> -Jack Ye
>
> On Wed, Jul 7, 2021 at 1:27 PM Greg Hill <gnh...@paypal.com.invalid> wrote:
>
> Is there a Java example for the proper way to get the GlueCatalog object?
> We are trying to convert from HadoopTables and need access to the
> lower-level APIs to create and update tables with partitions.
>
> I'm looking for something similar to these examples for HadoopTables and
> HiveCatalog: https://iceberg.apache.org/java-api-quickstart/
>
> From what I can gather looking at the code, this is what I came up with
> (our catalog name is `iceberg`), but it feels like there's probably a
> better way that I'm not seeing:
>
> this.icebergCatalog = new GlueCatalog();
> Configuration conf = spark.sparkContext().hadoopConfiguration();
> Map<String, String> props = ImmutableMap.of(
>     "type", conf.get("spark.sql.catalog.iceberg.type"),
>     "warehouse", conf.get("spark.sql.catalog.iceberg.warehouse"),
>     "lock-impl", conf.get("spark.sql.catalog.iceberg.lock-impl"),
>     "lock.table", conf.get("spark.sql.catalog.iceberg.lock.table"),
>     "io-impl", conf.get("spark.sql.catalog.iceberg.io-impl"));
> this.icebergCatalog.initialize("iceberg", props);
>
> Sorry for the potentially n00b question, but I'm a n00b 😃
>
> Greg
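Applying the ordering fix from the top of this thread to the snippet above, the only change needed is to call setConf() before initialize(). A sketch (untested; the property lookups are kept exactly as in the original, and `spark` is again assumed to be a SparkSession):

import java.util.Map;
import com.google.common.collect.ImmutableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.aws.glue.GlueCatalog;

GlueCatalog icebergCatalog = new GlueCatalog();
Configuration conf = spark.sparkContext().hadoopConfiguration();

// Set the Hadoop conf *before* initialize(), so the FileIO created during
// initialization picks up a non-null Configuration.
icebergCatalog.setConf(conf);

Map<String, String> props = ImmutableMap.of(
    "type", conf.get("spark.sql.catalog.iceberg.type"),
    "warehouse", conf.get("spark.sql.catalog.iceberg.warehouse"),
    "lock-impl", conf.get("spark.sql.catalog.iceberg.lock-impl"),
    "lock.table", conf.get("spark.sql.catalog.iceberg.lock.table"),
    "io-impl", conf.get("spark.sql.catalog.iceberg.io-impl"));

icebergCatalog.initialize("iceberg", props);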