Hi there,

I am a potentially new contributor, so don't spend too much time on me.
However, I would like to give this a try, because a connection between
Glue and Spark would be nice to have at my work. We run our own Spark
clusters and don't use EMR, so right now our Spark jobs can't benefit from
the Glue metastore. This is not a huge problem, because we keep strict
naming conventions and use ORC, but it would still be nice for our user
base.

As you can guess, our cluster runs on AWS. I have a good amount of
experience with the AWS SDKs and a reasonable amount with Scala. I am,
however, a beginner with Spark and have never contributed before.

As far as I can see, I need to implement ExternalCatalog for Glue, and Glue
seems to support all operations specified in the trait. That includes
user-defined functions, which surprised me, because Athena doesn't support
them.

I can see some obstacles, e.g. how to deal with permissions. Therefore I
will study the Hive ExternalCatalog. Can I take that as a guiding example?

I also saw there was prior work mentioned on the mailing list (
http://apache-spark-developers-list.1001551.n3.nabble.com/A-new-external-catalog-td23394.html),
but unfortunately there is no code.

Would this be a suitable project to pick up? I thought it might be, because
it is kind of on the edge of Spark.

Thanks in advance for your time!

Regards,

Edgar Klerks
