I agree with the high-level idea, and thus with SPARK-15691 <https://issues.apache.org/jira/browse/SPARK-15691>.
In reality, it's a huge amount of work to create and maintain a custom catalog. It might actually make sense to do, but it seems like a lot of work to take on right now, and it would take a toll on interoperability. If you don't need a persistent catalog, you can just run Spark without Hive support, can't you? (A rough sketch of what that looks like is below the quoted message.)

On Mon, Nov 14, 2016 at 11:23 PM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:

> Hi,
>
> Today, we basically force people to use Hive if they want to get the full
> use of Spark SQL.
>
> With the default installation, this means that a derby.log file and a
> metastore_db directory are created in whatever directory we run from.
>
> The problem with this is that if we run multiple scripts from the same
> working directory, they conflict with each other.
>
> The solution we employ locally is to always run from a different
> directory, since we ignore Hive in practice (this of course means we lose
> the ability to use some of the catalog options in SparkSession).
>
> The only other solution is to create a full-blown Hive installation with
> proper configuration (probably backed by a JDBC metastore).
>
> I would propose that in most cases there shouldn't be any Hive use at all.
> Even for catalog features such as saving a permanent table, we should be
> able to configure a target directory and simply write to it (doing
> everything file based to avoid the need for locking). Hive should be
> reserved for those who actually use it (probably for backward
> compatibility).
>
> Am I missing something here?
>
> Assaf.
>
> ------------------------------
> View this message in context: separate spark and hive
> <http://apache-spark-developers-list.1001551.n3.nabble.com/separate-spark-and-hive-tp19879.html>
> Sent from the Apache Spark Developers List mailing list archive
> <http://apache-spark-developers-list.1001551.n3.nabble.com/> at Nabble.com.
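For reference, a minimal sketch of what "running without Hive support" looks like (assuming Spark 2.x; the app name and warehouse path are just placeholders, not recommendations). Without .enableHiveSupport() the session uses the in-memory catalog, so no derby.log or metastore_db is created in the working directory:

import org.apache.spark.sql.SparkSession

// Build a SparkSession without calling .enableHiveSupport(), so the
// in-memory catalog is used and no Hive metastore files appear in the
// working directory.
val spark = SparkSession.builder()
  .appName("no-hive-example")   // placeholder name
  .master("local[*]")
  // Defaults to "in-memory" unless enableHiveSupport() is called;
  // set explicitly here only to make the intent obvious.
  .config("spark.sql.catalogImplementation", "in-memory")
  // Directory for managed tables; /tmp/spark-warehouse is only illustrative.
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .getOrCreate()

// Catalog operations that don't need Hive still work.
spark.range(10).createOrReplaceTempView("numbers")
spark.sql("SELECT count(*) FROM numbers").show()

spark.stop()

The trade-off, as noted above, is that the in-memory catalog does not persist across sessions, so permanent table metadata is lost when the session ends.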