Spark2 create hive external table
Hi,

I am trying to connect to the Spark Thrift Server to create an external table. In my table DDL I have the table property 'spark.sql.sources.provider' = 'parquet', but I am getting this error:

    Cannot persist into Hive metastore as table property keys may not start with 'spark.sql.': [spark.sql.sources.provider]

However, when I create an external table in spark-shell using the spark.catalog.createExternalTable() API and then look at the table definition via beeline using "show create table", I see these tblproperties:

    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='false',
      'numFiles'='0',
      'numRows'='-1',
      'rawDataSize'='-1',
      'spark.sql.sources.provider'='parquet',
      'spark.sql.sources.schema.numParts'='1',

Can someone explain why creating the external table via JDBC to the Spark Thrift Server complains about the spark.sql table properties?

Thanks.
Antonio.
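P.S. For reference, this is roughly the spark-shell call that works for me (a sketch only; the table name and HDFS path below are placeholders, not my real ones):

    // Spark 2.x spark-shell: register an existing Parquet directory as an
    // external table through the catalog API rather than a raw DDL statement.
    spark.catalog.createExternalTable(
      "my_ext_table",                  // placeholder table name
      "hdfs:///data/my_ext_table",     // placeholder path to existing Parquet files
      "parquet")                       // source format; Spark records the
                                       // spark.sql.sources.* properties itself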
Writing empty Dataframes doesn't save any _metadata files in Spark 1.5.1 and 1.6
I tried the following code in both Spark 1.5.1 and Spark 1.6.0:

    import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
    import org.apache.spark.sql.Row

    val schema = StructType(
      StructField("k", StringType, true) ::
      StructField("v", IntegerType, false) :: Nil)
    val df = sqlContext.createDataFrame(sc.emptyRDD[Row], schema)
    df.write.save("hdfs://xxx")

Both 1.5.1 and 1.6.0 only save a _SUCCESS file; no _metadata files are saved. In addition, 1.6.0 gives the following error:

    16/06/14 16:29:27 WARN ParquetOutputCommitter: could not write summary file for hdfs://xxx
    java.lang.NullPointerException
        at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
        at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)

I do not get this exception in 1.5.1, though. I see the bug https://issues.apache.org/jira/browse/SPARK-15393, but that is for Spark 2.0. Does the same bug exist in Spark 1.5.1 and 1.6? Is there a way we could save an empty DataFrame properly?

Thanks.
Antonio.
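P.S. One workaround I am considering, just a sketch that I have not verified on 1.5.1/1.6.0: disable the Parquet summary files so the committer never tries to merge footers for an output with no data files, and supply the schema explicitly when reading the empty directory back.

    // Sketch: turn off the _metadata/_common_metadata summary files so
    // ParquetOutputCommitter.commitJob does not attempt to merge footers
    // when the output contains no data files.
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
    df.write.save("hdfs://xxx")

    // On the read side, pass the schema explicitly instead of relying on
    // the summary files for schema discovery.
    val readBack = sqlContext.read.schema(schema).parquet("hdfs://xxx")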
Is Hive CREATE DATABASE IF NOT EXISTS atomic
Hi,

I am using hiveContext.sql("create database if not exists <db_name>") to create a Hive database. Is this statement atomic?

Thanks.
Antonio.
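P.S. For context, my concern is two jobs issuing the statement at the same time and racing between the existence check and the creation. A minimal sketch of the call (the database name is just a placeholder):

    // Each job runs this against the same metastore; the question is whether
    // concurrent callers can both pass the "if not exists" check.
    hiveContext.sql("CREATE DATABASE IF NOT EXISTS my_db")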
create hive context in spark application
Hi,

I am trying to connect to a Hive metastore backed by an Oracle DB. I have the Hive configuration specified in hive-site.xml, which I put under $SPARK_HOME/conf.

If I run spark-shell, everything works fine: I can create Hive databases and tables and query the tables. However, when I try to do the same in a Spark application running in local mode (i.e., I have sparkConf.setMaster("local[*]").setSparkHome()), it does not seem to pick up the hive-site.xml. It still uses the local Derby Hive metastore instead of the Oracle metastore that I defined in hive-site.xml.

If I add the hive-site.xml explicitly on the classpath, I get the following error:

    Caused by: org.datanucleus.api.jdo.exceptions.TransactionNotActiveException: Transaction is not active. You either need to define a transaction around this, or run your PersistenceManagerFactory with 'NontransactionalRead' and 'NontransactionalWrite' set to 'true'
    FailedObject:org.datanucleus.exceptions.TransactionNotActiveException: Transaction is not active. You either need to define a transaction around this, or run your PersistenceManagerFactory with 'NontransactionalRead' and 'NontransactionalWrite' set to 'true'
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:396)
        at org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:186)
        at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.runTestQuery(MetaStoreDirectSql.java:204)
        at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:137)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:295)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)

This happens when I try to instantiate a HiveContext in my code. How do I ask Spark to look at the hive-site.xml in the $SPARK_HOME/conf directory in my Spark application?

Thanks very much. Any pointer will be much appreciated.

Regards,
Antonio.
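P.S. A minimal sketch of how I construct the contexts in the application (the app name and Spark home path are placeholders), in case the problem is in this setup rather than in hive-site.xml itself:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Local-mode Spark 1.x application. The expectation is that the HiveContext
    // picks up $SPARK_HOME/conf/hive-site.xml, but it falls back to the embedded
    // Derby metastore instead.
    val conf = new SparkConf()
      .setAppName("hive-metastore-test")   // placeholder app name
      .setMaster("local[*]")
      .setSparkHome("/path/to/spark")      // placeholder Spark home
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW DATABASES").show()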
TTL for saveAsObjectFile()
Hi,

I am using RDD.saveAsObjectFile() to save an RDD dataset to Tachyon. In version 0.8, Tachyon will support a TTL for saved files. Is that supported from Spark as well? Is there a way I could specify a TTL for a saved object file?

Thanks.
Antonio.
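P.S. For reference, this is roughly what the save looks like today (a sketch; the RDD, Tachyon master address, and path are placeholders):

    val rdd = sc.parallelize(1 to 100)   // stand-in for the real dataset
    // Persist the RDD as a Hadoop SequenceFile of serialized objects on Tachyon.
    // There is no obvious place in this call to attach a TTL, which is what I am asking about.
    rdd.saveAsObjectFile("tachyon://tachyon-master:19998/data/my-rdd")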