Spark2 create hive external table

2017-08-29 Thread antoniosi
Hi,

I am trying to connect to the Spark Thrift Server to create an external table.
In my table DDL, I have a table property 'spark.sql.sources.provider' =
'parquet', but I am getting the error "Cannot persistent ... into hive metastore
as table property keys may not start with 'spark.sql.':
[spark.sql.sources.provider]".

However, when I create the external table in spark-shell using the
spark.catalog.createExternalTable() API and then look at the table definition
via beeline with "show create table", I see these tblproperties:

TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false',
  'numFiles'='0',
  'numRows'='-1',
  'rawDataSize'='-1',
  'spark.sql.sources.provider'='parquet',
  'spark.sql.sources.schema.numParts'='1',
  ...

Can someone explain why creating the external table via JDBC to the Spark
Thrift Server complains about the spark.sql table properties?
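
For reference, in spark-shell the table was created with something along these
lines (a sketch; the table name and path are placeholders, not the actual ones):

// Sketch only: hypothetical table name and parquet location.
spark.catalog.createExternalTable(
  "my_external_table",          // placeholder table name
  "hdfs:///path/to/parquet",    // placeholder location of the parquet data
  "parquet")                    // data source provider

which is presumably where the spark.sql.sources.* properties shown above come from.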

Thanks.

Antonio.







Writing empty Dataframes doesn't save any _metadata files in Spark 1.5.1 and 1.6

2016-06-14 Thread antoniosi
I tried the following code in both Spark 1.5.1 and Spark 1.6.0:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
import org.apache.spark.sql.Row

val schema = StructType(
  StructField("k", StringType, true) ::
  StructField("v", IntegerType, false) :: Nil)

val df = sqlContext.createDataFrame(sc.emptyRDD[Row], schema)
df.write.save("hdfs://xxx")

Both 1.5.1 and 1.6.0 only save a _SUCCESS file; neither saves any _metadata
files. In 1.6.0 I also get the following error:

16/06/14 16:29:27 WARN ParquetOutputCommitter: could not write summary file
for hdfs://xxx
java.lang.NullPointerException
at
org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
at
org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
at
org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
at
org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
at
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)

I do not get this exception in 1.5.1, though.

I see the bug https://issues.apache.org/jira/browse/SPARK-15393, but that is
for Spark 2.0. Does the same bug exist in Spark 1.5.1 and 1.6?

Is there a way we could save an empty dataframe properly?
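
For what it is worth, a minimal sketch of a workaround for the exception (not
for the missing _metadata files), assuming the Parquet summary files can be
skipped, would be to disable summary-file generation before the write:

// Sketch only: turn off Parquet summary (_metadata/_common_metadata) files so
// that ParquetOutputCommitter never tries to merge footers for an empty write.
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
df.write.save("hdfs://xxx")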

Thanks.

Antonio.






Is Hive CREATE DATABASE IF NOT EXISTS atomic

2016-04-07 Thread antoniosi
Hi,

I am using hiveContext.sql("create database if not exists <db_name>") to
create a Hive database. Is this statement atomic?

Thanks.

Antonio.






create hive context in spark application

2016-03-15 Thread antoniosi
Hi,

I am trying to connect to a Hive metastore deployed in an Oracle DB. I have
the Hive configuration specified in hive-site.xml, which I put under
$SPARK_HOME/conf. If I run spark-shell, everything works fine: I can create
Hive databases and tables and query them.

However, when I try to do the same in a Spark application running in local
mode, i.e. with sparkConf.setMaster("local[*]").setSparkHome(), it does not
pick up the hive-site.xml. It still uses the local Derby Hive metastore
instead of the Oracle metastore defined in hive-site.xml. If I add the
hive-site.xml explicitly to the classpath, I get the following error:

Caused by: org.datanucleus.api.jdo.exceptions.TransactionNotActiveException:
Transaction is not active. You either need to define a transaction around
this, or run your PersistenceManagerFactory with 'NontransactionalRead' and
'NontransactionalWrite' set to 'true'
FailedObject:org.datanucleus.exceptions.TransactionNotActiveException:
Transaction is not active. You either need to define a transaction around
this, or run your PersistenceManagerFactory with 'NontransactionalRead' and
'NontransactionalWrite' set to 'true'
at
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:396)
at
org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:186)
at
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.runTestQuery(MetaStoreDirectSql.java:204)
at
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:137)
at
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:295)
at
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at
org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
at
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at
org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
 

This happens when I try to create a new HiveContext in my code.
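
For reference, the relevant part of the application looks roughly like this (a
sketch; the app name and Spark home path are placeholders, not the actual values):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch only: local-mode SparkContext plus HiveContext, as described above.
val sparkConf = new SparkConf()
  .setAppName("MyApp")                  // placeholder app name
  .setMaster("local[*]")
  .setSparkHome("/path/to/spark")       // placeholder Spark home

val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)   // this is where the exception is thrown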

How do I ask Spark to look at the hive-site.xml in the $SPARK_HOME/conf
directory in my spark application?

Thanks very much. Any pointers would be much appreciated.

Regards,

Antonio.






TTL for saveAsObjectFile()

2015-10-13 Thread antoniosi
Hi,

I am using RDD.saveAsObjectFile() to save an RDD dataset to Tachyon. Tachyon
0.8 will support a TTL for saved files. Is that supported from Spark as well?
Is there a way I could specify a TTL for a saved object file?
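
For context, the write currently looks roughly like this (a sketch; the RDD
contents and the Tachyon URI are placeholders):

// Sketch only: saving an RDD as an object file to a Tachyon path.
val rdd = sc.parallelize(1 to 100)
rdd.saveAsObjectFile("tachyon://host:19998/path/to/output")   // placeholder URI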

Thanks.

Antonio.


