Yun Park created SPARK-9714:
-------------------------------
Summary: Cannot insert into a table using pySpark
Key: SPARK-9714
URL: https://issues.apache.org/jira/browse/SPARK-9714
Project: Spark
Issue Type: Bug
Reporter: Yun Park
Priority: Critical
This is a bug on the master branch. After creating the table ("yun" is the
table name) with the corresponding fields (a hypothetical DDL is sketched
below), I ran the command shown after it.
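The original CREATE TABLE statement was not posted; a hypothetical schema
consistent with the casts in the physical plan further down (description
cast to float, id cast to string) would be something like:

# Hypothetical DDL, inferred from the CASTs in the plan below;
# the actual CREATE TABLE statement was not included in the report.
sqlContext.sql("CREATE TABLE yun (description FLOAT, id STRING, name STRING)")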
from pyspark.sql import *
sc.parallelize([Row(id=1, name="test", description="")]).toDF().write.mode("append").saveAsTable("yun")
I get the following error:
Py4JJavaError: An error occurred while calling o100.saveAsTable.
: org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.hadoop.fs.Path
Serialization stack:
    - object not serializable (class: org.apache.hadoop.fs.Path, value: dbfs:/user/hive/warehouse/yun)
    - field (class: org.apache.hadoop.hive.ql.metadata.Table, name: path, type: class org.apache.hadoop.fs.Path)
    - object (class org.apache.hadoop.hive.ql.metadata.Table, yun)
    - field (class: org.apache.hadoop.hive.ql.metadata.Partition, name: table, type: class org.apache.hadoop.hive.ql.metadata.Table)
    - object (class org.apache.hadoop.hive.ql.metadata.Partition, yun())
    - field (class: scala.collection.immutable.Stream$Cons, name: hd, type: class java.lang.Object)
    - object (class scala.collection.immutable.Stream$Cons, Stream(yun()))
    - field (class: scala.collection.immutable.Stream$$anonfun$map$1, name: $outer, type: class scala.collection.immutable.Stream)
    - object (class scala.collection.immutable.Stream$$anonfun$map$1, <function0>)
    - field (class: scala.collection.immutable.Stream$Cons, name: tl, type: interface scala.Function0)
    - object (class scala.collection.immutable.Stream$Cons, Stream(HivePartition(List(),HiveStorageDescriptor(dbfs:/user/hive/warehouse/yun,org.apache.hadoop.mapred.TextInputFormat,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,Map(serialization.format -> 1)))))
    - field (class: scala.collection.immutable.Stream$$anonfun$map$1, name: $outer, type: class scala.collection.immutable.Stream)
    - object (class scala.collection.immutable.Stream$$anonfun$map$1, <function0>)
    - field (class: scala.collection.immutable.Stream$Cons, name: tl, type: interface scala.Function0)
    - object (class scala.collection.immutable.Stream$Cons, Stream(dbfs:/user/hive/warehouse/yun))
    - field (class: org.apache.spark.sql.hive.MetastoreRelation, name: paths, type: interface scala.collection.Seq)
    - object (class org.apache.spark.sql.hive.MetastoreRelation, MetastoreRelation default, yun, None)
    - field (class: org.apache.spark.sql.hive.execution.InsertIntoHiveTable, name: table, type: class org.apache.spark.sql.hive.MetastoreRelation)
    - object (class org.apache.spark.sql.hive.execution.InsertIntoHiveTable, InsertIntoHiveTable (MetastoreRelation default, yun, None), Map(), false, false
       ConvertToSafe
        TungstenProject [CAST(description#10, FloatType) AS description#16,CAST(id#11L, StringType) AS id#17,name#12]
         PhysicalRDD [description#10,id#11L,name#12], MapPartitionsRDD[17] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
    )
    - field (class: org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3, name: $outer, type: class org.apache.spark.sql.hive.execution.InsertIntoHiveTable)
    - object (class org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3, <function2>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 30 more
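The serialization stack shows the closure in InsertIntoHiveTable.saveAsHiveFile
dragging the MetastoreRelation, and through it a non-serializable
org.apache.hadoop.fs.Path, into the task closure. As a rough Python-side
analogue of this failure mode (not the actual JVM code path), a task closure
that captures a non-picklable object fails the same way when Spark serializes
it for shipment to executors:

import threading
from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # hypothetical local setup, for illustration only

lock = threading.Lock()  # not picklable; stands in for org.apache.hadoop.fs.Path

def tag(x):
    # Referencing `lock` here pulls it into the task closure. PySpark
    # pickles the closure before shipping it to executors, so the job
    # fails at submission, analogous to "Task not serializable" above.
    with lock:
        return x

sc.parallelize([1, 2, 3]).map(tag).collect()  # raises a pickling TypeError

Judging from the stack, the fix presumably belongs on the JVM side (avoiding
capturing the Hive Table's Path in the saveAsHiveFile closure), but that is an
inference from the serialization stack, not a confirmed patch.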