[ https://issues.apache.org/jira/browse/SPARK-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496522#comment-14496522 ]
Yin Huai commented on SPARK-6217:
---------------------------------

[~cpcloud] Right now, we do not support inserting into a table created from a Python collection. So, in your code, sdf and sdf2 are read-only tables because they were created from existing Python collections. If you want to insert data into a table, you can first create a table backed by a data source that supports inserts (for example, the Parquet data source) by using {{sdf.saveAsTable("sdf", "parquet")}}. Then, you can insert data into the {{sdf}} table. For now, we should provide a better error message when a user tries to insert data into a read-only table. In the long term, I think it will be useful to support inserts into tables created from collections in a programming language.

> insertInto doesn't work in PySpark
> ----------------------------------
>
> Key: SPARK-6217
> URL: https://issues.apache.org/jira/browse/SPARK-6217
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.3.0
> Environment: Mac OS X Yosemite 10.10.2
> Python 2.7.9
> Spark 1.3.0
> Reporter: Charles Cloud
>
> The following code, running in an IPython shell, throws an error:
> {code:none}
> In [1]: from pyspark import SparkContext, HiveContext
>
> In [2]: sc = SparkContext('local[*]', 'test')
> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>
> In [3]: sql = HiveContext(sc)
>
> In [4]: import pandas as pd
>
> In [5]: df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [1, 2, 3], 'c': list('abc')})
>
> In [6]: df2 = pd.DataFrame({'a': [2.0, 3.0, 4.0], 'b': [4, 5, 6], 'c': list('def')})
>
> In [7]: sdf = sql.createDataFrame(df)
>
> In [8]: sdf2 = sql.createDataFrame(df2)
>
> In [9]: sql.registerDataFrameAsTable(sdf, 'sdf')
>
> In [10]: sql.registerDataFrameAsTable(sdf2, 'sdf2')
>
> In [11]: sql.cacheTable('sdf')
>
> In [12]: sql.cacheTable('sdf2')
>
> In [13]: sdf2.insertInto('sdf')  # throws an error
> {code}
> Here's the Java traceback:
> {code:none}
> Py4JJavaError: An error occurred while calling o270.insertInto.
> : java.lang.AssertionError: assertion failed: No plan for InsertIntoTable
>  (LogicalRDD [a#0,b#1L,c#2], MapPartitionsRDD[13] at mapPartitions at SQLContext.scala:1167), Map(), false
>  InMemoryRelation [a#6,b#7L,c#8], true, 10000, StorageLevel(true, true, false, true, 1), (PhysicalRDD [a#6,b#7L,c#8], MapPartitionsRDD[41] at mapPartitions at SQLContext.scala:1167), Some(sdf2)
> 	at scala.Predef$.assert(Predef.scala:179)
> 	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:1085)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:1083)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:1089)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:1089)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1092)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1092)
> 	at org.apache.spark.sql.DataFrame.insertInto(DataFrame.scala:1134)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:483)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> 	at py4j.Gateway.invoke(Gateway.java:259)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:207)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> I'd be ecstatic if this was my own fault, and I'm somehow using it incorrectly.
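A minimal sketch of the workaround Yin describes, assuming the Spark 1.3 PySpark API ({{DataFrame.saveAsTable(name, source)}} and {{DataFrame.insertInto(name)}}); the data mirrors the repro above, but the app name and literal rows here are illustrative, and this needs a Hive-enabled Spark build to run:

{code:python}
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext('local[*]', 'insert-workaround')  # app name is arbitrary
sql = HiveContext(sc)

# Same shape as the df/df2 pandas frames in the report, built without pandas.
sdf = sql.createDataFrame([(1.0, 1, 'a'), (2.0, 2, 'b'), (3.0, 3, 'c')],
                          ['a', 'b', 'c'])
sdf2 = sql.createDataFrame([(2.0, 4, 'd'), (3.0, 5, 'e'), (4.0, 6, 'f')],
                           ['a', 'b', 'c'])

# Instead of registerDataFrameAsTable (which yields a read-only temp table),
# persist sdf as a Parquet-backed table; the Parquet data source supports inserts.
sdf.saveAsTable('sdf', 'parquet')

# Inserting into the Parquet-backed table should now plan and execute.
sdf2.insertInto('sdf')

print(sql.table('sdf').count())
{code}

The key difference from the failing repro is only the {{saveAsTable}} step: the table's underlying relation is then an insertable data source rather than an in-memory Python-derived RDD.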
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)