[
https://issues.apache.org/jira/browse/SPARK-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-9278.
---------------------------------
Resolution: Not A Problem
I tried to reproduce the code above:
{code}
import pandas
pdf = pandas.DataFrame({'pk': ['a']*5 + ['b']*5 + ['c']*5,
                        'k': ['a', 'e', 'i', 'o', 'u']*3,
                        'v': range(15)})
sdf = spark.createDataFrame(pdf)
sdf.filter('FALSE').write.partitionBy('pk').saveAsTable('foo', format='parquet', path='/tmp/tmptable')
sdf.filter(sdf.pk == 'a').write.partitionBy('pk').insertInto('foo')
foo = spark.table('foo')
foo.show()
{code}
It seems this now produces an exception, as below:
{code}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../spark/python/pyspark/sql/readwriter.py", line 606, in insertInto
self._jwrite.mode("overwrite" if overwrite else
"append").insertInto(tableName)
File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line
1133, in __call__
File ".../spark/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"insertInto() can't be used together with
partitionBy(). Partition columns have already be defined for the table. It is
not necessary to use partitionBy().;"
{code}
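For reference, the exception suggests that current Spark rejects combining {{partitionBy()}} with {{insertInto()}} because the table's partition columns are already fixed by its definition. A minimal sketch of the adjusted write path, assuming an active {{spark}} session and the same toy table as above (not taken from the original report), would be:

{code}
import pandas

# Same toy data as in the reproduction above.
pdf = pandas.DataFrame({'pk': ['a']*5 + ['b']*5 + ['c']*5,
                        'k': ['a', 'e', 'i', 'o', 'u']*3,
                        'v': range(15)})
sdf = spark.createDataFrame(pdf)

# The table definition itself carries the partitioning ...
sdf.filter('FALSE').write.partitionBy('pk').saveAsTable('foo', format='parquet', path='/tmp/tmptable')

# ... so insertInto() is called without partitionBy().
# Note: insertInto() matches columns by position, not by name, so the
# DataFrame's column order must line up with the table schema.
sdf.filter(sdf.pk == 'a').write.insertInto('foo')
spark.table('foo').show()
{code}

The positional matching is worth keeping in mind here, since the original report described wrong rows being inserted until the columns were reordered.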
I am resolving this per ...
{quote}
If the issue seems clearly obsolete and applies to issues or components that
have changed radically since it was opened, resolve as Not a Problem
{quote}
Please reopen this if I am mistaken.
> DataFrameWriter.insertInto inserts incorrect data
> -------------------------------------------------
>
> Key: SPARK-9278
> URL: https://issues.apache.org/jira/browse/SPARK-9278
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Environment: Linux, S3, Hive Metastore
> Reporter: Steve Lindemann
> Assignee: Cheng Lian
> Priority: Critical
>
> After creating a partitioned Hive table (stored as Parquet) via the
> DataFrameWriter.createTable command, subsequent attempts to insert additional
> data into new partitions of this table result in inserting incorrect data
> rows. Reordering the columns in the data to be written seems to avoid this
> issue.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)