[ https://issues.apache.org/jira/browse/IGNITE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718610#comment-16718610 ]
Ray commented on IGNITE-10314:
------------------------------
[~NIzhikov]
I have implemented refreshFields using the internal API, after Vladimir
confirmed it on the dev list.
However, running the tests in IgniteDataFrameSchemaSpec fails with an odd exception:
Exception in thread "main" java.lang.AssertionError: assertion failed: each serializer expression should contain at least one `BoundReference`
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:238)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$11.apply(ExpressionEncoder.scala:236)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:236)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:63)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:233)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
    at org.apache.ignite.spark.IgniteDataFrameSchemaSpec.beforeAll(IgniteDataFrameSchemaSpec.scala:122)
    at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
    at org.apache.ignite.spark.AbstractDataFrameSpec.beforeAll(AbstractDataFrameSpec.scala:39)
    at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
    at org.apache.ignite.spark.AbstractDataFrameSpec.org$scalatest$BeforeAndAfter$$super$run(AbstractDataFrameSpec.scala:39)
    at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
    at org.apache.ignite.spark.AbstractDataFrameSpec.run(AbstractDataFrameSpec.scala:39)
    at org.scalatest.junit.JUnitRunner.run(JUnitRunner.scala:99)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Could you please take a look?
I set a breakpoint in the refreshFields method, and the method itself works
fine: the latest fields are in the map.
> Spark dataframe will get wrong schema if user executes add/drop column DDL
> --------------------------------------------------------------------------
>
> Key: IGNITE-10314
> URL: https://issues.apache.org/jira/browse/IGNITE-10314
> Project: Ignite
> Issue Type: Bug
> Components: spark
> Affects Versions: 2.3, 2.4, 2.5, 2.6, 2.7
> Reporter: Ray
> Assignee: Ray
> Priority: Critical
> Fix For: 2.8
>
>
> When a user adds or drops a column via DDL, Spark still gets the old (wrong)
> schema.
>
> Analysis
> Currently the Spark data frame API relies on QueryEntity to construct the
> schema, but the QueryEntity in QuerySchema is a local copy of the original
> QueryEntity, so the original QueryEntity is not updated when a modification
> happens.
>
> Solution
> Get the latest schema using the JDBC thin driver's column metadata call, then
> update the fields in QueryEntity.
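
For illustration only, the solution quoted above (read the current columns via JDBC metadata, then overwrite the stale field map) could be sketched roughly as follows. This is not the actual patch: the class SchemaRefreshSketch and both method names are hypothetical, the field map stands in for QueryEntity's fields, and the mapping from SQL type names to the Java class names that QueryEntity expects is deliberately left out. Only the standard java.sql.DatabaseMetaData.getColumns call is used.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Ignite or of the patch. */
final class SchemaRefreshSketch {
    private SchemaRefreshSketch() {}

    /**
     * Reads the current columns of a table through standard JDBC metadata.
     * Returns column name -> SQL type name. Note that schema and table are
     * treated as LIKE-style patterns by JDBC, so they should identify one table.
     */
    static Map<String, String> readColumns(DatabaseMetaData meta, String schema, String table)
            throws SQLException {
        Map<String, String> cols = new LinkedHashMap<>();
        // "%" matches every column name; a null catalog means "do not filter by catalog".
        try (ResultSet rs = meta.getColumns(null, schema, table, "%")) {
            while (rs.next()) {
                cols.put(rs.getString("COLUMN_NAME"), rs.getString("TYPE_NAME"));
            }
        }
        return cols;
    }

    /**
     * Overwrites the stale field map in place with the latest columns, so that
     * dropped columns disappear and added columns show up.
     * Assumption: the caller owns a mutable map equivalent to the entity's fields.
     */
    static void refreshFields(Map<String, String> staleFields, Map<String, String> latest) {
        staleFields.clear();
        staleFields.putAll(latest);
    }
}
```

A caller would obtain the metadata from an open connection (connection.getMetaData()), pass the result of readColumns to refreshFields, and only then build the Spark schema from the refreshed map.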
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)