mark pettovello created IGNITE-8165:
---------------------------------------
Summary: Spark Dataset Write intermittent "Failed to map key to node" error
Key: IGNITE-8165
URL: https://issues.apache.org/jira/browse/IGNITE-8165
Project: Ignite
Issue Type: Bug
Components: jdbc, spark
Affects Versions: 2.4
Environment: Spark 2.1.0
Java 1.8.0_152
ignite-core-2.4.0.jar
ignite-spark_2.10-2.4.0.jar
Scala 2.11.8
Reporter: mark pettovello
Inserts partially fail when issuing a Dataset<Row> write() operation.
Rerunning the write causes a different set of rows to fail to insert: not
all of the rows displayed by dsCity.show() make it into Ignite, and each
randomly missing row hit a "Failed to map key to node" exception.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
    .builder()
    .appName("IgniteSQLDataSource example")
    .master("local[4]")                  // run on a local PC using Winutils
    .config("spark.local.dir", "/tmp")
    .getOrCreate();
... create about 10 {(int) ID, (string) NAME} tuples and add them to the
dsCity dataset (one possible construction is sketched just below) ...

Dataset<Row> dsCity = spark.createDataset(...).toDF("ID", "NAME");
dsCity.show(1000);
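The tuple-creation code is elided above, so here is a minimal hedged sketch
of one way to build an equivalent dsCity for anyone trying to reproduce. The
"City-" + i names are hypothetical stand-ins, and plain Rows via
createDataFrame() substitute for the elided createDataset(...) call:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical data; the reporter's real tuples are not in the report.
List<Row> rows = new ArrayList<>();
for (int i = 1; i <= 10; i++)
    rows.add(RowFactory.create(i, "City-" + i));

StructType schema = new StructType()
    .add("ID", DataTypes.IntegerType)
    .add("NAME", DataTypes.StringType);

// Equivalent to the createDataset(...).toDF("ID", "NAME") line above.
Dataset<Row> dsCity = spark.createDataFrame(rows, schema);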
String tblName = "CITY";
String jdbcURL = "jdbc:ignite:thin://127.0.0.1/";

dsCity.write()
    .format("jdbc")
    .option("primary_key_fields", "ID")
    .option("url", jdbcURL)
    .option("driver", "org.apache.ignite.IgniteJdbcThinDriver")
    .option("batchsize", 1000)
    .option("dbtable", tblName)
    .mode(SaveMode.Append)
    .save();
18/04/06 09:33:23 ERROR Executor: Exception in task 3.0 in stage 2.0 (TID 5)
java.sql.BatchUpdateException: Failed to map key to node.
    at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.executeBatch(JdbcThinStatement.java:435)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:597)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
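Reading the table back through the same thin driver shows which rows actually
landed. A minimal hedged check, reusing the jdbcURL and tblName values from
the repro above (the count and the set of missing rows vary from run to run):

Dataset<Row> readBack = spark.read()
    .format("jdbc")
    .option("url", jdbcURL)
    .option("driver", "org.apache.ignite.IgniteJdbcThinDriver")
    .option("dbtable", tblName)
    .load();

readBack.show(1000);              // randomly different rows are absent per run
long inserted = readBack.count(); // intermittently less than the 10 written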