zhouliutao created SPARK-34219:
----------------------------------
Summary: yarn-cluster: java.io.UTFDataFormatException when writing a DataFrame to MySQL
Key: SPARK-34219
URL: https://issues.apache.org/jira/browse/SPARK-34219
Project: Spark
Issue Type: Bug
Components: Spark Core, YARN
Affects Versions: 2.4.0
Reporter: zhouliutao
The Spark application fails with an error when writing a DataFrame to a MySQL
table; the error details are shown in the log below.
------------------------------------------------------------------------------------------------------------------
Caused by: java.io.UTFDataFormatException: encoded string too long: 87824 bytes
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:314)
at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:388)
at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:328)
at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:388)
at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:328)
at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:388)
at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:328)
at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:388)
at com.typesafe.config.impl.SerializedConfigValue.writeExternal(SerializedConfigValue.java:454)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
--------------------------------------------------------------------------------------------------------------------
In testing, I found that if "field A" is removed from the DataFrame, it can be
written to MySQL normally; if "field A" is included, the same error is
reported. The value of field A is roughly 1000 bytes long.
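For context, java.io.DataOutputStream.writeUTF (the top frame in the log) stores the string length in a 2-byte prefix, so any string whose modified-UTF-8 encoding exceeds 65535 bytes throws exactly this exception. The "87824 bytes" in the log suggests the value being serialized is larger than field A alone, likely a whole serialized Config subtree. A minimal JDK-only demonstration of the limit:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

public class WriteUtfLimit {
    public static void main(String[] args) throws IOException {
        // 87824 ASCII chars -> 87824 bytes in modified UTF-8,
        // which is over writeUTF's 65535-byte cap (requires Java 11+ for String.repeat)
        String big = "x".repeat(87824);
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(big);
            System.out.println("ok");
        } catch (UTFDataFormatException e) {
            // Same message as in the attached log
            System.out.println(e.getMessage());
        }
    }
}
```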
If the run mode in app-env.sh is changed to local or client mode, the write to
MySQL succeeds; in yarn-cluster mode it fails with the error in the attached
log.
Is this error related to the cluster version or configuration?
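Not a fix on the Spark side, but given the com.typesafe.config SerializedConfigValue frames in the trace, a common workaround is to keep the Config object out of the serialized task closure entirely: mark the field transient and reload it lazily on the deserializing side (the executor). A sketch of the pattern in plain Java, with the config value stood in by a large string (all names hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LazyConfigHolder implements Serializable {
    // transient: excluded from Java serialization, so the oversized value
    // never reaches ObjectOutputStream (or writeUTF inside a writeExternal)
    private transient String config;

    String config() {
        if (config == null) {
            config = loadConfig(); // re-run on the deserializing side, e.g. an executor
        }
        return config;
    }

    private String loadConfig() {
        // stand-in for something like ConfigFactory.load(); hypothetical
        return "x".repeat(87824);
    }

    public static void main(String[] args) throws Exception {
        LazyConfigHolder holder = new LazyConfigHolder();
        holder.config(); // populated on the "driver" side

        // Serialize: succeeds because the transient field is skipped
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        oos.writeObject(holder);
        oos.close();

        // Deserialize: config is null, then lazily reloaded on first access
        ObjectInputStream ois =
            new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        LazyConfigHolder copy = (LazyConfigHolder) ois.readObject();
        System.out.println(copy.config().length());
    }
}
```

In a Spark job the same idea means referencing the config through such a holder (or a broadcast variable) inside map/foreach functions, rather than capturing the Config instance directly in the closure.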
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]