Yousof Hosny created SPARK-47892:
------------------------------------
Summary: XML: Stop ignoring CDATA within rows.
Key: SPARK-47892
URL: https://issues.apache.org/jira/browse/SPARK-47892
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny
Fix For: 4.0.0
This change ignores CDATA within row tags as well as outside of them. We should
only ignore CDATA found outside of row tags, because CDATA within a row tag is
considered data within the row.
[https://github.com/apache/spark/pull/45487]
NOTE: With the current parser implementation, once CDATA within row tags is no
longer ignored, an edge case remains: a matching closing row tag inside a CDATA
section will be parsed as a valid end tag.
Example:
{code:xml}
<row> <![CDATA[ </row> ]]>
{code}
Once CDATA within rows is no longer ignored, the parser will match the closing
tag in the example above, which is incorrect.
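To make the edge case concrete, here is a minimal sketch of a scan that skips
CDATA sections while looking for the closing row tag. It is not the Spark
parser; the function name find_row_end and its parameters are hypothetical, and
it deliberately ignores XML comments, processing instructions, and closing tags
written with extra whitespace.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def find_row_end(text: str, row_tag: str = "row", start: int = 0) -> int:
    """Return the index just past the first closing row tag outside CDATA.

    Hypothetical helper for illustration only. Returns -1 when no closing
    tag exists outside of a CDATA section.
    """
    end_tag = f"</{row_tag}>"
    pos = start
    while True:
        tag_at = text.find(end_tag, pos)
        if tag_at == -1:
            return -1
        cdata_at = text.find(CDATA_START, pos)
        if cdata_at == -1 or cdata_at > tag_at:
            # No CDATA section opens before this closing tag, so it is real.
            return tag_at + len(end_tag)
        # A CDATA section opens first: jump past its terminator and rescan.
        cdata_close = text.find(CDATA_END, cdata_at + len(CDATA_START))
        if cdata_close == -1:
            return -1  # Unterminated CDATA: no valid end tag can follow.
        pos = cdata_close + len(CDATA_END)
```

With the input <row> <![CDATA[ </row> ]]> </row>, a plain substring search for
</row> stops at the tag inside the CDATA section, while the sketch above only
accepts the final one. With the example from this ticket, which has no closing
tag outside CDATA, it reports that no end tag was found.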