Shujing Yang created SPARK-46382:
------------------------------------
Summary: XML: Capture values interspersed between elements
Key: SPARK-46382
URL: https://issues.apache.org/jira/browse/SPARK-46382
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Shujing Yang
In XML, elements typically consist of a name and a value, with the value
enclosed between the opening and closing tags. However, XML also allows
arbitrary values to be interspersed between these elements. To capture these
values, we provide an option named `valueTags`, which is enabled by default.
Consider the following example:
```xml
<ROW>
  <a>1</a>
  value1
  <b>
    value2
    <c>2</c>
    value3
  </b>
</ROW>
```
In this example, `<a>`, `<b>`, and `<c>` are named elements whose values are
enclosed within tags. The arbitrary values `value1`, `value2`, and `value3` are
interspersed between the elements. Note that a single element can contain
multiple such values (e.g. element `<b>` contains both `value2` and `value3`).
The values between tags should be parsed into the value tag field. If an
element contains multiple interspersed values, the value tag field will be
converted to an array type.
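The intended mapping can be illustrated with a minimal Python sketch that uses the standard library's `xml.etree.ElementTree` rather than Spark. It is not Spark's implementation: the function `parse_element` and the field name `_VALUE` are hypothetical, leaf values are kept as strings (no type inference), and attributes are ignored. Interspersed text is read from an element's `text` (the text before its first child) and from each child's `tail` (the text after that child).

```python
import xml.etree.ElementTree as ET
from typing import Any

# Hypothetical name for the field that holds interspersed values.
VALUE_TAG = "_VALUE"


def parse_element(elem: ET.Element) -> Any:
    """Convert an element into a nested dict, capturing interspersed values."""
    children = list(elem)
    if not children:
        # Leaf element such as <a>1</a>: its value is the enclosed text.
        return (elem.text or "").strip()

    record: dict = {}
    values = []
    # Text that appears before the first child element.
    if elem.text and elem.text.strip():
        values.append(elem.text.strip())
    for child in children:
        parsed = parse_element(child)
        if child.tag in record:
            # Repeated child names are collected into a list.
            existing = record[child.tag]
            if not isinstance(existing, list):
                record[child.tag] = [existing]
            record[child.tag].append(parsed)
        else:
            record[child.tag] = parsed
        # Text that follows a child (its "tail") is an interspersed value.
        if child.tail and child.tail.strip():
            values.append(child.tail.strip())
    # One interspersed value stays scalar; several become a list (array).
    if len(values) == 1:
        record[VALUE_TAG] = values[0]
    elif len(values) > 1:
        record[VALUE_TAG] = values
    return record


doc = """<ROW>
<a>1</a>
value1
<b>
value2
<c>2</c>
value3
</b>
</ROW>"""

print(parse_element(ET.fromstring(doc)))
# {'a': '1', 'b': {'c': '2', '_VALUE': ['value2', 'value3']}, '_VALUE': 'value1'}
```

Because `<b>` holds two interspersed values, its value tag field becomes a list, while `<ROW>` holds only `value1` and keeps a single string.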
--
This message was sent by Atlassian Jira
(v8.20.10#820010)