+1 I think that porting the package 'as is' into Spark is probably
worthwhile.
That should be relatively easy: the code is already fairly battle-tested, not
that big, and originally came from Spark code, so it is more or less
similar already.

One thing the package never got was DataSource V2 (DSv2) support, which means
XML reading would still be somewhat behind other built-in formats. (I was not
able to implement it.) This isn't a necessary goal right now, but it could be
part of the rationale for moving the code into the Spark code base.
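
For reference, a minimal sketch of how the external package is used today;
the "xml" short name and the rowTag/rootTag options come from the spark-xml
README, and the file paths are just placeholders:

    import org.apache.spark.sql.SparkSession

    // Assumes the spark-xml package is on the classpath
    // (e.g. --packages com.databricks:spark-xml_2.12:<version>).
    val spark = SparkSession.builder().appName("xml-example").getOrCreate()

    // Read: each <book> element becomes one row of the DataFrame.
    val df = spark.read
      .format("xml")                  // short name registered by spark-xml
      .option("rowTag", "book")
      .load("books.xml")

    // Write: wrap the rows in a <books> root element.
    df.write
      .format("xml")
      .option("rootTag", "books")
      .option("rowTag", "book")
      .save("books-out")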

On Fri, Jul 28, 2023 at 5:38 PM Sandip Agarwala
<sandip.agarw...@databricks.com.invalid> wrote:

> Dear Spark community,
>
> I would like to start the vote for "SPIP: XML data source support".
>
> XML is a widely used data format. An external spark-xml package (
> https://github.com/databricks/spark-xml) is available to read and write
> XML data in Spark. Making spark-xml built-in will provide a better user
> experience for Spark SQL and Structured Streaming. The proposal is to
> inline the code from the spark-xml package.
>
> SPIP link:
>
> https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing
>
> JIRA:
> https://issues.apache.org/jira/browse/SPARK-44265
>
> Discussion Thread:
> https://lists.apache.org/thread/q32hxgsp738wom03mgpg9ykj9nr2n1fh
>
> Please vote on the SPIP for the next 72 hours:
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because __.
>
> Thanks, Sandip
>
