[ https://issues.apache.org/jira/browse/SPARK-31463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038294#comment-18038294 ]
Steven Moy commented on SPARK-31463:
------------------------------------
Seems like this exists now: [https://github.com/simdjson/simdjson-java],
essentially a Java port described as "A Java version of simdjson, a high-performance JSON
parser utilizing SIMD instructions".
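For context on why SIMD helps here: the paper linked in the issue (arXiv 1902.08318) splits parsing into two stages, and the first stage locates every structural character ({ } [ ] : , and quotes) using 64-bit bitmasks, one bit per input byte, so that characters inside strings can be masked out with a prefix XOR over the quote bits. The sketch below is a scalar emulation of that idea only. It assumes nothing about simdjson-java's actual API; the class and method names are hypothetical, escaped quotes are detected by counting backslash runs byte by byte, and real simdjson fills the masks with SIMD comparisons and computes the prefix XOR with a carry-less multiply.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical scalar sketch of simdjson-style "stage 1" structural indexing.
// Real simdjson fills these 64-bit masks with SIMD compares; plain loops stand in here.
public class StructuralIndexSketch {

    // Bit i of the result is the XOR of bits 0..i of x (parity of quotes seen so far).
    static long prefixXor(long x) {
        x ^= x << 1;
        x ^= x << 2;
        x ^= x << 4;
        x ^= x << 8;
        x ^= x << 16;
        x ^= x << 32;
        return x;
    }

    // Returns byte offsets of { } [ ] : , found outside strings, plus every unescaped quote.
    static List<Integer> structuralIndex(byte[] json) {
        List<Integer> positions = new ArrayList<>();
        long prevInString = 0L; // all ones if the previous 64-byte block ended inside a string
        int backslashRun = 0;   // consecutive backslashes immediately before the current byte
        for (int base = 0; base < json.length; base += 64) {
            int end = Math.min(base + 64, json.length);
            long quotes = 0L;
            long ops = 0L;
            for (int i = base; i < end; i++) {
                long bit = 1L << (i - base);
                byte c = json[i];
                // A quote preceded by an odd number of backslashes is escaped, not structural.
                if (c == '"' && (backslashRun & 1) == 0) {
                    quotes |= bit;
                }
                if (c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',') {
                    ops |= bit;
                }
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            }
            // Bits set from an opening quote up to (not including) its closing quote.
            long inString = prefixXor(quotes) ^ prevInString;
            long structurals = (ops & ~inString) | quotes;
            prevInString = inString >> 63; // arithmetic shift: 0L or -1L
            while (structurals != 0) {
                positions.add(base + Long.numberOfTrailingZeros(structurals));
                structurals &= structurals - 1; // clear the lowest set bit
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        byte[] json = "{\"a,b\":1}".getBytes(StandardCharsets.UTF_8);
        System.out.println(structuralIndex(json)); // [0, 1, 5, 6, 8]
    }
}
```

Running `main` prints the offsets of the structural characters in `{"a,b":1}`; the comma inside the key is skipped because it falls under the in-string mask. In the paper's design, the second stage then walks only these offsets instead of every input byte.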
> Enhance JsonDataSource by replacing jackson with simdjson
> ---------------------------------------------------------
>
> Key: SPARK-31463
> URL: https://issues.apache.org/jira/browse/SPARK-31463
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Steven Moy
> Priority: Minor
>
> I came across this VLDB paper: [https://arxiv.org/pdf/1902.08318.pdf] on how
> to improve JSON reading speed. We use Spark to process terabytes of JSON, so
> we are looking for ways to improve JSON parsing speed.
>
> [https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/]
>
> [https://github.com/simdjson/simdjson/issues/93]
>
> Is anyone in the open-source community interested in leading this effort to
> integrate simdjson into the Spark JSON data source API?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)