[jira] [Commented] (SPARK-31463) Enhance JsonDataSource by replacing jackson with simdjson

Hyukjin Kwon (Jira) Thu, 23 Apr 2020 23:13:43 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-31463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091230#comment-17091230
 ]


Hyukjin Kwon commented on SPARK-31463:
--------------------------------------

A separate data source might be ideal. We can start it as a separate project and 
gradually move it into Apache Spark once it has proven useful.

> Enhance JsonDataSource by replacing jackson with simdjson
> ---------------------------------------------------------
>
>                 Key: SPARK-31463
>                 URL: https://issues.apache.org/jira/browse/SPARK-31463
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Steven Moy
>            Priority: Minor
>
> I came across this VLDB paper: [https://arxiv.org/pdf/1902.08318.pdf] on how 
> to improve JSON reading speed. We use Spark to process terabytes of JSON, so 
> we are looking for ways to improve JSON parsing speed.
>  
> [https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/]
>  
> [https://github.com/simdjson/simdjson/issues/93]
>  
> Is anyone in the open source community interested in leading the effort to 
> integrate simdjson into the Spark JSON data source API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

