[
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305703#comment-16305703
]
Paul Rogers commented on DRILL-6035:
------------------------------------
h4. Conclusion
The net-net conclusion from all of the above is:
* A large amount of work would be needed to provide solid JSON support in Drill.
* That work may not be justified given that Parquet resolves the ambiguities
and provides better performance.
The take-away for Drill users is simple:
* If JSON is used with Drill, it must be very simple and follow Drill's JSON
format rules as explained above.
* Use a purpose-built ETL tool to convert JSON to Parquet and point Drill at
the Parquet file instead of JSON.
From a work perspective, it may be far faster, cheaper and more effective to
back off Drill's lavish claims for JSON than to do the work needed to achieve
those promises.
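To make the "very simple JSON" advice concrete, here is a minimal sketch of a pre-flight check a user could run before pointing Drill (or an ETL tool) at a file. It flags fields whose JSON type varies from record to record or that are absent from some records, which are two of the ambiguities this ticket lists. It is not Drill code and does not reproduce Drill's actual rules: the function names (`json_type`, `profile_fields`, `ambiguous_fields`) and the sample data are hypothetical, and the input is assumed to be one JSON object per line.

```python
import json
from collections import defaultdict


def json_type(value):
    """Map a decoded Python value back to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def profile_fields(lines):
    """Return {field: set of type names seen across all records}.

    Assumes each non-blank line holds one top-level JSON object.
    A field absent from at least one record also gets the marker "missing".
    """
    records = [json.loads(line) for line in lines if line.strip()]
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(json_type(value))
    for key in seen:
        if any(key not in record for record in records):
            seen[key].add("missing")
    return dict(seen)


def ambiguous_fields(lines):
    """Keep only fields that showed more than one type (or went missing)."""
    return {
        key: sorted(types)
        for key, types in profile_fields(lines).items()
        if len(types) > 1
    }


# Hypothetical sample: "price" changes type, "tag" appears in one record only.
SAMPLE = [
    '{"id": 1, "price": 10}',
    '{"id": 2, "price": 10.5, "tag": "x"}',
    '{"id": 3, "price": null}',
]

if __name__ == "__main__":
    for field, types in sorted(ambiguous_fields(SAMPLE).items()):
        print(field, types)
```

An empty result only means the file is uniform at the top level; nested objects and arrays would need the same treatment, and whether a given mix of types is acceptable to Drill is a question for the specification this ticket requests.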
> Specify Drill's JSON behavior
> -----------------------------
>
> Key: DRILL-6035
> URL: https://issues.apache.org/jira/browse/DRILL-6035
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Pritesh Maker
>
> Drill supports JSON as its native data format. However, experience suggests
> that Drill may have limitations in the JSON that Drill supports. This ticket
> asks to clarify Drill's expected behavior on various kinds of JSON.
> Topics to be addressed:
> * Relational vs. non-relational structures
> * JSON structures used in practice and how they map to Drill
> * Support for varying data types
> * Support for missing values, especially across files
> These topics are complex, hence the request to provide a detailed
> specification that clarifies what Drill does and does not support (or what
> it should and should not support).
> As noted below, the "net-net" conclusion for users is to use an ETL tool to
> convert JSON to Parquet, then allow Drill to query the Parquet.