[
https://issues.apache.org/jira/browse/SPARK-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin updated SPARK-11745:
--------------------------------
Labels: releasenotes (was: )
> Enable more JSON parsing options
> --------------------------------
>
> Key: SPARK-11745
> URL: https://issues.apache.org/jira/browse/SPARK-11745
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Reynold Xin
> Labels: releasenotes
>
> As a user, I want to be able to read non-standard JSON files. Jackson itself
> includes a few options that we should allow users to specify:
> - ALLOW_COMMENTS
> - ALLOW_UNQUOTED_FIELD_NAMES
> - ALLOW_SINGLE_QUOTES
> - ALLOW_NUMERIC_LEADING_ZEROS
> - ALLOW_NON_NUMERIC_NUMBERS
> After this change, the following options are still unsupported:
> - ALLOW_YAML_COMMENTS
> - ALLOW_UNQUOTED_CONTROL_CHARS
> - ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER
> See the Jackson source code pasted below for the definition of these config
> options:
> {code}
> /**
> * Feature that determines whether parser will allow use
> * of Java/C++ style comments (both '/'+'*' and
> * '//' varieties) within parsed content or not.
> *<p>
> * Since JSON specification does not mention comments as legal
> * construct,
> * this is a non-standard feature; however, in the wild
> * this is extensively used. As such, feature is
> * <b>disabled by default</b> for parsers and must be
> * explicitly enabled.
> */
> ALLOW_COMMENTS(false),
> /**
> * Feature that determines whether parser will allow use
> * of YAML comments, ones starting with '#' and continuing
> * until the end of the line. This commenting style is common
> * with scripting languages as well.
> *<p>
> * Since JSON specification does not mention comments as legal
> * construct,
> * this is a non-standard feature. As such, feature is
> * <b>disabled by default</b> for parsers and must be
> * explicitly enabled.
> */
> ALLOW_YAML_COMMENTS(false),
>
> /**
> * Feature that determines whether parser will allow use
> * of unquoted field names (which is allowed by Javascript,
> * but not by JSON specification).
> *<p>
> * Since JSON specification requires use of double quotes for
> * field names,
> * this is a non-standard feature, and as such disabled by default.
> */
> ALLOW_UNQUOTED_FIELD_NAMES(false),
> /**
> * Feature that determines whether parser will allow use
> * of single quotes (apostrophe, character '\'') for
> * quoting Strings (names and String values). If so,
> * this is in addition to other acceptable markers.
> *<p>
> * Since JSON specification requires use of double quotes for
> * field names,
> * this is a non-standard feature, and as such disabled by default.
> */
> ALLOW_SINGLE_QUOTES(false),
> /**
> * Feature that determines whether parser will allow
> * JSON Strings to contain unquoted control characters
> * (ASCII characters with value less than 32, including
> * tab and line feed characters) or not.
> * If feature is set false, an exception is thrown if such a
> * character is encountered.
> *<p>
> * Since JSON specification requires quoting for all control
> * characters,
> * this is a non-standard feature, and as such disabled by default.
> */
> ALLOW_UNQUOTED_CONTROL_CHARS(false),
> /**
> * Feature that can be enabled to accept quoting of all characters
> * using backslash quoting mechanism: if not enabled, only characters
> * that are explicitly listed by JSON specification can be thus
> * escaped (see JSON spec for small list of these characters)
> *<p>
> * Since JSON specification requires quoting for all control
> * characters,
> * this is a non-standard feature, and as such disabled by default.
> */
> ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER(false),
> /**
> * Feature that determines whether parser will allow
> * JSON integral numbers to start with additional (ignorable)
> * zeroes (like: 000001). If enabled, no exception is thrown, and extra
> * zeroes are silently ignored (and not included in textual representation
> * exposed via {@link JsonParser#getText}).
> *<p>
> * Since JSON specification does not allow leading zeroes,
> * this is a non-standard feature, and as such disabled by default.
> */
> ALLOW_NUMERIC_LEADING_ZEROS(false),
>
> /**
> * Feature that allows parser to recognize set of
> * "Not-a-Number" (NaN) tokens as legal floating number
> * values (similar to how many other data formats and
> * programming language source code allows it).
> * Specific subset contains values that
> * <a href="http://www.w3.org/TR/xmlschema-2/">XML Schema</a>
> * (see section 3.2.4.1, Lexical Representation)
> * allows (tokens are quoted contents, not including quotes):
> *<ul>
> * <li>"INF" (for positive infinity), as well as alias of "Infinity"
> * <li>"-INF" (for negative infinity), alias "-Infinity"
> * <li>"NaN" (for other not-a-numbers, like result of division by
> * zero)
> *</ul>
> *<p>
> * Since JSON specification does not allow use of such values,
> * this is a non-standard feature, and as such disabled by default.
> */
> ALLOW_NON_NUMERIC_NUMBERS(false),
> {code}
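
For reference, a minimal sketch of how a caller might switch on the five features listed above when reading JSON through Spark. The option keys used here (`allowComments` and so on) are assumed to be the camel-case names that Spark's JSON data source exposes for the corresponding Jackson features; the ticket does not spell them out, so check them against the documentation for your Spark version. The class and method names are hypothetical, and the sample inputs in the comments only illustrate the kind of text each feature is meant to accept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spark or Jackson.
final class LenientJsonOptions {

    // Builds reader options that enable the five features this ticket covers.
    // The keys are assumed option names (see the note above). Values are
    // strings because data source options are passed as string pairs.
    static Map<String, String> build() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("allowComments", "true");            // ALLOW_COMMENTS:              {/* c */ "a": 1}
        opts.put("allowUnquotedFieldNames", "true");  // ALLOW_UNQUOTED_FIELD_NAMES:  {a: 1}
        opts.put("allowSingleQuotes", "true");        // ALLOW_SINGLE_QUOTES:         {'a': 'x'}
        opts.put("allowNumericLeadingZeros", "true"); // ALLOW_NUMERIC_LEADING_ZEROS: {"a": 0012}
        opts.put("allowNonNumericNumbers", "true");   // ALLOW_NON_NUMERIC_NUMBERS:   {"a": NaN}
        // ALLOW_YAML_COMMENTS, ALLOW_UNQUOTED_CONTROL_CHARS and
        // ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER are deliberately absent:
        // the ticket lists them as still unsupported after this change.
        return opts;
    }

    public static void main(String[] args) {
        // Print the options in insertion order.
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Under those assumptions, the map could be handed to the reader with something like `sqlContext.read().options(LenientJsonOptions.build()).json(path)`, where `path` is a placeholder for the input location. All five features default to off in Jackson, so a caller would normally enable only the ones the input actually needs.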