[ https://issues.apache.org/jira/browse/SPARK-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074445#comment-15074445 ]
Cazen Lee commented on SPARK-11745:
-----------------------------------

Good day [~rxin],

This is Cazen. Sorry to bother you with a question, but could you let me know why the ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER option was left unsupported? I recently created SPARK-12537 to add support for it, and I wonder whether there is a reason the three options you mentioned were left disabled.

Thank you in advance!

> Enable more JSON parsing options for parsing non-standard JSON files
> --------------------------------------------------------------------
>
>                 Key: SPARK-11745
>                 URL: https://issues.apache.org/jira/browse/SPARK-11745
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>              Labels: releasenotes
>             Fix For: 1.6.0
>
>
> As a user, I want to be able to read non-standard JSON files. Jackson itself
> includes a few options that we should allow users to specify:
> - ALLOW_COMMENTS
> - ALLOW_UNQUOTED_FIELD_NAMES
> - ALLOW_SINGLE_QUOTES
> - ALLOW_NUMERIC_LEADING_ZEROS
> - ALLOW_NON_NUMERIC_NUMBERS
> After this change, the following options are still unsupported:
> - ALLOW_YAML_COMMENTS
> - ALLOW_UNQUOTED_CONTROL_CHARS
> - ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER
> See the Jackson source code pasted below for the definition of these config
> options:
> {code}
>   /**
>    * Feature that determines whether parser will allow use
>    * of Java/C++ style comments (both '/'+'*' and
>    * '//' varieties) within parsed content or not.
>    *<p>
>    * Since JSON specification does not mention comments as legal
>    * construct,
>    * this is a non-standard feature; however, in the wild
>    * this is extensively used. As such, feature is
>    * <b>disabled by default</b> for parsers and must be
>    * explicitly enabled.
>    */
>   ALLOW_COMMENTS(false),
>   /**
>    * Feature that determines whether parser will allow use
>    * of YAML comments, ones starting with '#' and continuing
>    * until the end of the line. This commenting style is common
>    * with scripting languages as well.
>    *<p>
>    * Since JSON specification does not mention comments as legal
>    * construct,
>    * this is a non-standard feature. As such, feature is
>    * <b>disabled by default</b> for parsers and must be
>    * explicitly enabled.
>    */
>   ALLOW_YAML_COMMENTS(false),
>
>   /**
>    * Feature that determines whether parser will allow use
>    * of unquoted field names (which is allowed by Javascript,
>    * but not by JSON specification).
>    *<p>
>    * Since JSON specification requires use of double quotes for
>    * field names,
>    * this is a non-standard feature, and as such disabled by default.
>    */
>   ALLOW_UNQUOTED_FIELD_NAMES(false),
>   /**
>    * Feature that determines whether parser will allow use
>    * of single quotes (apostrophe, character '\'') for
>    * quoting Strings (names and String values). If so,
>    * this is in addition to other acceptable markers.
>    * but not by JSON specification).
>    *<p>
>    * Since JSON specification requires use of double quotes for
>    * field names,
>    * this is a non-standard feature, and as such disabled by default.
>    */
>   ALLOW_SINGLE_QUOTES(false),
>   /**
>    * Feature that determines whether parser will allow
>    * JSON Strings to contain unquoted control characters
>    * (ASCII characters with value less than 32, including
>    * tab and line feed characters) or not.
>    * If feature is set false, an exception is thrown if such a
>    * character is encountered.
>    *<p>
>    * Since JSON specification requires quoting for all control characters,
>    * this is a non-standard feature, and as such disabled by default.
>    */
>   ALLOW_UNQUOTED_CONTROL_CHARS(false),
>   /**
>    * Feature that can be enabled to accept quoting of all character
>    * using backslash quoting mechanism: if not enabled, only characters
>    * that are explicitly listed by JSON specification can be thus
>    * escaped (see JSON spec for small list of these characters)
>    *<p>
>    * Since JSON specification requires quoting for all control characters,
>    * this is a non-standard feature, and as such disabled by default.
>    */
>   ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER(false),
>   /**
>    * Feature that determines whether parser will allow
>    * JSON integral numbers to start with additional (ignorable)
>    * zeroes (like: 000001). If enabled, no exception is thrown, and extra
>    * nulls are silently ignored (and not included in textual representation
>    * exposed via {@link JsonParser#getText}).
>    *<p>
>    * Since JSON specification does not allow leading zeroes,
>    * this is a non-standard feature, and as such disabled by default.
>    */
>   ALLOW_NUMERIC_LEADING_ZEROS(false),
>
>   /**
>    * Feature that allows parser to recognize set of
>    * "Not-a-Number" (NaN) tokens as legal floating number
>    * values (similar to how many other data formats and
>    * programming language source code allows it).
>    * Specific subset contains values that
>    * <a href="http://www.w3.org/TR/xmlschema-2/">XML Schema</a>
>    * (see section 3.2.4.1, Lexical Representation)
>    * allows (tokens are quoted contents, not including quotes):
>    *<ul>
>    *  <li>"INF" (for positive infinity), as well as alias of "Infinity"
>    *  <li>"-INF" (for negative infinity), alias "-Infinity"
>    *  <li>"NaN" (for other not-a-numbers, like result of division by zero)
>    *</ul>
>    *<p>
>    * Since JSON specification does not allow use of such values,
>    * this is a non-standard feature, and as such disabled by default.
>    */
>   ALLOW_NON_NUMERIC_NUMBERS(false),
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
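
For readers unsure what the ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER option asked about above would change, here is a rough sketch of the behavior its Javadoc describes. It is not Jackson's code: the class `EscapeSketch`, the method `unescape`, and the `allowAnyEscape` flag are hypothetical names made up for illustration, and the lenient branch assumes an unrecognized escape simply decodes to the escaped character itself.

```java
// Hypothetical sketch for illustration; this is not Jackson's implementation.
final class EscapeSketch {
    // Characters the JSON specification allows directly after a backslash,
    // paired by index with the character each one decodes to.
    private static final String STANDARD = "\"\\/bfnrt";
    private static final String DECODED = "\"\\/\b\f\n\r\t";

    // Decodes the body of a JSON string (the text between the quotes).
    // allowAnyEscape mirrors the intent of ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER.
    static String unescape(String body, boolean allowAnyEscape) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(i);
            int idx = STANDARD.indexOf(e);
            if (idx >= 0) {
                // Standard escape: look up the decoded character.
                out.append(DECODED.charAt(idx));
            } else if (e == 'u') {
                // Unicode escape: exactly four hex digits must follow.
                if (i + 4 >= body.length()) {
                    throw new IllegalArgumentException("truncated unicode escape");
                }
                int code = 0;
                for (int k = 1; k <= 4; k++) {
                    int d = Character.digit(body.charAt(i + k), 16);
                    if (d < 0) {
                        throw new IllegalArgumentException("bad hex digit in unicode escape");
                    }
                    code = code * 16 + d;
                }
                out.append((char) code);
                i += 4;
            } else if (allowAnyEscape) {
                // Lenient mode (assumed behavior): keep the escaped character as-is.
                out.append(e);
            } else {
                // Strict mode: anything outside the JSON list is an error.
                throw new IllegalArgumentException("unrecognized escape: \\" + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "a\\qb"; // four characters: a, backslash, q, b
        System.out.println(unescape(body, true));
        try {
            unescape(body, false);
        } catch (IllegalArgumentException ex) {
            System.out.println("strict mode rejected it: " + ex.getMessage());
        }
    }
}
```

With the flag off, input containing a backslash followed by a character outside the JSON list is rejected outright. With it on, the same input is accepted. That difference is why files produced by tools that escape characters too eagerly cannot be read without the option.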