[ https://issues.apache.org/jira/browse/SPARK-37176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-37176:
------------------------------------

    Assignee: Apache Spark

> JsonSource's infer should have the same exception handle logic as JacksonParser's parse logic
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37176
>                 URL: https://issues.apache.org/jira/browse/SPARK-37176
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.3, 3.1.2, 3.2.0
>            Reporter: Xianjin YE
>            Assignee: Apache Spark
>            Priority: Minor
>
> JacksonParser's exception handling logic differs from that of
> org.apache.spark.sql.catalyst.json.JsonInferSchema#infer; the difference can be seen below:
> {code:java}
> // JacksonParser's parse
> try {
>   Utils.tryWithResource(createParser(factory, record)) { parser =>
>     // a null first token is equivalent to testing for input.trim.isEmpty
>     // but it works on any token stream and not just strings
>     parser.nextToken() match {
>       case null => None
>       case _ => rootConverter.apply(parser) match {
>         case null => throw QueryExecutionErrors.rootConverterReturnNullError()
>         case rows => rows.toSeq
>       }
>     }
>   }
> } catch {
>   case e: SparkUpgradeException => throw e
>   case e @ (_: RuntimeException | _: JsonProcessingException | _: MalformedInputException) =>
>     // JSON parser currently doesn't support partial results for corrupted records.
>     // For such records, all fields other than the field configured by
>     // `columnNameOfCorruptRecord` are set to `null`.
>     throw BadRecordException(() => recordLiteral(record), () => None, e)
>   case e: CharConversionException if options.encoding.isEmpty =>
>     val msg =
>       """JSON parser cannot handle a character in its input.
>         |Specifying encoding as an input option explicitly might help to resolve the issue.
>         |""".stripMargin + e.getMessage
>     val wrappedCharException = new CharConversionException(msg)
>     wrappedCharException.initCause(e)
>     throw BadRecordException(() => recordLiteral(record), () => None, wrappedCharException)
>   case PartialResultException(row, cause) =>
>     throw BadRecordException(
>       record = () => recordLiteral(record),
>       partialResult = () => Some(row),
>       cause)
> }
> {code}
> v.s.
> {code:java}
> // JsonInferSchema's infer logic
> val mergedTypesFromPartitions = json.mapPartitions { iter =>
>   val factory = options.buildJsonFactory()
>   iter.flatMap { row =>
>     try {
>       Utils.tryWithResource(createParser(factory, row)) { parser =>
>         parser.nextToken()
>         Some(inferField(parser))
>       }
>     } catch {
>       case e @ (_: RuntimeException | _: JsonProcessingException) =>
>         parseMode match {
>           case PermissiveMode =>
>             Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType))))
>           case DropMalformedMode =>
>             None
>           case FailFastMode =>
>             throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e)
>         }
>     }
>   }.reduceOption(typeMerger).toIterator
> }
> {code}
> The two paths should share the same exception handling logic; otherwise the
> inconsistency may confuse users.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
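One way to keep the two paths aligned is to centralize the exception classification so both parse and infer consult the same rule. The sketch below is illustrative only (`RecordFailure`, `classify`, and the object name are hypothetical, not Spark's actual API), and it uses only JDK exception types; in the real code Jackson's `JsonProcessingException` would be classified alongside `RuntimeException`:

```scala
import java.io.CharConversionException
import java.nio.charset.MalformedInputException

// Hypothetical sketch: the kinds of failure the caller must react to.
sealed trait RecordFailure
case object BadRecord extends RecordFailure      // record is corrupt; handle per parse mode
case object EncodingIssue extends RecordFailure  // suggest setting the `encoding` option

object JsonFailureHandling {
  // Single classifier that both parse and infer could consult, so the set of
  // exceptions treated as a corrupt record lives in exactly one place.
  // `encodingKnown` mirrors `options.encoding.isDefined` in the real code.
  def classify(e: Throwable, encodingKnown: Boolean): Option[RecordFailure] = e match {
    case _: MalformedInputException                   => Some(BadRecord)
    case _: RuntimeException                          => Some(BadRecord)
    case _: CharConversionException if !encodingKnown => Some(EncodingIssue)
    case _                                            => None // unrecognized: rethrow upstream
  }
}
```

With such a helper, infer's catch block would match exactly the throwables that parse wraps in `BadRecordException`, and any future addition (e.g. `MalformedInputException`, which infer currently misses) happens in one place.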