Ruslan Dautkhanov created SPARK-23554: -----------------------------------------
Summary: Hive's textinputformat.record.delimiter equivalent in Spark
Key: SPARK-23554
URL: https://issues.apache.org/jira/browse/SPARK-23554
Project: Spark
Issue Type: New Feature
Components: Spark Core
Affects Versions: 2.3.0, 2.2.1
Reporter: Ruslan Dautkhanov

It would be great if Spark supported an option similar to Hive's {{textinputformat.record.delimiter}} in the spark-csv reader. We currently have to create Hive tables to work around this functionality missing from Spark itself.

{{textinputformat.record.delimiter}} was introduced back in 2011, in the MapReduce era; see MAPREDUCE-2254.

As an example, one of our most common use cases for {{textinputformat.record.delimiter}} is reading several lines of text that together make up one "record". The number of lines per record varies, so {{textinputformat.record.delimiter}} is a great way for us to process these files natively in Hadoop/Spark: a custom .map() function then does the actual processing of each record, and we convert the result to a DataFrame.
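For context, here is a minimal sketch of the RDD-level workaround described above. It assumes PySpark, where `SparkContext.newAPIHadoopFile` accepts a Hadoop `conf` dict, so the same {{textinputformat.record.delimiter}} key can be handed to Hadoop's new-API `TextInputFormat`. The blank-line delimiter, the record layout (first line is an id, remaining lines are free text), and the helper names `parse_record` and `read_records` are hypothetical illustrations, not part of Spark.

```python
# Sketch of the RDD-based workaround: split the input on a custom record
# delimiter through Hadoop's TextInputFormat, then parse each multi-line record.
# Hypothetical assumptions: records are separated by one empty line, the first
# line of a record is its id, and the remaining lines are free text.

RECORD_DELIMITER = "\n\n"  # hypothetical: blank-line-separated records


def parse_record(text):
    """Turn one multi-line record into an (id, body) tuple, or None if empty."""
    # Strip stray newlines left over when more than one blank line separates records.
    lines = [line for line in text.strip("\n").split("\n") if line]
    if not lines:
        return None
    return (lines[0], " ".join(lines[1:]))


def read_records(sc, path, delimiter=RECORD_DELIMITER):
    """Read `path` as an RDD of parsed records; `sc` is a pyspark SparkContext."""
    pairs = sc.newAPIHadoopFile(
        path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",  # key: byte offset of the record
        "org.apache.hadoop.io.Text",          # value: the record text
        conf={"textinputformat.record.delimiter": delimiter},
    )
    # Drop the offset keys, parse every record, and skip empty ones.
    return pairs.map(lambda kv: parse_record(kv[1])).filter(lambda r: r is not None)
```

With an active SparkSession `spark`, the result can be turned into a DataFrame with something like `spark.createDataFrame(read_records(spark.sparkContext, "/path/to/records.txt"), ["id", "body"])` (the path is a placeholder). This is exactly the extra plumbing the request asks to replace with a reader option.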