Ruslan Dautkhanov created SPARK-23554:
-----------------------------------------

             Summary: Hive's textinputformat.record.delimiter equivalent in Spark
                 Key: SPARK-23554
                 URL: https://issues.apache.org/jira/browse/SPARK-23554
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 2.3.0, 2.2.1
            Reporter: Ruslan Dautkhanov


It would be great if Spark supported an option similar to Hive's 
{{textinputformat.record.delimiter}} in the spark-csv reader.

We currently have to create Hive tables to work around this functionality 
missing natively in Spark.

{{textinputformat.record.delimiter}} was introduced back in 2011, in the 
map-reduce era; see MAPREDUCE-2254.

As an example, one of our most common use cases for 
{{textinputformat.record.delimiter}} is reading multiple lines of text that 
together make up a single "record". The number of physical lines per "record" 
varies, so {{textinputformat.record.delimiter}} is a great solution for 
processing these files natively in Hadoop/Spark: a custom {{.map()}} function 
does the actual parsing of each record, and we then convert the result to a 
dataframe.
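For illustration, the splitting semantics being requested (records separated by a custom delimiter rather than by newlines, with a varying number of physical lines per record) can be sketched in plain Python. The function name and the blank-line delimiter below are hypothetical examples, not part of any Spark or Hive API:

```python
def split_records(text, delimiter="\n\n"):
    """Split raw text into records on a custom record delimiter,
    mimicking Hive's textinputformat.record.delimiter semantics:
    each record may span a varying number of physical lines."""
    records = text.split(delimiter)
    # Drop the trailing empty record left by a delimiter at end-of-input.
    if records and records[-1] == "":
        records.pop()
    return records

# Two records, the second spanning three physical lines.
raw = "id: 1\nname: foo\n\nid: 2\nname: bar\nnote: multi-line\n\n"
print(split_records(raw))
# -> ['id: 1\nname: foo', 'id: 2\nname: bar\nnote: multi-line']
```

In Spark itself this is achievable today only at the RDD level (e.g. by setting {{textinputformat.record.delimiter}} in a Hadoop {{Configuration}} passed to {{sc.newAPIHadoopFile}}); the request here is for a first-class option in the DataFrame readers.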



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
