[jira] [Created] (SPARK-10850) wholeTextFileRDD only affects the first line in each partition

Fengdong Yu (JIRA) Sun, 27 Sep 2015 20:04:33 -0700

Fengdong Yu created SPARK-10850:
-----------------------------------

             Summary: wholeTextFileRDD only affects the first line in each partition
                 Key: SPARK-10850
                 URL: https://issues.apache.org/jira/browse/SPARK-10850
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.4.1
            Reporter: Fengdong Yu



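{code}
    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)

    // Read each file under /test/*/ as a (path, content) pair; 3 is minPartitions
    val text = sc.wholeTextFiles("/test/*/", 3)
    // Prepend the file path and a "^^^" separator to each record
    text.map(x => x._1 + "^^^" + x._2).collect
{code}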

Output (the path and "^^^" prefix appear only before the first line of each file):
{code}
hdfs://xxxx/test/test1/1.data^^^hello1
hello2
hello3
hdfs://xxxx/test/test2/2.data^^^hello1
hello2
hello3
{code}

I have two datasets under '/test/': /test/test1/1.data and /test/test2/2.data.

Each dataset has three lines:
hello1
hello2
hello3
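
For context, wholeTextFiles yields one (path, content) record per file, so a map over it runs once per file, not once per line. If the goal was to prefix every line with its file path, a minimal sketch might look like the following. It assumes that this is the intended result, which the report does not state. The helper prefixLines is hypothetical (not part of Spark), and a plain Seq stands in for the RDD so the snippet runs without a cluster.

```scala
// Hypothetical helper (not a Spark API): prefix every line of a file's
// content with the file's path, using the "^^^" separator from the report.
def prefixLines(path: String, content: String): List[String] =
  content.split("\n").toList
    .filter(_.nonEmpty)                      // skip blank lines
    .map(line => path + "^^^" + line)

// Stand-in for the (path, content) pairs that sc.wholeTextFiles would yield.
val files = List(
  ("hdfs://xxxx/test/test1/1.data", "hello1\nhello2\nhello3\n"),
  ("hdfs://xxxx/test/test2/2.data", "hello1\nhello2\nhello3\n")
)

// flatMap turns each whole-file record into one record per line.
val lines = files.flatMap { case (path, content) => prefixLines(path, content) }
lines.foreach(println)
```

On an actual RDD the same function could be applied as text.flatMap { case (path, content) => prefixLines(path, content) }, again assuming per-line prefixing is what is wanted.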