Jackson created SPARK-24548:
-------------------------------

             Summary: JavaPairRDD to Dataset<Row> in SPARK returns error
                 Key: SPARK-24548
                 URL: https://issues.apache.org/jira/browse/SPARK-24548
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.0
         Environment: Windows 10 on a 64-bit machine with 16 GB of RAM.
            Reporter: Jackson


I have data in the following JavaPairRDD:
{quote}JavaPairRDD<String,Tuple2<String,String>> MY_RDD;
{quote}
I tried the following code:
{quote}Encoder<Tuple2<String, Tuple2<String, String>>> encoder2 =
    Encoders.tuple(Encoders.STRING(),
        Encoders.tuple(Encoders.STRING(), Encoders.STRING()));
Dataset<Row> newDataSet =
    spark.createDataset(JavaPairRDD.toRDD(MY_RDD), encoder2).toDF("value1", "value2");

newDataSet.printSchema();
{quote}
The printSchema() call prints:

{{root}}
{{ |-- value1: string (nullable = true)}}
{{ |-- value2: struct (nullable = true)}}
{{ |    |-- value: string (nullable = true)}}
{{ |    |-- value: string (nullable = true)}}

After posting a Stack Overflow question 
(https://stackoverflow.com/questions/50834145/javapairrdd-to-datasetrow-in-spark), 
I learned that the fields of the nested tuple should have distinct names, 
whereas in this case both fields are generated with the same name, value. 
Because of this, I cannot select a specific column under value2.
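
A possible workaround, sketched here and not verified against 2.3.0, is to skip the tuple encoder and build Row objects against an explicit schema, so that the nested fields get distinct names. The class name PairRddToDataset, the helper toDataset, the sample data, and the nested field names first and second are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

public class PairRddToDataset {

    // Explicit schema: the nested struct uses distinct (hypothetical) names
    // "first" and "second" instead of two fields both called "value".
    static final StructType SCHEMA = new StructType()
        .add("value1", DataTypes.StringType)
        .add("value2", new StructType()
            .add("first", DataTypes.StringType)
            .add("second", DataTypes.StringType));

    // Maps each (key, (a, b)) pair to a Row holding a nested Row, then
    // applies SCHEMA so every column is individually addressable.
    static Dataset<Row> toDataset(SparkSession spark,
                                  JavaPairRDD<String, Tuple2<String, String>> pairs) {
        JavaRDD<Row> rows = pairs.map(t ->
            RowFactory.create(t._1(), RowFactory.create(t._2()._1(), t._2()._2())));
        return spark.createDataFrame(rows, SCHEMA);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("pair-rdd-to-dataset")
            .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Sample data standing in for MY_RDD from the report.
        List<Tuple2<String, Tuple2<String, String>>> data =
            Arrays.asList(new Tuple2<>("k1", new Tuple2<>("a", "b")));
        JavaPairRDD<String, Tuple2<String, String>> myRdd = jsc.parallelizePairs(data);

        Dataset<Row> ds = toDataset(spark, myRdd);
        ds.printSchema();
        // The nested column can now be selected by its own name.
        ds.select("value2.first").show();

        spark.stop();
    }
}
```

This only sidesteps the problem; the duplicate field names produced by the nested Encoders.tuple call are still the behaviour being reported.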



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

