[jira] [Commented] (SPARK-24548) JavaPairRDD to Dataset<Row> in SPARK generates ambiguous results

JIRA Wed, 13 Jun 2018 05:41:07 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511056#comment-16511056
 ]


Tomasz Gawęda commented on SPARK-24548:
---------------------------------------

IMHO the names should be distinct; otherwise it's hard to query a nested 
field.
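
To make the point concrete, here is a toy model in plain Java of name-based 
field lookup. It is only an illustration and not Spark's actual resolver: the 
class NestedFieldNames and its methods resolve and renameByPosition are 
hypothetical names invented for this sketch. It assumes a struct is just a 
list of field names, like the two "value" fields under value2 in the quoted 
schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Spark code. It illustrates why two sibling
// fields with the same name cannot be selected by name.
class NestedFieldNames {

    // Returns the index of the single field called `name`.
    // Throws if the name is missing or matches more than one field.
    static int resolve(List<String> fieldNames, String name) {
        int found = -1;
        for (int i = 0; i < fieldNames.size(); i++) {
            if (fieldNames.get(i).equals(name)) {
                if (found >= 0) {
                    throw new IllegalArgumentException("Ambiguous field name: " + name);
                }
                found = i;
            }
        }
        if (found < 0) {
            throw new IllegalArgumentException("No such field: " + name);
        }
        return found;
    }

    // Gives every field a distinct positional name: [value, value] -> [_1, _2].
    static List<String> renameByPosition(List<String> fieldNames) {
        List<String> renamed = new ArrayList<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            renamed.add("_" + (i + 1));
        }
        return renamed;
    }

    public static void main(String[] args) {
        // Field names of the nested struct as printed in the issue.
        List<String> value2 = Arrays.asList("value", "value");
        try {
            resolve(value2, "value");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // With distinct names, lookup by name is unambiguous again.
        List<String> fixed = renameByPosition(value2);
        System.out.println(fixed + ", index of _2: " + resolve(fixed, "_2"));
    }
}
```

With duplicate names the lookup has no single answer, which is the situation 
the reporter hits under value2; once each field has its own name, selecting 
one of them by name works.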

> JavaPairRDD to Dataset<Row> in SPARK generates ambiguous results
> ----------------------------------------------------------------
>
>                 Key: SPARK-24548
>                 URL: https://issues.apache.org/jira/browse/SPARK-24548
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API, Spark Core
>    Affects Versions: 2.3.0
>         Environment: Windows 10 on a 64-bit machine with 16 GB of RAM.
>            Reporter: Jackson
>            Priority: Major
>
> I have data in the following JavaPairRDD:
> {quote}JavaPairRDD<String,Tuple2<String,String>> MY_RDD;
> {quote}
> I tried the following code:
> {quote}Encoder<Tuple2<String, Tuple2<String,String>>> encoder2 =
> Encoders.tuple(Encoders.STRING(), 
> Encoders.tuple(Encoders.STRING(),Encoders.STRING()));
> Dataset<Row> newDataSet = 
> spark.createDataset(JavaPairRDD.toRDD(MY_RDD),encoder2).toDF("value1","value2");
> newDataSet.printSchema();
> {quote}
> {{root}}
> {{ |-- value1: string (nullable = true)}}
> {{ |-- value2: struct (nullable = true)}}
> {{ | |-- value: string (nullable = true)}}
> {{ | |-- value: string (nullable = true)}}
> But after creating a StackOverflow question 
> (https://stackoverflow.com/questions/50834145/javapairrdd-to-datasetrow-in-spark),
> I learned that the values in a tuple should have distinct field names, 
> whereas in this case both fields get the same name. Because of this, I cannot 
> select a specific column under value2.



