[ https://issues.apache.org/jira/browse/SPARK-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511056#comment-16511056 ]
Tomasz Gawęda commented on SPARK-24548:
---------------------------------------

IMHO the names should be distinct; otherwise it's hard to query a nested field.

> JavaPairRDD to Dataset<Row> in SPARK generates ambiguous results
> ----------------------------------------------------------------
>
>                 Key: SPARK-24548
>                 URL: https://issues.apache.org/jira/browse/SPARK-24548
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API, Spark Core
>    Affects Versions: 2.3.0
>         Environment: Using Windows 10, on a 64-bit machine with 16 GB of RAM.
>            Reporter: Jackson
>            Priority: Major
>
> I have data in the following JavaPairRDD:
> {quote}JavaPairRDD<String,Tuple2<String,String>> MY_RDD;
> {quote}
> I tried using the code below:
> {quote}Encoder<Tuple2<String, Tuple2<String,String>>> encoder2 =
> Encoders.tuple(Encoders.STRING(), Encoders.tuple(Encoders.STRING(),Encoders.STRING()));
> Dataset<Row> newDataSet = spark.createDataset(JavaPairRDD.toRDD(MY_RDD),encoder2).toDF("value1","value2");
> newDataSet.printSchema();
> {quote}
> {{root}}
> {{ |-- value1: string (nullable = true)}}
> {{ |-- value2: struct (nullable = true)}}
> {{ |    |-- value: string (nullable = true)}}
> {{ |    |-- value: string (nullable = true)}}
> But after creating a StackOverflow question
> ("https://stackoverflow.com/questions/50834145/javapairrdd-to-datasetrow-in-spark"),
> I learned that the values in a tuple should have distinct field names,
> whereas in this case the same name is generated for both. Because of this, I cannot select
> a specific column under value2.
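
One possible workaround for the duplicate nested names described above, sketched under assumptions: build the DataFrame from `Row` objects with an explicit schema, so the inner struct gets distinct field names. The class name `DistinctNestedNames`, the methods `schema` and `toDataFrame`, and the field names `first` and `second` are hypothetical and chosen only for illustration; this is not the fix adopted for the issue.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import scala.Tuple2;

// Hypothetical helper: names here are illustrative, not part of Spark.
public final class DistinctNestedNames {

    // Explicit schema: the nested struct uses two different field names
    // ("first" and "second" are assumptions) instead of "value" twice.
    static StructType schema() {
        StructType inner = new StructType()
            .add("first", DataTypes.StringType)
            .add("second", DataTypes.StringType);
        return new StructType()
            .add("value1", DataTypes.StringType)
            .add("value2", inner);
    }

    // Convert each (key, (a, b)) pair into a Row holding a nested Row,
    // then apply the explicit schema.
    static Dataset<Row> toDataFrame(
            SparkSession spark,
            JavaPairRDD<String, Tuple2<String, String>> pairs) {
        JavaRDD<Row> rows = pairs.map(t ->
            RowFactory.create(
                t._1(),
                RowFactory.create(t._2()._1(), t._2()._2())));
        return spark.createDataFrame(rows, schema());
    }
}
```

With a schema like this, a nested column can be addressed by its own name, for example `toDataFrame(spark, MY_RDD).select("value2.second")`.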