[ https://issues.apache.org/jira/browse/SPARK-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-10303:
-------------------------------
    Description:
See attached screenshot. For a job that uses SQLContext's JSON reader, the RDD DAG visualization shows that Spark is using an inefficient version of Union which chains together many UnionRDDs as opposed to the more-efficient form of UnionRDD that accepts a larger list of child RDDs.

!screenshot-1.png!

  was:
See attached screenshot. For a job that uses SQLContext's JSON reader, the RDD DAG visualization shows that Spark is using an inefficient version of Union which chains together many UnionRDDs as opposed to the more-efficient form of UnionRDD that accepts a larger list of child RDDs.

> Spark SQL JSON Reader uses inefficient form of Union operation
> --------------------------------------------------------------
>
>                 Key: SPARK-10303
>                 URL: https://issues.apache.org/jira/browse/SPARK-10303
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Josh Rosen
>         Attachments: screenshot-1.png
>
>
> See attached screenshot. For a job that uses SQLContext's JSON reader, the
> RDD DAG visualization shows that Spark is using an inefficient version of
> Union which chains together many UnionRDDs as opposed to the more-efficient
> form of UnionRDD that accepts a larger list of child RDDs.
> !screenshot-1.png!
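
To make the difference between the two union shapes concrete, here is a minimal sketch in plain Python that models the lineage graph only; it does not use Spark. `Node`, `union_pair`, `union_all`, `depth` and `count_unions` are hypothetical names invented for this illustration, and the left-nested chain is an assumption about how the chained plan might be built, not something taken from the JSON reader's code. In Spark itself the pairwise form corresponds to repeated `rdd.union(other)` calls and the flat form to a single `SparkContext.union(rdds)` call.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List


@dataclass
class Node:
    """Toy stand-in for an RDD in a lineage graph (not a Spark class)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def union_pair(a: Node, b: Node) -> Node:
    """Pairwise union: each call adds one more union node on top of `a`."""
    return Node("Union", [a, b])


def union_all(nodes: List[Node]) -> Node:
    """Flat union: a single union node that takes every input as a child."""
    return Node("Union", list(nodes))


def depth(node: Node) -> int:
    """Number of nodes on the longest path from `node` down to a leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)


def count_unions(node: Node) -> int:
    """Total number of union nodes in the graph rooted at `node`."""
    return int(node.name == "Union") + sum(count_unions(c) for c in node.children)


# Eight hypothetical per-file inputs.
inputs = [Node(f"file-{i}") for i in range(8)]

chained = reduce(union_pair, inputs)  # (((f0 U f1) U f2) U ...) U f7
flat = union_all(inputs)              # Union(f0, f1, ..., f7)

print(depth(chained), count_unions(chained))  # 8 7
print(depth(flat), count_unions(flat))        # 2 1
```

With n inputs the chained form creates n - 1 union nodes and a lineage whose depth grows linearly with n, while the flat form always has one union node and depth 2. That is the structural difference the DAG visualization in the screenshot would show.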