[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

HyukjinKwon Thu, 19 May 2016 22:47:57 -0700

Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220522607
  
    @marmbrus I tested this and could reproduce the read exception reported in
https://issues.apache.org/jira/browse/SPARK-15393, but it seems this might not
be the cause.
    
    I ran the test below on
https://github.com/apache/spark/commit/c0c3ec35476c756e569a1f34c4b258eb0490585c
(the commit right before this PR) and on the master branch.
    
    ```scala
      test("SPARK-15393: create empty file") {
        withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
          withTempPath { path =>
            val schema = StructType(
              StructField("k", StringType, true) ::
              StructField("v", IntegerType, false) :: Nil)
            val emptyDf = spark.createDataFrame(
              spark.sparkContext.emptyRDD[Row], schema)
            emptyDf.write
              .format("parquet")
              .save(path.getCanonicalPath)
    
            val copyEmptyDf = spark.read
              .format("parquet")
              .load(path.getCanonicalPath)
    
            copyEmptyDf.show()
          }
        }
      }
    ```
    
    Both seem to produce the exception below:
    
    ```scala
    Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc180000gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
    org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc180000gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:323)
    ```
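
    Since the message says the schema "must be specified manually", one thing
    worth trying is to pass the schema to the reader explicitly. The snippet
    below is only a rough sketch under that assumption; I have not verified it
    on either commit. The object name and the default path are hypothetical,
    and the path is expected to be the directory written by the test above.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Hypothetical standalone sketch; the object name and default path are illustrative only.
    object ReadEmptyParquetWithSchema {
      // Same schema as in the test above.
      val schema: StructType = StructType(
        StructField("k", StringType, true) ::
        StructField("v", IntegerType, false) :: Nil)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("read-empty-parquet")
          .getOrCreate()
        try {
          // Assumed location of the Parquet output; it must already exist.
          val path = args.headOption.getOrElse("/tmp/empty-parquet")
          // Supplying the schema up front, so the reader is not asked to infer it.
          val df = spark.read
            .schema(schema)
            .format("parquet")
            .load(path)
          df.printSchema()
        } finally {
          spark.stop()
        }
      }
    }
    ```

    This would only sidestep schema inference on the read side; it does not
    explain why the write leaves nothing to infer the schema from.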
    
    I will try to figure out why, but please feel free to revert this if you
think my PR is problematic. I will fix both issues together later anyway.


