cen yuhai created SPARK-5192:
--------------------------------

             Summary: Parquet fails to parse schemas containing '\r'
                 Key: SPARK-5192
                 URL: https://issues.apache.org/jira/browse/SPARK-5192
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.0
         Environment: Windows 7 + IntelliJ IDEA 13.0.2
            Reporter: cen yuhai
            Priority: Critical
             Fix For: 1.3.0


I think this is actually a bug in Parquet. While debugging 'ParquetTestData', I
hit the exception shown below. I then downloaded the source of
MessageTypeParser and found that the function 'isWhitespace' does not check
for '\r':

    private boolean isWhitespace(String t) {
      return t.equals(" ") || t.equals("\t") || t.equals("\n");
    }
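
For illustration, here is a minimal sketch of what a '\r'-aware check could
look like. It is not the actual Parquet fix: the class name WhitespaceSketch,
the helper tokens(), and the delimiter set passed to StringTokenizer are all
assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WhitespaceSketch {
    // Hypothetical variant of the check quoted above, extended so that a
    // carriage return also counts as whitespace.
    static boolean isWhitespace(String t) {
        return t.equals(" ") || t.equals("\t") || t.equals("\n") || t.equals("\r");
    }

    // Hypothetical helper: splits a schema string into its non-whitespace
    // tokens. The delimiter set is an assumption for this sketch and is not
    // taken from the Parquet sources.
    static List<String> tokens(String schema) {
        StringTokenizer st = new StringTokenizer(schema, " \t\n\r;{}", true);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (!isWhitespace(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A schema with Windows-style (CRLF) line endings.
        String schema = "message myrecord {\r\n  optional boolean myboolean;\r\n}";
        System.out.println(tokens(schema));
    }
}
```

With this check, the '\r' characters are skipped like any other whitespace
instead of reaching the code that expects a repetition keyword.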

So I removed every '\r' from the schema string to work around this issue:

    val subTestSchema =
      """
        message myrecord {
        optional boolean myboolean;
        optional int64 mylong;
        }
      """.replaceAll("\r", "")


at line 0: message myrecord {

        at parquet.schema.MessageTypeParser.asRepetition(MessageTypeParser.java:203)
        at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:101)
        at parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
        at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
        at parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
        at org.apache.spark.sql.parquet.ParquetTestData$.writeFile(ParquetTestData.scala:221)
        at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:92)
        at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
        at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:85)
        at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
        at org.apache.spark.sql.parquet.ParquetQuerySuite.run(ParquetQuerySuite.scala:85)
                


