cen yuhai created SPARK-5192:
--------------------------------
Summary: Parquet fails to parse schemas that contain '\r'
Key: SPARK-5192
URL: https://issues.apache.org/jira/browse/SPARK-5192
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Environment: Windows 7 + IntelliJ IDEA 13.0.2
Reporter: cen yuhai
Priority: Critical
Fix For: 1.3.0
I think this is actually a bug in Parquet. When I debugged 'ParquetTestData', I
hit the exception shown below. I then downloaded the source of MessageTypeParser
and found that the function 'isWhitespace' does not check for '\r':
    private boolean isWhitespace(String t) {
      return t.equals(" ") || t.equals("\t") || t.equals("\n");
    }
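For illustration, here is a minimal sketch of how the check could be extended to accept a carriage return. This is an assumption about one possible fix, not the actual Parquet patch; the class name WhitespaceCheck is hypothetical and exists only to make the snippet self-contained.

```java
public class WhitespaceCheck {
    // Hypothetical variant of the isWhitespace check quoted above,
    // extended so that a lone carriage return also counts as whitespace.
    static boolean isWhitespace(String t) {
        return t.equals(" ") || t.equals("\t") || t.equals("\n") || t.equals("\r");
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace("\r"));       // carriage return is now accepted
        System.out.println(isWhitespace("optional")); // ordinary tokens are still rejected
    }
}
```

With a change along these lines, schema strings written with Windows line endings would presumably no longer need to be preprocessed by the caller.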
So I strip every '\r' from the schema string to work around this issue:
    val subTestSchema =
      """
        message myrecord {
          optional boolean myboolean;
          optional int64 mylong;
        }
      """.replaceAll("\r","")
at line 0: message myrecord {
    at parquet.schema.MessageTypeParser.asRepetition(MessageTypeParser.java:203)
    at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:101)
    at parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
    at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
    at parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
    at org.apache.spark.sql.parquet.ParquetTestData$.writeFile(ParquetTestData.scala:221)
    at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:92)
    at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
    at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:85)
    at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
    at org.apache.spark.sql.parquet.ParquetQuerySuite.run(ParquetQuerySuite.scala:85)
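The top frame of the trace is asRepetition, which is consistent with a stray '\r' token being read where a repetition keyword such as 'optional' was expected. The sketch below reproduces that mechanism in isolation. It assumes the parser splits the schema with a delimiter set that omits '\r'. The DELIMS value and the CarriageReturnDemo class are hypothetical and not copied from the Parquet source; only the isWhitespace body matches the snippet quoted in this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CarriageReturnDemo {
    // Assumed delimiter set, for illustration only; '\r' is deliberately absent.
    static final String DELIMS = " ;{}()\n\t";

    // Same check as the isWhitespace snippet quoted in this report.
    static boolean isWhitespace(String t) {
        return t.equals(" ") || t.equals("\t") || t.equals("\n");
    }

    // Returns the tokens that survive the whitespace filter, i.e. the ones
    // a parser built on this check would try to interpret.
    static List<String> significantTokens(String schema) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(schema, DELIMS, true);
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (!isWhitespace(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String crlfSchema = "message myrecord {\r\n  optional boolean myboolean;\r\n}\r\n";

        // With Windows line endings, "\r" leaks through as a token of its own.
        System.out.println(significantTokens(crlfSchema).contains("\r"));

        // The workaround from this report: strip every '\r' before parsing.
        String cleaned = crlfSchema.replace("\r", "");
        System.out.println(significantTokens(cleaned).contains("\r"));
    }
}
```

Under these assumptions, the token that follows '{' in the CRLF schema is "\r" rather than "optional", which would explain why stripping carriage returns before calling the parser avoids the failure.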