[ https://issues.apache.org/jira/browse/PARQUET-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020637#comment-16020637 ]

Navya Krishnappa edited comment on PARQUET-815 at 6/7/17 4:08 PM:
------------------------------------------------------------------

Hi [~rdblue], the precision is positive, but the scale is negative. Spark 
supports a negative scale, but Parquet doesn't, so we cannot create a Parquet 
file for such data. Please help me resolve this issue. 

Thank you
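
A possible workaround, as a minimal sketch rather than a confirmed fix (it 
assumes the offending column is named "column name" as in the report below, 
and that a non-negative scale such as decimal(20, 0) can hold the values 
without loss), is to cast the inferred negative-scale decimal before writing:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.col;

// Spark infers values like 9.03E+12 as a decimal with a negative scale,
// which Parquet's DECIMAL annotation rejects (it requires 0 <= scale).
// Casting to a decimal with scale 0 keeps the values representable:
Dataset<Row> fixed = dataset.withColumn(
    "column name",
    col("column name").cast(DataTypes.createDecimalType(20, 0)));
fixed.write().parquet("//path.parquet");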


> Unable to create parquet file for the given data
> ------------------------------------------------
>
>                 Key: PARQUET-815
>                 URL: https://issues.apache.org/jira/browse/PARQUET-815
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Navya Krishnappa
>            Assignee: Ryan Blue
>
> When I try to read the CSV source file mentioned below and create a Parquet 
> file from it, a java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 
> is thrown.
> The source file content is:
> Row(column name)
> 9.03E+12
> 1.19E+11
> Refer to the code below, used to read the CSV file and create a Parquet file:
> // Read the csv file
> Dataset<Row> dataset = getSqlContext().read()
>     .option(DAWBConstant.HEADER, "true")
>     .option(DAWBConstant.PARSER_LIB, "commons")
>     .option(DAWBConstant.INFER_SCHEMA, "true")
>     .option(DAWBConstant.DELIMITER, ",")
>     .option(DAWBConstant.QUOTE, "\"")
>     .option(DAWBConstant.ESCAPE, "\\") // escape value garbled in the original report; "\\" assumed
>     .option(DAWBConstant.MODE, Mode.PERMISSIVE)
>     .csv(sourceFile);
> // create a parquet file
> dataset.write().parquet("//path.parquet");
> Stack trace:
> Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
> at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
> at org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
> at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
> at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
> at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85)
> at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288)
> at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
> at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
> at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
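
For context (my reading of the trace, not verified against this exact Spark 
version): java.math.BigDecimal parses scientific notation by keeping the 
significant digits and a matching scale, so 1.19E+11 comes out as unscaled 
value 119 with scale -9, and schema inference carries that negative scale into 
the DecimalType that Parquet's schema converter then rejects. A minimal sketch 
of where the -9 comes from:

import java.math.BigDecimal;

public class NegativeScaleDemo {
    public static void main(String[] args) {
        // 1.19E+11 is stored as 119 x 10^9: three significant digits, scale -9
        BigDecimal d = new BigDecimal("1.19E+11");
        System.out.println(d.precision()); // 3
        System.out.println(d.scale());     // -9, the scale Parquet rejects
    }
}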


