[
https://issues.apache.org/jira/browse/PARQUET-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819644#comment-16819644
]
Qinghui Xu commented on PARQUET-1563:
-------------------------------------
So, regarding your schema, the output is expected as per the Parquet format
specification
([https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]), which
I cite here: {{DATE}} is used for a logical date type, without a time of
day. It must annotate an {{int32}} that stores the number of days from the Unix
epoch, 1 January 1970.
So in your case, the date `1970-04-26` is 115 days after `1970-01-01`, and is
therefore stored as 115 (int32). Your reader prints that raw stored value,
whereas Spark converts it back to a date when reading.
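To make the arithmetic concrete, here is a minimal sketch of turning the raw int32 back into a calendar date. It uses only {{java.time}} from the JDK; the object and method names ({{DateFromEpochDays}}, {{toLocalDate}}) are made up for illustration and are not part of the Parquet API.

```scala
import java.time.LocalDate

object DateFromEpochDays {
  // A Parquet DATE value is an int32 counting days since 1970-01-01,
  // so LocalDate.ofEpochDay performs the reverse mapping.
  def toLocalDate(daysSinceEpoch: Int): LocalDate =
    LocalDate.ofEpochDay(daysSinceEpoch.toLong)

  def main(args: Array[String]): Unit = {
    // 31 (Jan) + 28 (Feb 1970) + 31 (Mar) + 25 = 115 days after 1 January
    println(toLocalDate(115)) // prints 1970-04-26
  }
}
```

With the reader from the report, the raw value could presumably be taken from the group with something like {{simpleGroup.getInteger("your_date_field", 0)}} (the field name here is a placeholder) and passed to the helper above.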
> cannot read 'date' datatype which write by spark
> ------------------------------------------------
>
> Key: PARQUET-1563
> URL: https://issues.apache.org/jira/browse/PARQUET-1563
> Project: Parquet
> Issue Type: Bug
> Environment: jdk: 1.8
> macOS Mojave 10.14.4
> Reporter: Fan Mo
> Priority: Major
> Attachments: Screen Shot 2019-04-16 at 2.25.54 PM.png,
> test.snappy.parquet
>
>
> I'm using Spark 2.4.0 to write a Parquet file and trying to use
> parquet-column-1.10.jar to read the data. All the primitive data types
> work; however, for the date data type it returns a meaningless number. For
> example, the input date is '1970-04-26' and the output is '115'. If I use
> Spark to read the data, it gets the correct date.
> The following is my reader code:
> val reader = ParquetFileReader.open(
>   HadoopInputFile.fromPath(new Path("testfile.snappy.parquet"), new Configuration()))
> val schema = reader.getFooter.getFileMetaData.getSchema
> val columnIO = new ColumnIOFactory().getColumnIO(schema)
> // In Scala an assignment evaluates to Unit, so `(pages = ...) != null` is
> // always true; read the row group first, then test it for null.
> var pages: PageReadStore = reader.readNextRowGroup()
> while (pages != null) {
>   val rows = pages.getRowCount
>   val recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(schema))
>   (0L until rows).foreach { _: Long =>
>     val simpleGroup = recordReader.read()
>     println(simpleGroup)
>   }
>   pages = reader.readNextRowGroup()
> }
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)