[jira] [Commented] (PARQUET-1036) parquet file created via pyarrow 0.4.0 ; version 1.0 - incompatible with Spark

Ashima Sood (JIRA) Tue, 20 Jun 2017 12:59:33 -0700

    [ https://issues.apache.org/jira/browse/PARQUET-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056370#comment-16056370 ]


Ashima Sood commented on PARQUET-1036:
--------------------------------------

<FolderPath> is the folder where the .parquet file exists.
Here is the HQL for the table:
CREATE EXTERNAL TABLE IF NOT EXISTS tableName(
  DATE STRING
, ROW_ID STRING
, STATUS STRING
, GEN_TIME STRING
, GEN_MONTH STRING
..
..
..
)
STORED AS PARQUET
LOCATION 's3://${hiveconf:DATA_BUCKET}/<FolderPath>/'
tblproperties ('parquet.compress'='SNAPPY');


> parquet file created via pyarrow 0.4.0 ; version 1.0 - incompatible with Spark
> ------------------------------------------------------------------------------
>
>                 Key: PARQUET-1036
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1036
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Ashima Sood
>            Priority: Blocker
>
> Spark SQL is unable to read the parquet file and shows null values, whereas
> Hive reads the values fine.
> 17/06/19 17:50:36 WARN CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0
> org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
>                 at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
>                 at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
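
The warning in the trace above can be reproduced outside Spark by applying the quoted format pattern to the created_by string. The following is only a sketch: it copies the pattern from the exception message and assumes the Java parser requires the whole string to match, which Python's re.fullmatch approximates. The helper name parses_created_by and the second sample string are hypothetical, chosen for illustration.

```python
import re

# Pattern copied verbatim from the VersionParseException message.
FORMAT = r"(.+) version ((.*) )?\(build ?(.*)\)"


def parses_created_by(created_by: str) -> bool:
    """Return True if the whole created_by string matches FORMAT.

    Assumption: the Java side needs a full-string match, so
    re.fullmatch is used here rather than re.search.
    """
    return re.fullmatch(FORMAT, created_by) is not None


# The string reported in the log: there is no "(build ...)" suffix,
# so the pattern cannot match.
print(parses_created_by("parquet-cpp version 1.0.0"))

# A hypothetical string that does carry a build suffix, for comparison.
print(parses_created_by("parquet-mr version 1.8.1 (build abc)"))
```

The first call prints False and the second prints True, which is consistent with the parser rejecting a created_by value that lacks the "(build ...)" part. This only illustrates why the warning is logged; it does not by itself show that the warning is the cause of the null values.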



