[jira] [Commented] (PARQUET-1036) parquet file created via pyarrow 0.4.0 ; version 1.0 - incompatible with Spark

Wes McKinney (JIRA) Tue, 20 Jun 2017 13:01:54 -0700

    [ 
https://issues.apache.org/jira/browse/PARQUET-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056372#comment-16056372
 ]


Wes McKinney commented on PARQUET-1036:
---------------------------------------

The version of Spark you are using does not support spaces in column names. You 
need to normalize the column names on the Python side before writing the file, 
so I don't think this is an Arrow or Parquet bug.
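
As a rough illustration of normalizing column names on the Python side, here is
a minimal sketch. The helper names (normalize_column_name,
normalize_column_names) and the sample column names are hypothetical, and the
rule used (replace any run of non-word characters with an underscore) is just
one reasonable choice, not something Spark or pyarrow prescribes.

```python
import re


def normalize_column_name(name):
    """Replace runs of whitespace and other non-word characters with '_'."""
    cleaned = re.sub(r"\W+", "_", str(name).strip())
    # Drop underscores left over at either end, e.g. from a trailing ")".
    return cleaned.strip("_")


def normalize_column_names(names):
    """Apply normalize_column_name to every name, preserving order."""
    return [normalize_column_name(n) for n in names]


# Hypothetical column names containing spaces and parentheses.
columns = ["first name", "order id", "amount (usd)"]
normalized = normalize_column_names(columns)
print(normalized)

# With pandas and pyarrow (assumed workflow), the result would be assigned
# back to the DataFrame before the file is written:
#   df.columns = normalize_column_names(df.columns)
#   pq.write_table(pa.Table.from_pandas(df), "out.parquet")
```

Note that two different original names could collapse to the same normalized
name, so it is worth checking for duplicates before writing.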

> parquet file created via pyarrow 0.4.0 ; version 1.0 - incompatible with Spark
> ------------------------------------------------------------------------------
>
>                 Key: PARQUET-1036
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1036
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Ashima Sood
>            Priority: Blocker
>
> using spark sql unable to read parquet file and shows null values. whereas 
> hive reads the values fine.
> 17/06/19 17:50:36 WARN CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0
> org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
>                 at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
>                 at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
