[ https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976721#comment-16976721 ]
albertoramon commented on ARROW-785:
------------------------------------
I saw this with SparkSQL 2.4.4 and PyArrow 0.15.
The problem appears when the Hive table is created with INT columns (BIGINT works properly).
Solution: changing INT to BIGINT in the CREATE TABLE fixes it (I also tried DOUBLE, but that did not work); see the sketch below.
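A minimal sketch of the DDL change, with the table trimmed to the three columns discussed here (the real SSB LINEORDER table has more columns):
{code:sql}
-- Fails to read Parquet files whose columns were written as int64
CREATE TABLE ssb.lineorder (lo_custkey INT, lo_partkey INT, lo_suppkey INT)
STORED AS PARQUET;

-- Works: BIGINT matches the 64-bit integers written by pyarrow
CREATE TABLE ssb.lineorder (lo_custkey BIGINT, lo_partkey BIGINT, lo_suppkey BIGINT)
STORED AS PARQUET;
{code}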
In my case, these Parquet files come from the SSB benchmark:
{code:sql}
SELECT MAX(LO_CUSTKEY), MAX(LO_PARTKEY), MAX(LO_SUPPKEY)
FROM SSB.LINEORDER;
-- Returns: 29999  200000  2000
{code}
These are the column types I had, so I need to review my Python code :) :
{code:python}
'lo_custkey': 'int64',
'lo_partkey': 'int64',
'lo_suppkey': 'int64',
{code}
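A minimal sketch of the fix on the Python side, using a hypothetical DataFrame with just these three columns: since the maximum values above fit in 32 bits, casting to int32 makes the written Parquet files match a Hive table declared with INT (alternatively, keep int64 and declare the columns as BIGINT as shown earlier).
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical frame standing in for the SSB LINEORDER data
df = pd.DataFrame({
    'lo_custkey': [1, 29999],
    'lo_partkey': [1, 200000],
    'lo_suppkey': [1, 2000],
})

# Downcast from the default int64 so the Parquet columns match Hive INT;
# otherwise keep int64 and declare the Hive columns as BIGINT.
df = df.astype({'lo_custkey': 'int32',
                'lo_partkey': 'int32',
                'lo_suppkey': 'int32'})

pq.write_table(pa.Table.from_pandas(df), 'lineorder.parquet')
{code}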
> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
> Key: ARROW-785
> URL: https://issues.apache.org/jira/browse/ARROW-785
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Jeff Reback
> Assignee: Wes McKinney
> Priority: Minor
> Fix For: 0.5.0
>
>
> details here:
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas
> (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive, however.
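For reference, a minimal sketch of the round trip described above, written against current pandas/pyarrow APIs (the original report used pandas 0.19.2 and pyarrow 0.2, whose calls may differ):
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'a': [1, 2, 3]})  # toy frame; the SO post uses real data

# pandas -> parquet -> pandas: this direction works fine
pq.write_table(pa.Table.from_pandas(df), 'roundtrip.parquet')
df2 = pq.read_table('roundtrip.parquet').to_pandas()
assert df.equals(df2)  # the failure reported here only shows up when Hive reads the file
{code}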