[ https://issues.apache.org/jira/browse/ARROW-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon updated ARROW-6129:
--------------------------------
    Description: 
Writing Parquet with Row_Groups duplicates rows:

    Input: CSV 10 Rows

    Row_Groups=1 --> Output 10 Rows 

    Row_Groups=2 --> Output 20 Rows

  !tes_output.png!

Is this expected?
The code snippet and CSV are attached.

  was:
Using Row_Groups to write Parquet, duplicate date:

Input: CSV 10 Rows

Row_Groups=1 --> Output 10 Rows !tes_output.png!

Row_Groups=2 --> Output 20 Rows

 

Is this the expected?
[^test01.py]


> Row_groups duplicate Rows
> -------------------------
>
>                 Key: ARROW-6129
>                 URL: https://issues.apache.org/jira/browse/ARROW-6129
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: albertoramon
>            Priority: Major
>         Attachments: tes_output.png, test01.py, top10.csv
>
>
> Writing Parquet with Row_Groups duplicates rows:
>     Input: CSV 10 Rows
>     Row_Groups=1 --> Output 10 Rows 
>     Row_Groups=2 --> Output 20 Rows
>   !tes_output.png!
> Is this expected?
> The code snippet and CSV are attached.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
