[ https://issues.apache.org/jira/browse/ARROW-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
albertoramon updated ARROW-6129:
--------------------------------
    Description:
Using Row_Groups to write Parquet, duplicate rows:

Input: CSV 10 Rows
Row_Groups=1 --> Output 10 Rows
Row_Groups=2 --> Output 20 Rows

!tes_output.png!

Is this the expected?

attached code snippet and CSV

  was:
Using Row_Groups to write Parquet, duplicate date:

Input: CSV 10 Rows
Row_Groups=1 --> Output 10 Rows
!tes_output.png!
Row_Groups=2 --> Output 20 Rows

Is this the expected?

[^test01.py]


> Row_groups duplicate Rows
> -------------------------
>
>                 Key: ARROW-6129
>                 URL: https://issues.apache.org/jira/browse/ARROW-6129
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: albertoramon
>            Priority: Major
>         Attachments: tes_output.png, test01.py, top10.csv
>
>
> Using Row_Groups to write Parquet, duplicate rows:
> Input: CSV 10 Rows
> Row_Groups=1 --> Output 10 Rows
> Row_Groups=2 --> Output 20 Rows
> !tes_output.png!
> Is this the expected?
> attached code snippet and CSV

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)