[ 
https://issues.apache.org/jira/browse/PARQUET-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated PARQUET-1372:
-----------------------------------
    Description: The current API allows writing RowGroups with a specified 
number of rows, but it does not allow writing RowGroups of a specified size. 
To write RowGroups of a specified size, we need to write rows in chunks and 
check total_bytes_written after each chunk is written. This is currently 
impossible because the call to NextColumn() closes the current column 
writer.  (was: The current API allows writing RowGroups with specified numbers 
of rows, however does not allow writing RowGroups with specified size in MB. In 
order to write RowGroups of specified size we need to write rows in chunks 
while checking the total_bytes_written after each chunk is written. This is 
currently impossible because the call to NextColumn() closes the current column 
writer.)

> [C++] Add an API to allow writing RowGroups based on their size rather than 
> num_rows
> ------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1372
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1372
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Anatoli Shein
>            Priority: Major
>             Fix For: 1.5.0
>
>
> The current API allows writing RowGroups with a specified number of rows, 
> but it does not allow writing RowGroups of a specified size. To write 
> RowGroups of a specified size, we need to write rows in chunks and check 
> total_bytes_written after each chunk is written. This is currently 
> impossible because the call to NextColumn() closes the current column 
> writer.
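
The chunked write loop described above could look roughly like the sketch below. It is only an illustration under stated assumptions: MockRowGroupWriter, WriteChunk, and WriteBySize are hypothetical names invented for this example and are not part of the parquet-cpp API, and rows are assumed to have a fixed width so that the byte count is easy to follow. The one property it relies on is the one this issue asks for: the writer stays open across chunks so that total_bytes_written can be checked after each one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a row group writer whose column writers remain
// open between chunks. This is NOT the parquet-cpp API.
struct MockRowGroupWriter {
  int64_t total_bytes_written = 0;
  int64_t num_rows = 0;

  // Append one chunk of fixed-width rows and update the running counters.
  void WriteChunk(int64_t rows, int64_t bytes_per_row) {
    num_rows += rows;
    total_bytes_written += rows * bytes_per_row;
  }
};

// Write total_rows rows in chunks of chunk_rows, closing the current row
// group as soon as it has reached target_bytes. Returns the number of rows
// that ended up in each row group.
std::vector<int64_t> WriteBySize(int64_t total_rows, int64_t bytes_per_row,
                                 int64_t chunk_rows, int64_t target_bytes) {
  assert(chunk_rows > 0);  // a non-positive chunk size would never terminate

  std::vector<int64_t> rows_per_group;
  MockRowGroupWriter group;
  int64_t remaining = total_rows;

  while (remaining > 0) {
    const int64_t n = std::min(chunk_rows, remaining);
    group.WriteChunk(n, bytes_per_row);
    remaining -= n;

    // Check the size after every chunk; start a new row group once the
    // target has been reached.
    if (group.total_bytes_written >= target_bytes) {
      rows_per_group.push_back(group.num_rows);
      group = MockRowGroupWriter{};
    }
  }

  // Flush a final, partially filled row group if any rows are left in it.
  if (group.num_rows > 0) {
    rows_per_group.push_back(group.num_rows);
  }
  return rows_per_group;
}
```

Because the size is only checked between chunks, a row group can overshoot the target by up to one chunk, so a smaller chunk_rows gives tighter control over the final size at the cost of more frequent checks.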



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
