[ 
https://issues.apache.org/jira/browse/PARQUET-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333379#comment-17333379
 ] 

Gabor Szadovszky commented on PARQUET-2026:
-------------------------------------------

[~vitalii], based on the discussion at the recent Parquet sync meeting, the 
community is not against allowing empty Parquet files to be created. However, we 
do not have the bandwidth to invest in this feature. 
Feel free to contribute; I am happy to help and review.

> Allow empty row in parquet file
> -------------------------------
>
>                 Key: PARQUET-2026
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2026
>             Project: Parquet
>          Issue Type: Task
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Vitalii Diravka
>            Priority: Major
>              Labels: Drill, empty-file
>             Fix For: 1.13.0
>
>         Attachments: Screenshot from 2021-04-13 08-52-56.png
>
>
> Since PARQUET-1851, parquet-mr no longer allows writing Parquet files that 
> have a schema (meta information) but 0 rows, i.e. empty files.
> As a result, Drill can no longer store empty tables as Parquet files, 
> for example:
> {code:sql}
> CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0{code}
> {code:sql}
> CREATE TABLE dfs.tmp.%s AS select * from 
> dfs.`parquet/alltypes_required.parquet` where `col_int` = 0{code}
> {code:sql}
> create table dfs.tmp.%s as select * from 
> dfs.`parquet/empty/complex/empty_complex.parquet`{code}
> So PARQUET-1851 breaks the following test cases:
> {code:java}
> TestUntypedNull.testParquetTableCreation   
> TestParquetWriterEmptyFiles.testComplexEmptyFileSchema   
> TestParquetWriterEmptyFiles.testWriteEmptyFile   
> TestParquetWriterEmptyFiles.testWriteEmptyFileWithSchema   
> TestParquetWriterEmptyFiles.testWriteEmptySchemaChange 
> TestMetastoreCommands.testAnalyzeEmptyRequiredParquetTable  
> TestMetastoreCommands.testSelectEmptyRequiredParquetTable{code}
> I suggest logging a warning when an empty Parquet file is created, or 
> providing an alternative _endBlock_ for backward compatibility with other tools:
> !Screenshot from 2021-04-13 08-52-56.png!
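
The "warn instead of fail" suggestion in the description could look roughly like the sketch below. This is a toy model and not parquet-mr code: the class `EmptyBlockSketch`, the `allowEmptyBlocks` switch, and the exception message are all hypothetical names chosen for illustration, and the only thing borrowed from the issue is the method name `endBlock`. It assumes the intended behavior is that an empty block is skipped with a warning, so that the file can still be closed with only its schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Toy model of a block-based file writer. Hypothetical; not the parquet-mr API. */
public class EmptyBlockSketch {
    private static final Logger LOG = Logger.getLogger(EmptyBlockSketch.class.getName());

    // Hypothetical switch: true = warn and continue, false = fail on an empty block.
    private final boolean allowEmptyBlocks;
    // Record counts of the row groups that were actually recorded.
    private final List<Long> rowGroupSizes = new ArrayList<>();
    private long currentRecordCount = 0;

    public EmptyBlockSketch(boolean allowEmptyBlocks) {
        this.allowEmptyBlocks = allowEmptyBlocks;
    }

    /** Counts one record in the current block (the payload is ignored in this model). */
    public void write(Object record) {
        currentRecordCount++;
    }

    /** Closes the current block; an empty block is either rejected or skipped with a warning. */
    public void endBlock() {
        if (currentRecordCount == 0) {
            if (!allowEmptyBlocks) {
                throw new IllegalStateException("End block with zero records");
            }
            LOG.warning("Ending a block with zero records; no row group is recorded");
            return;
        }
        rowGroupSizes.add(currentRecordCount);
        currentRecordCount = 0;
    }

    public List<Long> rowGroupSizes() {
        return rowGroupSizes;
    }

    public static void main(String[] args) {
        EmptyBlockSketch lenient = new EmptyBlockSketch(true);
        lenient.endBlock(); // logs a warning, records nothing
        System.out.println("row groups after empty block: " + lenient.rowGroupSizes().size());

        try {
            new EmptyBlockSketch(false).endBlock();
        } catch (IllegalStateException e) {
            System.out.println("strict writer rejected empty block: " + e.getMessage());
        }
    }
}
```

With the lenient setting, a CTAS that selects zero rows would end with no row groups but could still write a footer carrying the schema. Whether the real writer should skip the block or record a zero-row group is a design question this sketch does not settle.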



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
