[ 
https://issues.apache.org/jira/browse/PARQUET-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Wu updated PARQUET-2026:
-----------------------------
    Fix Version/s: 1.14.0
                       (was: 1.13.0)

> Allow empty row in parquet file
> -------------------------------
>
>                 Key: PARQUET-2026
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2026
>             Project: Parquet
>          Issue Type: Task
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Vitalii Diravka
>            Priority: Major
>              Labels: Drill, empty-file
>             Fix For: 1.14.0
>
>         Attachments: Screenshot from 2021-04-13 08-52-56.png
>
>
> PARQUET-1851 starts abandon to write parquet files with schema (meta 
> information), but with 0 rows, aka empty files.
> In result it prevent to store empty tables in DRILL by using parquet files, 
> for example:
> {code:java}
> CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0{code}
> {code:java}
> CREATE TABLE dfs.tmp.%s AS select * from 
> dfs.`parquet/alltypes_required.parquet` where `col_int` = 0{code}
> {code:java}
> create table dfs.tmp.%s as select * from 
> dfs.`parquet/empty/complex/empty_complex.parquet`{code}
> So PARQUET-1851 breaks the following test cases:
> {code:java}
> TestUntypedNull.testParquetTableCreation   
> TestParquetWriterEmptyFiles.testComplexEmptyFileSchema   
> TestParquetWriterEmptyFiles.testWriteEmptyFile   
> TestParquetWriterEmptyFiles.testWriteEmptyFileWithSchema   
> TestParquetWriterEmptyFiles.testWriteEmptySchemaChange 
> TestMetastoreCommands.testAnalyzeEmptyRequiredParquetTable  
> TestMetastoreCommands.testSelectEmptyRequiredParquetTable{code}
>  I suggest to use warning in the process of creating empty parquet files or 
> create alternative _endBlock_ for backward compatibility with other tools:
> !Screenshot from 2021-04-13 08-52-56.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to